Search | arXiv e-print repository

arXiv:2406.15091 [pdf, other]

Probing light sterile neutrinos in left-right symmetric models with displaced vertices and neutrinoless double beta decay

Authors: Jordy de Vries, Herbi K. Dreiner, Jelle Groot, Julian Y. Günther, Zeren Simon Wang

Abstract: An investigation of relatively light (GeV-scale), long-lived right-handed neutrinos is performed within minimal left-right symmetric models using the neutrino-extended Standard Model Effective Field Theory framework. Light sterile neutrinos can be produced through rare decays of kaons, $D$-mesons, and $B$-mesons at the Large Hadron Collider (LHC) and the Long-Baseline Neutrino Facility (LBNF) of F… ▽ More An investigation of relatively light (GeV-scale), long-lived right-handed neutrinos is performed within minimal left-right symmetric models using the neutrino-extended Standard Model Effective Field Theory framework. Light sterile neutrinos can be produced through rare decays of kaons, $D$-mesons, and $B$-mesons at the Large Hadron Collider (LHC) and the Long-Baseline Neutrino Facility (LBNF) of Fermilab. Their decays could result in displaced vertices, which can be reconstructed. By performing Monte-Carlo simulations, we assess the sensitivities of the future LHC far-detector experiments ANUBIS, CODEX-b, FACET, FASER(2), MoEDAL-MAPP1(2), MATHUSLA, the recently approved beam-dump experiment SHiP, and the upcoming neutrino experiment DUNE at the LBNF, to the right-handed gauge-boson mass $M_{W_R}$ as functions of neutrino masses. We find that DUNE and SHiP could be sensitive to right-handed gauge-boson masses up to $\sim 25$ TeV. We compare this reach to indirect searches such as neutrinoless double beta decay, finding that displaced-vertex searches are very competitive. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15030 [pdf, ps, other]

Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction… ▽ More Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.14962 [pdf, other]

Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning

Authors: Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

Abstract: Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addr… ▽ More Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addressed this issue by focusing on disentangling strategy or utilizing object-based conditional probabilities to constrain the selection space of attributes. Unfortunately, few studies have explored the problem from the perspective of modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we take a step further and devise a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem. Besides, the latest studies highlight the weakness of the perception of hard compositions even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment target compositional training data. We performed detailed quantitative analysis and retrieval experiments on well-established datasets, such as UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of our proposed method, and the state-of-the-art (SOTA) performance demonstrates the superiority of our approach. The code is available at https://github.com/lisuyi/PBadv_czsl. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14903 [pdf, other]

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

Authors: Leyan Wang, Yonggang **, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

Abstract: As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individua… ▽ More As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individuals' group identities. To address this gap, we introduce GIEBench, a comprehensive benchmark that includes 11 identity dimensions, covering 97 group identities with a total of 999 single-choice questions related to specific group identities. GIEBench is designed to evaluate the empathy of LLMs when presented with specific group identities such as gender, age, occupation, and race, emphasizing their ability to respond from the standpoint of the identified group. This supports the ongoing development of empathetic LLM applications tailored to users with different identities. Our evaluation of 23 LLMs revealed that while these LLMs understand different identity standpoints, they fail to consistently exhibit equal empathy across these identities without explicit instructions to adopt those perspectives. This highlights the need for improved alignment of LLMs with diverse values to better accommodate the multifaceted nature of human identities. Our datasets are available at https://github.com/GIEBench/GIEBench. △ Less

Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14869 [pdf, other]

Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, **gran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms h… ▽ More While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms have emerged. In particular, deep learning (DL) has shown great benefits in automatically extracting complex and subtle features from raw data with high classification accuracy. However, DL algorithms face the computational cost problem as the difficulty of the RFF task and the size of the DNN have increased dramatically. To address the above challenge, this paper proposes a novel costeffective early-exit neural network consisting of a complex-valued neural network (CVNN) backbone with multiple random forest branches, called hybrid CVNN-RF. Unlike conventional studies that use a single fixed DL model to process all RF samples, our hybrid CVNN-RF considers differences in the recognition difficulty of RF samples and introduces an early-exit mechanism to dynamically process the samples. When processing "easy" samples that can be well classified with high confidence, the hybrid CVNN-RF can end early at the random forest branch to reduce computational cost. Conversely, subsequent network layers will be activated to ensure accuracy. To further improve the early-exit rate, an automated multi-dimensional early-exit strategy is proposed to achieve scheduling control from multiple dimensions within the network depth and classification category. Finally, our experiments on the public ADS-B dataset show that the proposed algorithm can reduce the computational cost by 83% while improving the accuracy by 1.6% under a classification task with 100 categories. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted by IEEE Internet of Things Journal

arXiv:2406.14863 [pdf, other]

Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

Authors: Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang

Abstract: Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we… ▽ More Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we propose a novel hardware-software co-design approach for DNN intellectual property (IP) protection that capitalizes on the inherent aging characteristics of circuits and a novel differential orientation fine-tuning (DOFT) to ensure effective protection. Hardware-wise, we employ random aging to produce authorized chips. This process circumvents the need for chip redesign, thereby eliminating any additional hardware overhead during the inference procedure of DNNs. Moreover, the authorized chips demonstrate a considerable disparity in DNN inference performance when compared to unauthorized chips. Software-wise, we propose a novel DOFT, which allows pre-trained DNNs to maintain their original accuracy on authorized chips with minimal fine-tuning, while the model's performance on unauthorized chips is reduced to random guessing. Extensive experiments on various models, including MLP, VGG, ResNet, Mixer, and SwinTransformer, with lightweight binary and practical multi-bit weights demonstrate that the proposed method achieves effective IP protection, with only 10\% accuracy on unauthorized chips, while preserving nearly the original accuracy on authorized ones. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: Design Automation Conference 2024

arXiv:2406.14859 [pdf, other]

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

Authors: Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei

Abstract: The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques and defense strategies. Compared to the more advanced state of… ▽ More The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques and defense strategies. Compared to the more advanced state of unimodal jailbreaking, multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14774 [pdf, other]

Evaluating Numerical Reasoning in Text-to-Image Models

Authors: Ivana Kajić, Olivia Wiles, Isabela Albuquerque, Matthias Bauer, Su Wang, Jordi Pont-Tuset, Aida Nematzadeh

Abstract: Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range of text-to-image models on numerical reasoning tasks of varying difficulty, and show that even the most advanced models have only rudimentary numerical skills. Specifically, their ability to correctly… ▽ More Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range of text-to-image models on numerical reasoning tasks of varying difficulty, and show that even the most advanced models have only rudimentary numerical skills. Specifically, their ability to correctly generate an exact number of objects in an image is limited to small numbers, it is highly dependent on the context the number term appears in, and it deteriorates quickly with each successive number. We also demonstrate that models have poor understanding of linguistic quantifiers (such as "a few" or "as many as"), the concept of zero, and struggle with more advanced concepts such as partial quantities and fractional representations. We bundle prompts, generated images and human annotations into GeckoNum, a novel benchmark for evaluation of numerical reasoning. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14230 [pdf, other]

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

Authors: Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie

Abstract: Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs,… ▽ More Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs, but they suffer from evaluation chronoeffect, that is, as models rapidly evolve, existing data becomes leaked or undemanding, overestimating ever-develo** LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach that dynamically probes the underlying moral baselines of LLMs. Distinct from previous adaptive testing methods that rely on static datasets with limited difficulty, GETA incorporates an iteratively-updated item generator which infers each LLM's moral boundaries and generates difficulty-tailored testing items, accurately reflecting the true alignment extent. This process theoretically learns a joint distribution of item and model response, with item difficulty and value conformity as latent variables, where the generator co-evolves with the LLM, addressing chronoeffect. We evaluate various popular LLMs with diverse capabilities and demonstrate that GETA can create difficulty-matching testing items and more accurately assess LLMs' values, better consistent with their performance on unseen OOD and i.i.d. items, laying the groundwork for future evaluation paradigms. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.14194 [pdf, other]

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and nar… ▽ More The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and narrow sources of bias. To address this problem, we introduce VLBiasBench, a benchmark aimed at evaluating biases in LVLMs comprehensively. In VLBiasBench, we construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status). To create a large-scale dataset, we use Stable Diffusion XL model to generate 46,848 high-quality images, which are combined with different questions to form 128,342 samples. These questions are categorized into open and close ended types, fully considering the sources of bias and comprehensively evaluating the biases of LVLM from multiple perspectives. We subsequently conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealing from these models. Our benchmark is available at https://github.com/Xiangkui-Cao/VLBiasBench. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14186 [pdf, other]

CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

Authors: Tingwei Liu, Miao Zhang, Leiye Liu, Jialong Zhong, Shuyao Wang, Yongri Piao, Huchuan Lu

Abstract: Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion… ▽ More Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion model features poses a great challenge to prostate segmentation. In this paper, we proposed CriDiff, a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation. The CIS maximizes the use of multi-level features by efficiently harnessing the complementarity of high and low-level features. To effectively learn multi-level of edge features and non-edge features, we proposed two parallel conditioners in the CIS: the Boundary Enhance Conditioner (BEC) and the Core Enhance Conditioner (CEC), which discriminatively model the image edge regions and non-edge regions, respectively. Moreover, the GP approach eases the inconsistency between the images features and the diffusion model without adding additional parameters. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method and achieve state-of-the-art performance on four evaluation metrics. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted in MICCAI 2024

arXiv:2406.14117 [pdf, other]

An Investigation of Prompt Variations for Zero-shot LLM-based Rankers

Authors: Shuoqi Sun, Shengyao Zhuang, Shuai Wang, Guido Zuccon

Abstract: We provide a systematic understanding of the impact of specific components and wordings used in prompts on the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot ranking methods based on LLMs have recently been proposed. Among many aspects, methods differ across (1) the ranking algorithm they implement, e.g., pointwise vs. listwise, (2) the backbone LLMs us… ▽ More We provide a systematic understanding of the impact of specific components and wordings used in prompts on the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot ranking methods based on LLMs have recently been proposed. Among many aspects, methods differ across (1) the ranking algorithm they implement, e.g., pointwise vs. listwise, (2) the backbone LLMs used, e.g., GPT3.5 vs. FLAN-T5, (3) the components and wording used in prompts, e.g., the use or not of role-definition (role-playing) and the actual words used to express this. It is currently unclear whether performance differences are due to the underlying ranking algorithm, or because of spurious factors such as better choice of words used in prompts. This confusion risks to undermine future research. Through our large-scale experimentation and analysis, we find that ranking algorithms do contribute to differences between methods for zero-shot LLM ranking. However, so do the LLM backbones -- but even more importantly, the choice of prompt components and wordings affect the ranking. In fact, in our experiments, we find that, at times, these latter elements have more impact on the ranker's effectiveness than the actual ranking algorithms, and that differences among ranking methods become more blurred when prompt variations are considered. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14114 [pdf, other]

doi 10.1145/3658644.3670299

Dye4AI: Assuring Data Boundary on Generative AI Services

Authors: Shu Wang, Kun Sun, Yan Zhai

Abstract: Generative artificial intelligence (AI) is versatile for various applications, but security and privacy concerns with third-party AI vendors hinder its broader adoption in sensitive scenarios. Hence, it is essential for users to validate the AI trustworthiness and ensure the security of data boundaries. In this paper, we present a dye testing system named Dye4AI, which injects crafted trigger data… ▽ More Generative artificial intelligence (AI) is versatile for various applications, but security and privacy concerns with third-party AI vendors hinder its broader adoption in sensitive scenarios. Hence, it is essential for users to validate the AI trustworthiness and ensure the security of data boundaries. In this paper, we present a dye testing system named Dye4AI, which injects crafted trigger data into human-AI dialogue and observes AI responses towards specific prompts to diagnose data flow in AI model evolution. Our dye testing procedure contains 3 stages: trigger generation, trigger insertion, and trigger retrieval. First, to retain both uniqueness and stealthiness, we design a new trigger that transforms a pseudo-random number to a intelligible format. Second, with a custom-designed three-step conversation strategy, we insert each trigger item into dialogue and confirm the model memorizes the new trigger knowledge in the current session. Finally, we routinely try to recover triggers with specific prompts in new sessions, as triggers can present in new sessions only if AI vendors leverage user data for model fine-tuning. Extensive experiments on six LLMs demonstrate our dye testing scheme is effective in ensuring the data boundary, even for models with various architectures and parameter sizes. Also, larger and premier models tend to be more suitable for Dye4AI, e.g., trigger can be retrieved in OpenLLaMa-13B even with only 2 insertions per trigger item. Moreover, we analyze the prompt selection in dye testing, providing insights for future testing systems on generative AI services. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13891 [pdf, other]

DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection

Authors: Zhuoxiao Chen, Zixin Wang, Sen Wang, Zi Huang, Yadan Luo

Abstract: LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizab… ▽ More LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizability of pre-trained models, which creates a sharp loss landscape during training. Such sharpness, when encountered during testing, can precipitate significant performance declines, even with minor data variations. To address the aforementioned challenges, we propose \textbf{dual-perturbation optimization (DPO)} for \textbf{\underline{T}est-\underline{t}ime \underline{A}daptation in \underline{3}D \underline{O}bject \underline{D}etection (TTA-3OD)}. We minimize the sharpness to cultivate a flat loss landscape to ensure model resiliency to minor data variations, thereby enhancing the generalization of the adaptation process. To fully capture the inherent variability of the test point clouds, we further introduce adversarial perturbation to the input BEV features to better simulate the noisy test environment. As the dual perturbation strategy relies on trustworthy supervision signals, we utilize a reliable Hungarian matcher to filter out pseudo-labels sensitive to perturbations. Additionally, we introduce early Hungarian cutoff to avoid error accumulation from incorrect pseudo-labels by halting the adaptation process. Extensive experiments across three types of transfer tasks demonstrate that the proposed DPO significantly surpasses previous state-of-the-art approaches, specifically on Waymo $\rightarrow$ KITTI, outperforming the most competitive baseline by 57.72\% in $\text{AP}_\text{3D}$ and reaching 91\% of the fully supervised upper bound. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 13 pages, 7 figures

arXiv:2406.13862 [pdf, other]

Knowledge Graph-Enhanced Large Language Models via Path Selection

Authors: Haochen Liu, Song Wang, Yaochen Zhu, Yushun Dong, Jundong Li

Abstract: Large Language Models (LLMs) have shown unprecedented performance in various real-world applications. However, they are known to generate factually inaccurate outputs, a.k.a. the hallucination problem. In recent years, incorporating external knowledge extracted from Knowledge Graphs (KGs) has become a promising strategy to improve the factual accuracy of LLM-generated outputs. Nevertheless, most e… ▽ More Large Language Models (LLMs) have shown unprecedented performance in various real-world applications. However, they are known to generate factually inaccurate outputs, a.k.a. the hallucination problem. In recent years, incorporating external knowledge extracted from Knowledge Graphs (KGs) has become a promising strategy to improve the factual accuracy of LLM-generated outputs. Nevertheless, most existing explorations rely on LLMs themselves to perform KG knowledge extraction, which is highly inflexible as LLMs can only provide binary judgment on whether a certain knowledge (e.g., a knowledge path in KG) should be used. In addition, LLMs tend to pick only knowledge with direct semantic relationship with the input text, while potentially useful knowledge with indirect semantics can be ignored. In this work, we propose a principled framework KELP with three stages to handle the above problems. Specifically, KELP is able to achieve finer granularity of flexible knowledge extraction by generating scores for knowledge paths with input texts via latent semantic matching. Meanwhile, knowledge paths with indirect semantic relationships with the input text can also be considered via trained encoding between the selected paths in KG and the input text. Experiments on real-world datasets validate the effectiveness of KELP. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13698 [pdf, other]

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Authors: Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

Abstract: Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper… ▽ More Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. We additionally present a multilingual parallel metaphor corpus generated by post-editing. Our evaluation protocol is designed to estimate four aspects of MT: Metaphorical Equivalence, Emotion, Authenticity, and Quality. In doing so, we observe that translations of figurative expressions display different traits from literal ones. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13490 [pdf, other]

The Surprising Benefits of Base Rate Neglect in Robust Aggregation

Authors: Yuqing Kong, Shu Wang, Ying Wang

Abstract: Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain deg… ▽ More Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain degree of base rate neglect helps with robust forecast aggregation. Specifically, we consider a forecast aggregation problem with two experts who each predict a binary world state after observing private signals. Unlike previous work, we model experts exhibiting base rate neglect, where they incorporate the base rate information to degree $λ\in[0,1]$, with $λ=0$ indicating complete ignorance and $λ=1$ perfect Bayesian updating. To evaluate aggregators' performance, we adopt Arieli et al. (2018)'s worst-case regret model, which measures the maximum regret across the set of considered information structures compared to an omniscient benchmark. Our results reveal the surprising V-shape of regret as a function of $λ$. That is, predictions with an intermediate incorporating degree of base rate $λ<1$ can counter-intuitively lead to lower regret than perfect Bayesian posteriors with $λ=1$. We additionally propose a new aggregator with low regret robust to unknown $λ$. Finally, we conduct an empirical study to test the base rate neglect model and evaluate the performance of various aggregators. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13284 [pdf]

The association of domain-specific physical activity and sedentary activity with stroke: A prospective cohort study

Authors: Xinyi He, Shidi Wang, Yi Li, Jiucun Wang, Guangrui Yang, Jun Chen, Zixin Hu

Abstract: Background The incidence of stroke places a heavy burden on both society and individuals. Activity is closely related to cardiovascular health. This study aimed to investigate the relationship between the varying domains of PA, like occupation-related Physical Activity (OPA), transportation-related Physical Activity (TPA), leisure-time Physical Activity (LTPA), and Sedentary Activity (SA) with str… ▽ More Background The incidence of stroke places a heavy burden on both society and individuals. Activity is closely related to cardiovascular health. This study aimed to investigate the relationship between the varying domains of PA, like occupation-related Physical Activity (OPA), transportation-related Physical Activity (TPA), leisure-time Physical Activity (LTPA), and Sedentary Activity (SA) with stroke. Methods Our analysis included 30,400 participants aged 20+ years from 2007 to 2018 National Health and Nutrition Examination Survey (NHANES). Stroke was identified based on the participant's self-reported diagnoses from previous medical consultations, and PA and SA were self-reported. Multivariable logistic and restricted cubic spline models were used to assess the associations. Results Participants achieving PA guidelines (performing PA more than 150 min/week) were 35.7% less likely to have a stroke based on both the total PA (odds ratio [OR] 0.643, 95% confidence interval [CI] 0.523-0.790) and LTPA (OR 0.643, 95% CI 0.514-0.805), while OPA or TPA did not demonstrate lower stroke risk. Furthermore, participants with less than 7.5 h/day SA levels were 21.6% (OR 0.784, 95% CI 0.665-0.925) less likely to have a stroke. The intensities of total PA and LTPA exhibited nonlinear U-shaped associations with stroke risk. In contrast, those of OPA and TPA showed negative linear associations, while SA intensities were positively linearly correlated with stroke risk. Conclusions LTPA, but not OPA or TPA, was associated with a lower risk of stroke at any amount, suggesting that significant cardiovascular health would benefit from increased PA. Additionally, the positive association between SA and stroke indicated that prolonged sitting was detrimental to cardiovascular health. Overall, increased PA within a reasonable range reduces the risk of stroke, while increased SA elevates it. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13241 [pdf, ps, other]

Achirality of Sol 3-Manifolds, Stevenhagen Conjecture and Shimizu's L-series

Authors: Ye Tian, Shicheng Wang, Zhongzi Wang

Abstract: A closed orientable manifold is {\em achiral} if it admits an orientation reversing homeomorphism. A commensurable class of closed manifolds is achiral if it contains an achiral element, or equivalently, each manifold in $\CM$ has an achiral finite cover. Each commensurable class containing non-orientable elements must be achiral. It is natural to wonder how many commensurable classes are ac… ▽ More A closed orientable manifold is {\em achiral} if it admits an orientation reversing homeomorphism. A commensurable class of closed manifolds is achiral if it contains an achiral element, or equivalently, each manifold in $\CM$ has an achiral finite cover. Each commensurable class containing non-orientable elements must be achiral. It is natural to wonder how many commensurable classes are achiral and how many achiral classes have non-orientable elements. We study this problem for Sol 3-manifolds. Each commensurable class $\CM$ of Sol 3-manifold has a complete topological invariant $D_{\CM}$, the discriminant of $\CM$. Our main result is: (1) Among all commensurable classes of Sol 3-manifolds, there are infinitely many achiral classes; however ordered by discriminants, the density of achiral commensurable classes is 0. (2) Among all achiral commensurable classes of Sol 3-manifolds, ordered by discriminants, the density of classes containing non-orientable elements is $1-ρ$, where $$ρ:=\prod_{j=1}^\infty \left(1+2^{-j}\right)^{-1} = 0.41942\cdots.$$ △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2406.13179 [pdf, other]

Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative… ▽ More Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative modules: 1) Global-Local Spiking Convolution (GLSC) module and 2) Bottleneck-PLIF module. Compared to the hand-crafted feature extraction methods, the GLSC module achieves speech feature extraction that is sparser, more energy-efficient, and yields better performance. The Bottleneck-PLIF module further processes the signals from GLSC with the aim to achieve higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show our method achieves competitive performance among SNN-based KWS models with fewer parameters. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.13060 [pdf, other]

Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization

Authors: Zhang Wan, Shuo Wang, Xudong Zhang

Abstract: Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than the surface. They hold significant importance due to their capacity to carry substantial energy, thus influence pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter d… ▽ More Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than the surface. They hold significant importance due to their capacity to carry substantial energy, thus influence pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter data from remote sensing instruments. However, cloud cover in optical remote sensing images variably obscures ground information, leading to blurred or missing surface observations. As such, this paper aims at altimeter-based machine learning solutions to automatically locate ISWs. The challenges, however, lie in the following two aspects: 1) the altimeter data has low resolution, which requires a strong machine learner; 2) labeling data is extremely labor-intensive, leading to very limited data for training. In recent years, the grand progress of deep learning demonstrates strong learning capacity given abundant data. Besides, more recent studies on efficient learning and self-supervised learning laid solid foundations to tackle the aforementioned challenges. In this paper, we propose to inject prior knowledge to achieve a strong and efficient learner. Specifically, intrinsic patterns in altimetry data are efficiently captured using a scale-translation equivariant convolutional neural network (ST-ECNN). By considering inherent symmetries in neural network design, ST-ECNN achieves higher efficiency and better performance than baseline models. Furthermore, we also introduce prior knowledge from massive unsupervised data to enhance our solution using the SimCLR framework for pre-training. Our final solution achieves an overall better performance than baselines on our handcrafted altimetry dataset. Data and codes are available at https://github.com/ZhangWan-byte/Internal_Solitary_Wave_Localization . △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages, 5 figures

arXiv:2406.12798 [pdf, other]

doi 10.3847/2041-8213/ad5967

The Aligned Orbit of a Hot Jupiter around the M Dwarf TOI-4201

Authors: Tianjun Gan, Sharon X. Wang, Fei Dai, Joshua N. Winn, Shude Mao, Siyi Xu, Enric Pallé, Jacob L. Bean, Madison Brady, Nina Brown, Cicero Lu, Rafael Luque, Teo Mocnik, Andreas Seifahrt, Guðmundur K. Stefánsson

Abstract: Measuring the obliquities of stars hosting giant planets may shed light on the dynamical history of planetary systems. Significant efforts have been made to measure the obliquities of FGK stars with hot Jupiters, mainly based on observations of the Rossiter-McLaughlin effect. In contrast, M dwarfs with hot Jupiters have hardly been explored, because such systems are rare and often not favorable fo… ▽ More Measuring the obliquities of stars hosting giant planets may shed light on the dynamical history of planetary systems. Significant efforts have been made to measure the obliquities of FGK stars with hot Jupiters, mainly based on observations of the Rossiter-McLaughlin effect. In contrast, M dwarfs with hot Jupiters have hardly been explored, because such systems are rare and often not favorable for such precise observations. Here, we report the first detection of the Rossiter-McLaughlin effect for an M dwarf with a hot Jupiter, TOI-4201, using the Gemini-North/MAROON-X spectrograph. We find TOI-4201 to be well-aligned with its giant planet, with a sky-projected obliquity of $λ=-3.0_{-3.2}^{+3.7}\ ^{\circ}$ and a true obliquity of $ψ=21.3_{-12.8}^{+12.5}\ ^{\circ}$ with an upper limit of $40^{\circ}$ at a 95% confidence level. The result agrees with dynamically quiet formation or tidal obliquity dam** that realigned the system. As the first hot Jupiter around an M dwarf with its obliquity measured, TOI-4201b joins the group of aligned giant planets around cool stars ($T_{\rm eff}<6250\ K$), as well as the small but growing sample of planets with relatively high planet-to-star mass ratio ($M_p/M_\ast\gtrsim 3\times 10^{-3}$) that also appear to be mostly aligned. △ Less

Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures, 3 tables, accepted to ApJL

arXiv:2406.12757 [pdf, other]

MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu

Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' n… ▽ More Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' narrow attribute scope and single attribute labeling introduce annotation biases, undermining model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangling the attributes and objects. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 13pages,5figures

arXiv:2406.12646 [pdf, other]

An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potential for performance biases that could mirror those found in task-specific deep learning models like nnU-Net. In this paper, we explored the fairness dilemma concerning large segmentation foundation models. We prospectively curate a benchmark dataset of 3D MRI and CT scans of the organs including liver, kidney, spleen, lung and aorta from a total of 1056 healthy subjects with expert segmentations. Crucially, we document demographic details such as gender, age, and body mass index (BMI) for each subject to facilitate a nuanced fairness analysis. We test state-of-the-art foundation models for medical image segmentation, including the original SAM, medical SAM and SAT models, to evaluate segmentation efficacy across different demographic groups and identify disparities. Our comprehensive analysis, which accounts for various confounding factors, reveals significant fairness concerns within these foundational models. Moreover, our findings highlight not only disparities in overall segmentation metrics, such as the Dice Similarity Coefficient but also significant variations in the spatial distribution of segmentation errors, offering empirical evidence of the nuanced challenges in ensuring fairness in medical image segmentation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to MICCAI-2024

arXiv:2406.12641 [pdf, other]

DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

Abstract: Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice… ▽ More Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice questions, with an average of 994 tokens per question. Each question contains an average of 4.55 pieces of implicit evidence, and solving the problem typically requires 7.62 logical jumps to find the correct answer. To enhance the performance of LLMs in evidence detection, this paper proposes Detective Reasoning Prompt and Finetune. Experiments demonstrate that the existing LLMs' abilities to detect evidence in long contexts are far inferior to humans. However, the Detective Reasoning Prompt effectively enhances the capability of powerful LLMs in evidence detection, while the Finetuning method shows significant effects in enhancing the performance of weaker LLMs. Moreover, when the abilities of LLMs in evidence detection are improved, their final reasoning performance is also enhanced accordingly. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12566 [pdf, other]

RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation

Authors: Shuting Wang, Xin Yu, Mang Wang, Weipeng Chen, Yutao Zhu, Zhicheng Dou

Abstract: Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant asp… ▽ More Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant aspects. To tackle this important yet underexplored problem, we propose a novel RAG framework, namely RichRAG. It includes a sub-aspect explorer to identify potential sub-aspects of input questions, a multi-faceted retriever to build a candidate pool of diverse external documents related to these sub-aspects, and a generative list-wise ranker, which is a key module to provide the top-k most valuable documents for the final generator. These ranked documents sufficiently cover various query aspects and are aware of the generator's preferences, hence incentivizing it to produce rich and comprehensive responses for users. The training of our ranker involves a supervised fine-tuning stage to ensure the basic coverage of documents, and a reinforcement learning stage to align downstream LLM's preferences to the ranking of documents. Experimental results on two publicly available datasets prove that our framework effectively and efficiently provides comprehensive and satisfying responses to users. △ Less

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12463 [pdf, other]

LFMamba: Light Field Image Super-Resolution with State Space Model

Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scan… ▽ More Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in \emph{designing an appropriate scanning method for 4D light fields that effectively models light field features}. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba shed light on effective representation learning of LFs with state space models. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12435 [pdf, other]

Federated Learning with Limited Node Labels

Authors: Bisheng Tang, Xiaojun Chen, Shaopu Wang, Yuexin Xuan, Zhendong Zhao

Abstract: Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-… ▽ More Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-pass global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of data and then propagates the federated feature to the local structures. To further improve the embedding representation of nodes with local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure with an innovation view that applies pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12415 [pdf, other]

doi 10.1145/3643991.3644886

MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation

Authors: Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

Abstract: We constructed a newly large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt… ▽ More We constructed a newly large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt advanced tools to ensure the extracted code integrality and enrich the code with four different transformed representations. In total, MegaVul contains 17,380 vulnerabilities collected from 992 open-source repositories spanning 169 different vulnerability types disclosed from January 2006 to October 2023. Thus, MegaVul can be used for a variety of software security-related tasks including detecting vulnerabilities and assessing vulnerability severity. All information is stored in the JSON format for easy usage. MegaVul is publicly available on GitHub and will be continuously updated. It can be easily extended to other programming languages. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 5 pages, 4figures

arXiv:2406.12320 [pdf, other]

On semi-implicit schemes for the incompressible Euler equations via the vanishing viscosity limit

Authors: Xinyu Cheng, Zhaonan Luo, Sheng Wang

Abstract: A new type of systematic approach to study the incompressible Euler equations numerically via the vanishing viscosity limit is proposed in this work. We show the new strategy is unconditionally stable that the $L^2$-energy dissipates and $H^s$-norm is uniformly bounded in time without any restriction on the time step. Moreover, first-order convergence of the proposed method is established includin… ▽ More A new type of systematic approach to study the incompressible Euler equations numerically via the vanishing viscosity limit is proposed in this work. We show the new strategy is unconditionally stable that the $L^2$-energy dissipates and $H^s$-norm is uniformly bounded in time without any restriction on the time step. Moreover, first-order convergence of the proposed method is established including both low regularity and high regularity error estimates. The proposed method is extended to full discretization with a newly developed iterative Fourier spectral scheme. Another main contributions of this work is to propose a new integration by parts technique to lower the regularity requirement from $H^4$ to $H^3$ in order to perform the $L^2$-error estimate. To our best knowledge, this is one of the very first work to study incompressible Euler equations by designing stable numerical schemes via the inviscid limit with rigorous analysis. Furthermore, we will present both low and high regularity errors from numerical experiments and demonstrate the dynamics in several benchmark examples. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2406.12243 [pdf, other]

CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework

Authors: Shaohuang Wang, Lun Wang, Yunhan Bu, Tianwei Huang

Abstract: Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regr… ▽ More Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regressive generation of LLMs leads to slow inference speeds, making them less effective in real-time recommendations.To address this, we propose a framework for news recommendation using LLMs, named \textit{CherryRec}, which ensures the quality of recommendations while accelerating the recommendation process. Specifically, we employ a Knowledge-aware News Rapid Selector to retrieve candidate options based on the user's interaction history. The history and retrieved items are then input as text into a fine-tuned LLM, the Content-aware News Llm Evaluator, designed to enhance news recommendation capabilities. Finally, the Value-aware News Scorer integrates the scores to compute the CherryRec Score, which serves as the basis for the final recommendation.We validate the effectiveness of the proposed framework by comparing it with state-of-the-art baseline methods on benchmark datasets. Our experimental results consistently show that CherryRec outperforms the baselines in both recommendation performance and efficiency.The project resource can be accessed at: \url{https://github.com/xxxxxx} △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12211 [pdf, other]

PCIE_LAM Solution for Ego4D Looking At Me Challenge

Authors: Kanokphan Lertniphonphan, Jun Xie, Yaqing Meng, Shi**g Wang, Feng Chen, Zhepeng Wang

Abstract: This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR2024. The main goal of the challenge is to accurately determine if a person in the scene is looking at the camera wearer, based on a video where the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. The InternVL… ▽ More This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR2024. The main goal of the challenge is to accurately determine if a person in the scene is looking at the camera wearer, based on a video where the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. The InternVL extracts spatial features, while the Bi-LSTM extracts temporal features. However, this task is highly challenging due to the distance between the person in the scene and the camera movement, which results in significant blurring in the face image. To address the complexity of the task, we implemented a Gaze Smoothing filter to eliminate noise or spikes from the output. Our approach achieved the 1st position in the looking at me challenge with 0.81 mAP and 0.93 accuracy rate. Code is available at https://github.com/KanokphanL/Ego4D_LAM_InternLSTM △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12197 [pdf, other]

Debate as Optimization: Adaptive Conformal Prediction and Diverse Retrieval for Event Extraction

Authors: Sijia Wang, Lifu Huang

Abstract: We propose a multi-agent debate as optimization (DAO) system for event extraction, where the primary objective is to iteratively refine the large language models (LLMs) outputs through debating without parameter tuning. In DAO, we introduce two novel modules: the Diverse-RAG (DRAG) module and the Adaptive Conformal Prediction (AdaCP) module. DRAG systematically retrieves supporting information tha… ▽ More We propose a multi-agent debate as optimization (DAO) system for event extraction, where the primary objective is to iteratively refine the large language models (LLMs) outputs through debating without parameter tuning. In DAO, we introduce two novel modules: the Diverse-RAG (DRAG) module and the Adaptive Conformal Prediction (AdaCP) module. DRAG systematically retrieves supporting information that best fits the debate discussion, while AdaCP enhances the accuracy and reliability of event extraction by effectively rejecting less promising answers. Experimental results demonstrate a significant reduction in the performance gap between supervised approaches and tuning-free LLM-based methods by 18.1% and 17.8% on ACE05 and 17.9% and 15.2% on CASIE for event detection and argument extraction respectively. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12196 [pdf, other]

CITADEL: Context Similarity Based Deep Learning Framework Bug Finding

Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Shiwei Wang, Chao Shen

Abstract: With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the envi… ▽ More With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the environment. This problem is challenging due to the difficulty of getting test oracles of performance bugs. Moreover, existing tools are inefficient, generating hundreds of test cases with few trigger bugs. In this paper, we propose CITADEL, a method that accelerates the finding of bugs in terms of efficiency and effectiveness. We observe that many DL framework bugs are similar due to the similarity of operators and algorithms belonging to the same family (e.g., Conv2D and Conv3D). Orthogonal to existing bug-finding tools, CITADEL aims to find new bugs that are similar to reported ones that have known test oracles. It works by first collecting existing bug reports and identifying problematic APIs. CITADEL defines context similarity to measure the similarity of DL framework API pairs and automatically generates test cases with oracles for APIs that are similar to the problematic APIs in existing bug reports. CITADEL respectively covers 1,436 PyTorch and 5,380 TensorFlow APIs and effectively detects 79 and 80 API bugs, among which 58 and 68 are new, and 36 and 58 have been confirmed, many of which, e.g., the 11 performance bugs cannot be detected by existing tools. Moreover, a remarkable 35.40% of the test cases generated by CITADEL can trigger bugs, which significantly transcends the ratios of 0.74%, 1.23%, and 3.90% exhibited by the state-of-the-art methods, DocTer, DeepREL, and TitanFuzz. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 10 figures

arXiv:2406.12012 [pdf, other]

Highly Efficient Superconducting Diodes and Rectifiers for Quantum Circuitry

Authors: Josep Ingla-Aynés, Yasen Hou, Sarah Wang, En-De Chu, Oleg A. Mukhanov, Peng Wei, Jagadeesh S. Moodera

Abstract: Superconducting electronics is essential for energy-efficient quantum and classical high-end computing applications. Towards this goal, non-reciprocal superconducting circuit elements, such as superconducting diodes (SDs) can fulfill many critical needs. SDs have been the subject of multiple studies, but integrating several SDs in a superconducting circuit remains a challenge. Here we implement th… ▽ More Superconducting electronics is essential for energy-efficient quantum and classical high-end computing applications. Towards this goal, non-reciprocal superconducting circuit elements, such as superconducting diodes (SDs) can fulfill many critical needs. SDs have been the subject of multiple studies, but integrating several SDs in a superconducting circuit remains a challenge. Here we implement the first SD bridge with multiple SDs exhibiting reproducible characteristics operating at temperatures of a few Kelvin. We demonstrate its functionality as a full wave rectifier using elemental superconductors and insulating ferromagnets, with efficiency up to 43%, and ac to dc signal conversion capabilities at frequencies up to 40 kHz. Our results show a pathway with a highly scalable thin film platform for nonreciprocal superconducting circuits. They could significantly reduce energy consumption as well as decohering thermal and electromagnetic noise in quantum computing. △ Less

Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures

arXiv:2406.11799 [pdf, other]

Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation

Authors: Song Wang, Zhong Zhang, Huan Yan, Ming Xu, Guanghui Wang

Abstract: H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pa… ▽ More H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pair. However, most of them overemphasize the correspondence between domains or patches, overlooking the side information provided by the non-corresponding objects. In this paper, we propose a Mix-Domain Contrastive Learning (MDCL) method to leverage the supervision information in unpaired H&E-to-IHC stain translation. Specifically, the proposed MDCL method aggregates the inter-domain and intra-domain pathology information by estimating the correlation between the anchor patch and all the patches from the matching images, encouraging the network to learn additional contrastive knowledge from mixed domains. With the mix-domain pathology information aggregation, MDCL enhances the pathological consistency between the corresponding patches and the component discrepancy of the patches from the different positions of the generated IHC image. Extensive experiments on two H&E-to-IHC stain translation datasets, namely MIST and BCI, demonstrate that the proposed method achieves state-of-the-art performance across multiple metrics. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat… ▽ More We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation. △ Less

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2406.11644 [pdf, other]

Detecting Planetary Oblateness in the Era of JWST: A Case Study of Kepler-167e

Authors: Quanyi Liu, Wei Zhu, Yifan Zhou, Zhecheng Hu, Zitao Lin, Fei Dai, Kento Masuda, Sharon X. Wang

Abstract: Planets may be rotationally flattened, and their oblateness thus provide useful information on their formation and evolution. Here we develop a new algorithm that can compute the transit light curve due to an oblate planet very efficiently and use it to study the detectability of planet oblateness (and spin obliquity) with the James Webb Space Telescope (JWST). Using the Jupiter analog, Kepler-167… ▽ More Planets may be rotationally flattened, and their oblateness thus provide useful information on their formation and evolution. Here we develop a new algorithm that can compute the transit light curve due to an oblate planet very efficiently and use it to study the detectability of planet oblateness (and spin obliquity) with the James Webb Space Telescope (JWST). Using the Jupiter analog, Kepler-167e, as an example, we show that observations of a single transit with JWST are able to detect a Saturn-like oblateness ($f=0.1$) with high confidence, or set a stringent upper limit on the oblateness parameter, as long as the planetary spin is slightly misaligned ($\gtrsim 20^\circ$) with respect to its orbital direction. Based on known obliquity measurements and theoretical arguments, it is reasonable to believe that this level of misalignment may be common. We estimate the sensitivity limit of JWST in oblateness detections and highlight the importance of better characterizations of cold planets in planning future JWST transit observations. The potential to detect rings, moons, and atmospheric species of the cold giants with JWST is also discussed. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 17 pages, 8 figures. Submitted to Astronomical Journal

arXiv:2406.11459 [pdf, other]

Measurement of $J/ψ$ and $ψ\left(2S\right)$ production in $p+p$ and $p+d$ interactions at 120 GeV

Authors: C. H. Leung, K. Nagai, K. Nakano, D. Nawarathne, J. Dove, S. Prasad, N. Wuerfel, C. A. Aidala, J. Arrington, C. Ayuso, C. L. Barker, C. N. Brown, W. C. Chang, A. Chen, D. C. Christian, B. P. Dannowitz, M. Daugherity, L. El Fassi, D. F. Geesaman, R. Gilman, Y. Goto, R. Guo, T. J. Hague, R. J. Holt, M. F. Hossain , et al. (36 additional authors not shown)

Abstract: We report the $p+p$ and $p+d$ differential cross sections measured in the SeaQuest experiment for $J/ψ$ and $ψ\left(2S\right)$ production at 120 GeV beam energy covering the forward $x$-Feynman ($x_F$) range of $0.5 < x_F <0.9$. The measured cross sections are in good agreement with theoretical calculations based on the nonrelativistic QCD (NRQCD) using the long-distance matrix elements deduced fr… ▽ More We report the $p+p$ and $p+d$ differential cross sections measured in the SeaQuest experiment for $J/ψ$ and $ψ\left(2S\right)$ production at 120 GeV beam energy covering the forward $x$-Feynman ($x_F$) range of $0.5 < x_F <0.9$. The measured cross sections are in good agreement with theoretical calculations based on the nonrelativistic QCD (NRQCD) using the long-distance matrix elements deduced from a recent global analysis of proton- and pion-induced charmonium production data. The $σ_{ψ\left(2S\right)} / σ_{J/ψ}$ cross section ratios are found to increase as $x_F$ increases, indicating that the $q \bar{q}$ annihilation process has larger contributions in the $ψ\left(2S\right)$ production than the $J/ψ$ production. The $σ_{pd}/2σ_{pp}$ cross section ratios are observed to be significantly different for the Drell-Yan process and $J/ψ$ production, reflecting their different production mechanisms. We find that the $σ_{pd}/2σ_{pp}$ ratios for $J/ψ$ production at the forward $x_F$ region are sensitive to the $\bar{d}/ \bar{u}$ flavor asymmetry of the proton sea, analogous to the Drell-Yan process. The transverse momentum ($p_T$) distributions for $J/ψ$ and $ψ\left(2S\right)$ production are also presented and compared with data collected at higher center-of-mass energies. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages, 7 figures

arXiv:2406.11451 [pdf, other]

MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

Authors: Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang

Abstract: When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant model hallucination issues. This severely impairs the model's generative accuracy, making it challenging for LVLMs to be implemented in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effect… ▽ More When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant model hallucination issues. This severely impairs the model's generative accuracy, making it challenging for LVLMs to be implemented in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effective way to address model hallucination. Moreover, the limited availability of training data in the medical field and privacy concerns greatly hinder the model's accuracy and generalization capabilities. In this paper, we introduce a method that mimics human cognitive processes to construct fine-grained instruction pairs and apply the concept of chain-of-thought (CoT) from inference scenarios to training scenarios, thereby proposing a method called MedThink. Our experiments on various LVLMs demonstrate that our novel data construction method tailored for the medical domain significantly improves the model's performance in medical image report generation tasks and substantially mitigates the hallucinations. All resources of this work will be released soon. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11389 [pdf, other]

SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s… ▽ More Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods such as a GNNExplainer. However, post-hoc explanations can not facilitate the model predictions and the computational cost of these methods cannot meet practical requirements, thus limiting their application in real-world scenarios. To address these issues, we propose SEFraud, a novel graph-based self-explainable fraud detection framework that simultaneously tackles fraud detection and result in interpretability. Concretely, SEFraud first leverages customized heterogeneous graph transformer networks with learnable feature masks and edge masks to learn expressive representations from the informative heterogeneously typed transactions. A new triplet loss is further designed to enhance the performance of mask learning. Empirical results on various datasets demonstrate the effectiveness of SEFraud as it shows considerable advantages in both the fraud detection performance and interpretability of prediction results. Moreover, SEFraud has been deployed and offers explainable fraud detection service for the largest bank in China, Industrial and Commercial Bank of China Limited (ICBC). Results collected from the production environment of ICBC show that SEFraud can provide accurate detection results and comprehensive explanations that align with the expert business understanding, confirming its efficiency and applicability in large-scale online services. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024

arXiv:2406.11288 [pdf, other]

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

Authors: Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, **g Ma

Abstract: Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning. These models embed multimodal facts within their parameters, rather than relying on external knowledge bases to store factual information explicitly. However, the content discerned by LVLMs may deviate from actual facts due to inherent bias or incorre… ▽ More Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning. These models embed multimodal facts within their parameters, rather than relying on external knowledge bases to store factual information explicitly. However, the content discerned by LVLMs may deviate from actual facts due to inherent bias or incorrect inference. To address this issue, we introduce MFC-Bench, a rigorous and comprehensive benchmark designed to evaluate the factual accuracy of LVLMs across three tasks: Manipulation, Out-of-Context, and Veracity Classification. Through our evaluation on MFC-Bench, we benchmarked 12 diverse and representative LVLMs, uncovering that current models still fall short in multimodal fact-checking and demonstrate insensitivity to various forms of manipulated content. We hope that MFC-Bench could raise attention to the trustworthy artificial intelligence potentially assisted by LVLMs in the future. The MFC-Bench and accompanying resources are publicly accessible at https://github.com/wskbest/MFC-Bench, contributing to ongoing research in the multimodal fact-checking field. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 22 pages, 8 figures

arXiv:2406.11281 [pdf, ps, other]

Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces

Authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Abstract: We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assume… ▽ More We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assumed to be independent and identically distributed, with a distribution that is either fully known or can be consistently estimated. However, the occurrence of distributional shifts, typical in engineering settings, necessitates the consideration of the robustness of the policy. This paper introduces a distributionally robust stochastic control paradigm that accommodates possibly adaptive adversarial perturbation to the noise distribution within a prescribed ambiguity set. We examine two adversary models: current-action-aware and current-action-unaware, leading to different dynamic programming equations. Furthermore, we characterize the optimal finite sample minimax rates for achieving uniform learning of the robust value function across continuum states under both adversary types, considering ambiguity sets defined by $f_k$-divergence and Wasserstein distance. Finally, we demonstrate the applicability of our framework across various real-world settings. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10833 [pdf, other]

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Authors: Yu Zhang, Xiusi Chen, Bowen **, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way with which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper,… ▽ More In many scientific fields, large language models (LLMs) have revolutionized the way with which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper, we aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs regarding their architectures and pre-training techniques. To this end, we comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality. Moreover, we investigate how LLMs have been deployed to benefit scientific discovery. Resources related to this survey are available at https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 33 pages (GitHub: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models)

arXiv:2406.10622 [pdf, other]

The Honeycomb Conjecture in normed planes and an alpha-convex variant of a theorem of Dowker

Authors: Zsolt Lángi, Shanshan Wang

Abstract: The Honeycomb Conjecture states that among tilings with unit area cells in the Euclidean plane, the average perimeter of a cell is minimal for a regular hexagonal tiling. This conjecture was proved by L. Fejes Tóth for convex tilings, and by Hales for not necessarily convex tilings. In this paper we investigate the same question for tilings of a given normed plane, and show that among normal, conv… ▽ More The Honeycomb Conjecture states that among tilings with unit area cells in the Euclidean plane, the average perimeter of a cell is minimal for a regular hexagonal tiling. This conjecture was proved by L. Fejes Tóth for convex tilings, and by Hales for not necessarily convex tilings. In this paper we investigate the same question for tilings of a given normed plane, and show that among normal, convex tilings in a normed plane, the average squared perimeter of a cell is minimal for a tiling whose cells are translates of a centrally symmetric hexagon. We also show that the question whether the same statement is true for the average perimeter of a cell is closely related to an $α$-convex variant of a theorem of Dowker on the area of polygons circumscribed about a convex disk. Exploring this connection we find families of norms in which the average perimeter of a cell of a tiling is minimal for a hexagonal tiling, and prove some additonal related results. Finally, we apply our method to give a partial answer to a problem of Steinhaus about the isoperimetric ratios of cells of certain tilings in the Euclidean plane, appeared in an open problem book of Croft, Falconer and Guy. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 20 pages, 3 figures

arXiv:2406.10537 [pdf, other]

Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Authors: **chuan Ma, Rui Ding, Qiang Fu, Jiaru Zhang, Shuai Wang, Shi Han, Dongmei Zhang

Abstract: Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to lar… ▽ More Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to larger ones (e.g., with more than 50 variables). The key insight in this paper is that the causal skeleton, which is the undirected version of the causal graph, has potential for improving accuracy and reducing the search space of the optimization procedure, thereby enhancing the performance of differentiable causal discovery. Therefore, we seek to address a two-fold challenge to harness the potential of the causal skeleton for differentiable causal discovery in the presence of latent confounders: (1) scalable and accurate estimation of skeleton and (2) universal integration of skeleton estimation with differentiable causal discovery. To this end, we propose SPOT (Skeleton Posterior-guided OpTimization), a two-phase framework that harnesses skeleton posterior for differentiable causal discovery in the presence of latent confounders. On the contrary to a ``point-estimation'', SPOT seeks to estimate the posterior distribution of skeletons given the dataset. It first formulates the posterior inference as an instance of amortized inference problem and concretizes it with a supervised causal learning (SCL)-enabled solution to estimate the skeleton posterior. To incorporate the skeleton posterior with differentiable causal discovery, SPOT then features a skeleton posterior-guided stochastic optimization procedure to guide the optimization of MAGs. [abridged due to length limit] △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10531 [pdf, other]

doi 10.1109/TIP.2024.3415963

PIG: Prompt Images Guidance for Night-Time Scene Parsing

Authors: Zhifeng Xie, Rui Qiu, Sen Wang, Xin Tan, Yuan Xie, Lizhuang Ma

Abstract: Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night image pairs to guide adaptation, but this approach hamper… ▽ More Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night image pairs to guide adaptation, but this approach hampers dataset construction and restricts generalization across night scenes in different datasets. Moreover, UDA, focusing on network architecture and training strategies, faces difficulties in handling classes with few domain similarities. In this paper, we leverage Prompt Images Guidance (PIG) to enhance UDA with supplementary night knowledge. We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images. To generate high-quality pseudo-labels, we propose Pseudo-label Fusion via Domain Similarity Guidance (FDSG). Classes with fewer domain similarities are predicted by NFNet, which excels in parsing night features, while classes with more domain similarities are predicted by UDA, which has rich labeled semantics. Additionally, we propose two data augmentation strategies: the Prompt Mixture Strategy (PMS) and the Alternate Mask Strategy (AMS), aimed at mitigating the overfitting of the NFNet to a few prompt images. We conduct extensive experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC. The results indicate that utilizing PIG can enhance the parsing accuracy of UDA. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: This paper is accepted by IEEE TIP. Code: https://github.com/qiurui4shu/PIG

arXiv:2406.10470 [pdf]

Tandem Photovoltaics from 2D Transition Metal Dichalcogenides on Silicon

Authors: Zekun Hu, Sudong Wang, Jason Lynch, Adam Alfieri, Deep Jariwala

Abstract: The demand for high-efficiency photovoltaic systems necessitates innovations that transcend the efficiency limitations of single-junction solar cells. This study investigates a tandem photovoltaic architecture comprising a top-cell with a transition metal dichalcogenide (TMDC) superlattice absorber and a bottom-cell of crystalline silicon (c-Si), focusing on optimizing the light absorption and ele… ▽ More The demand for high-efficiency photovoltaic systems necessitates innovations that transcend the efficiency limitations of single-junction solar cells. This study investigates a tandem photovoltaic architecture comprising a top-cell with a transition metal dichalcogenide (TMDC) superlattice absorber and a bottom-cell of crystalline silicon (c-Si), focusing on optimizing the light absorption and electrical performance of the combined structure. Through the transfer matrix method and electrical simulations, we optimized the geometry of the superlattice, determining that a siz-layer MoSe2 configuration with a 40 nm SiO2 antireflective layer maximizes photon absorption while mitigating additional weight and preserving the cell's structural integrity. The results show that the optimized TMDC superlattice significantly improves the PCE of the tandem design to 28.96%, and increase of 5.68% over the original single-junction c-Si solar cell's efficiency. This advancement illustrates the potential of TMDC material in next-generation solar cells and presents a promising avenue for the development of highly efficient, tandem photovoltaic systems via van der Waals integration of the top cell on c-Si △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10443 [pdf, ps, other]

Attosecond pulse synthesis from high-order harmonic generation in intense squeezed light

Authors: ShiJun Wang, XuanYang Lai, XiaoJun Liu

Abstract: High-order harmonic generation (HHG) provides a broad spectral bandwidth for synthesizing attosecond pulses. However, in the current HHG schemes, only part of the harmonics can be phase-locked, which limits the ability to achieve shorter attosecond pulses. Here, we study attosecond pulse synthesis from HHG of an atom driven by an intense quantum light, i.e., squeezed light. It is interestingly fou… ▽ More High-order harmonic generation (HHG) provides a broad spectral bandwidth for synthesizing attosecond pulses. However, in the current HHG schemes, only part of the harmonics can be phase-locked, which limits the ability to achieve shorter attosecond pulses. Here, we study attosecond pulse synthesis from HHG of an atom driven by an intense quantum light, i.e., squeezed light. It is interestingly found that the harmonics in the whole spectrum can be phase-locked and, by using these harmonics, the width of the synthesized attosecond pulse is greatly reduced. By develo** strong-field approximation theory in squeezed light, the physics of the phase-locked harmonic generation throughout the HHG spectrum is revealed and is found to be independent of the target system. Furthermore, we uncover the dependence of the synthesized attosecond pulse width on the squeezing parameter of the squeezed light. Our findings provide a robust tool for obtaining phase-locked harmonics throughout the HHG spectrum for synthesizing short attosecond pulses. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2406.10425 [pdf, other]

doi 10.1145/3637528.3671829

Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling

Authors: Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, Suhang Wang

Abstract: In this paper, we tackle a new problem of \textit{multi-source unsupervised domain adaptation (MSUDA) for graphs}, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph s… ▽ More In this paper, we tackle a new problem of \textit{multi-source unsupervised domain adaptation (MSUDA) for graphs}, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph ({\method}), with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to selected source data both at the embedding space by minimizing the optimal transport distance and at the classification level by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method. △ Less

Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25--29, 2024, Barcelona, Spain

Showing 51–100 of 8,583 results for author: Wang, S