Search | arXiv e-print repository

Acceleration without Disruption: DFT Software as a Service

Authors: Fusong Ju, Xinran Wei, Lin Huang, Andrew J. Jenkins, Leo Xia, Jia Zhang, Jianwei Zhu, Han Yang, Bin Shao, Peggy Dai, Ashwin Mayya, Zahra Hooshmand, Alexandra Efimovskaya, Nathan A. Baker, Matthias Troyer, Hongbin Liu

Abstract: Density functional theory (DFT) has been a cornerstone in computational chemistry, physics, and materials science for decades, benefiting from advancements in computational power and theoretical methods. This paper introduces a novel, cloud-native application, Accelerated DFT, which offers an order of magnitude acceleration in DFT simulations. By integrating state-of-the-art cloud infrastructure a… ▽ More Density functional theory (DFT) has been a cornerstone in computational chemistry, physics, and materials science for decades, benefiting from advancements in computational power and theoretical methods. This paper introduces a novel, cloud-native application, Accelerated DFT, which offers an order of magnitude acceleration in DFT simulations. By integrating state-of-the-art cloud infrastructure and redesigning algorithms for graphic processing units (GPUs), Accelerated DFT achieves high-speed calculations without sacrificing accuracy. It provides an accessible and scalable solution for the increasing demands of DFT calculations in scientific communities. The implementation details, examples, and benchmark results illustrate how Accelerated DFT can significantly expedite scientific discovery across various domains. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11178 [pdf, other]

Increasing certainty in systems biology models using Bayesian multimodel inference

Authors: Nathaniel Linden-Santangeli, ** Zhang, Boris Kramer, Padmini Rangamani

Abstract: Mathematical models are indispensable to the system biology toolkit for studying the structure and behavior of intracellular signaling networks. A common approach to modeling is to develop a system of equations that encode the known biology using approximations and simplifying assumptions. As a result, the same signaling pathway can be represented by multiple models, each with its set of underlyin… ▽ More Mathematical models are indispensable to the system biology toolkit for studying the structure and behavior of intracellular signaling networks. A common approach to modeling is to develop a system of equations that encode the known biology using approximations and simplifying assumptions. As a result, the same signaling pathway can be represented by multiple models, each with its set of underlying assumptions, which opens up challenges for model selection and decreases certainty in model predictions. Here, we use Bayesian multimodel inference to develop a framework to increase certainty in systems biology models. Using models of the extracellular regulated kinase (ERK) pathway, we first show that multimodel inference increases predictive certainty and yields predictors that are robust to changes in the set of available models. We then show that predictions made with multimodel inference are robust to data uncertainties introduced by decreasing the measurement duration and reducing the sample size. Finally, we use multimodel inference to identify a new model to explain experimentally measured sub-cellular location-specific ERK activity dynamics. In summary, our framework highlights multimodel inference as a disciplined approach to increasing the certainty of intracellular signaling activity predictions. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 25 pages; 5 figures

arXiv:2406.11177 [pdf, other]

TIFG: Text-Informed Feature Generation with Large Language Models

Authors: Xinhao Zhang, **ghan Zhang, Fengran Mo, Yuzhong Chen, Kunpeng Liu

Abstract: Textual information of data is of vital importance for data mining and feature engineering. However, existing methods focus on learning the data structures and overlook the textual information along with the data. Consequently, they waste this valuable resource and miss out on the deeper data relationships embedded within the texts. In this paper, we introduce Text-Informed Feature Generation (TIF… ▽ More Textual information of data is of vital importance for data mining and feature engineering. However, existing methods focus on learning the data structures and overlook the textual information along with the data. Consequently, they waste this valuable resource and miss out on the deeper data relationships embedded within the texts. In this paper, we introduce Text-Informed Feature Generation (TIFG), a novel LLM-based text-informed feature generation framework. TIFG utilizes the textual information to generate features by retrieving possible relevant features within external knowledge with Retrieval Augmented Generation (RAG) technology. In this approach, the TIFG can generate new explainable features to enrich the feature space and further mine feature relationships. We design the TIFG to be an automated framework that continuously optimizes the feature generation process, adapts to new data inputs, and improves downstream task performance over iterations. A broad range of experiments in various downstream tasks showcases that our approach can generate high-quality and meaningful features, and is significantly superior to existing methods. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11121 [pdf, other]

Catalytic evolution of cooperation in a population with behavioural bimodality

Authors: Anhui Sheng, **g Zhang, Guozhong Zheng, Jiqiang Zhang, Weiran Cai, Li Chen

Abstract: The remarkable adaptability of humans in response to complex environments is often demonstrated by the context-dependent adoption of different behavioral modes. However, the existing game-theoretic studies mostly focus on the single-mode assumption, and the impact of this behavioral multimodality on the evolution of cooperation remains largely unknown. Here, we study how cooperation evolves in a p… ▽ More The remarkable adaptability of humans in response to complex environments is often demonstrated by the context-dependent adoption of different behavioral modes. However, the existing game-theoretic studies mostly focus on the single-mode assumption, and the impact of this behavioral multimodality on the evolution of cooperation remains largely unknown. Here, we study how cooperation evolves in a population with two behavioral modes. Specifically, we incorporate Q-learning and Tit-for-Tat (TFT) rules into our toy model, where prisoner's dilemma game is played and we investigate the impact of the mode mixture on the evolution of cooperation. While players in Q-learning mode aim to maximize their accumulated payoffs, players within TFT mode repeat what their neighbors have done to them. In a structured mixing implementation where the updating rule is fixed for each individual, we find that the mode mixture greatly promotes the overall cooperation prevalence. The promotion is even more significant in the probabilistic mixing, where players randomly select one of the two rules at each step. Finally, this promotion is robust when players are allowed to adaptively choose the two modes by real-time comparison. In all three scenarios, players within the Q-learning mode act as catalyzer that turns the TFT players to be more cooperative, and as a result drive the whole population to be highly cooperative. The analysis of Q-tables explains the underlying mechanism of cooperation promotion, which captures the ``psychologic evolution" in the players' mind. Our study indicates that the variety of behavioral modes is non-negligible, and could be crucial to clarify the emergence of cooperation in the real world. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 11 pages, 12 figure. Comments are appreciated

arXiv:2406.11086 [pdf, other]

A Bayesian Drift-Diffusion Model of Schachter-Singer's Two Factor Theory of Emotion

Authors: Lance Ying, Audrey Michal, Jun Zhang

Abstract: Bayesian inference has been used in the past to model visual perception (Kersten, Mamassian, & Yuille, 2004), accounting for the Helmholtz principle of perception as "unconscious inference" that is constrained by bottom-up sensory evidence (likelihood) while subject to top-down expectation, priming, or other contextual influences (prior bias); here "unconsciousness" merely relates to the "directne… ▽ More Bayesian inference has been used in the past to model visual perception (Kersten, Mamassian, & Yuille, 2004), accounting for the Helmholtz principle of perception as "unconscious inference" that is constrained by bottom-up sensory evidence (likelihood) while subject to top-down expectation, priming, or other contextual influences (prior bias); here "unconsciousness" merely relates to the "directness" of perception in the sense of Gibson. Here, we adopt the same Bayesian framework to model emotion process in accordance with Schachter-Singer's Two-Factor theory, which argues that emotion is the outcome of cognitive labeling or attribution of a diffuse pattern of autonomic arousal (Schachter & Singer, 1962). In analogous to visual perception, we conceptualize the emotion process as an instance of Bayesian inference, combining the contextual information with a person's physiological arousal patterns. Drift-diffusion models were constructed to simulate emotional processes, where the decision boundaries correspond to the emotional state experienced by the participants, and boundary-crossing constitutes "labeling" in Schachter-Singer's sense. Our model is tested against experimental data from the Schachter & Singer's study (1962) and the Ross et al. study (1969). Two model scenarios are investigated, in which arousal pattern as one factor is pitted against contextual interaction with an confederate (in Schachter-Singer case) or explicitly instructed mis-attribution (in Ross et al. case) as another factor, map** onto the Bayesian prior (initial position of the drift) and the likelihood function (evidence accumulation or drift rate). We find that the first scenario (arousal as the prior and context as the likelihood) has a better fit with Schachter & Singer (1962) whereas the second scenario (context as the prior and arousal as the likelihood) has a better fit with Ross et al. (1969). △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: published at CogSci 2022

arXiv:2406.10988 [pdf, other]

Quantum coupon collector with mixed-state encoding

Authors: **g-Peng Zhang, Min-Quan He, Dan-Bo Zhang

Abstract: The coupon collector is a prototypical model for evaluating the number of samples for identifying a set. By superposing all elements in the set as a pure quantum state, a quantum version of the coupon collector aims to learn the state, which is shown to reduce the sample complexity. Here we propose a quantum coupon collector by encoding the set into a mixed state, where the information of missing… ▽ More The coupon collector is a prototypical model for evaluating the number of samples for identifying a set. By superposing all elements in the set as a pure quantum state, a quantum version of the coupon collector aims to learn the state, which is shown to reduce the sample complexity. Here we propose a quantum coupon collector by encoding the set into a mixed state, where the information of missing elements are labelled with Pauli strings. Remarkably, the encoded mixed state has no quantum entangled state and is easy to prepare. With such mixed-state encoding, it can be efficient to learn the set by performing Bell measurements on two copies and then extracting the missing element by solving a series of equations obtained from the measurements. Our protocol further reduces the sample complexity from $O(n)$ in the case of pure-state encoding to $O(\log n)$ when the missing element is one, where $n$ is the number of elements in the set. The mixed-state encoding scheme provides a new avenue for quantum learning and enlarges the realm for exploring quantum advantages. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10959 [pdf, ps, other]

On Convergence and Rate of Convergence of Policy Improvement Algorithms

Authors: ** Ma, Gaozhan Wang, Jianfeng Zhang

Abstract: In this paper we provide a simple proof from scratch for the convergence of Policy Improvement Algorithm (PIA) for a continuous time entropy-regularized stochastic control problem. Such convergence has been established by Huang-Wang-Zhou(2023) by using sophisticated PDE estimates for the iterative PDEs involved in the PIA. Our approach builds on some Feynman-Kac type probabilistic representation f… ▽ More In this paper we provide a simple proof from scratch for the convergence of Policy Improvement Algorithm (PIA) for a continuous time entropy-regularized stochastic control problem. Such convergence has been established by Huang-Wang-Zhou(2023) by using sophisticated PDE estimates for the iterative PDEs involved in the PIA. Our approach builds on some Feynman-Kac type probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the infinite horizon model with a large discount factor and in the finite horizon model, we obtain the exponential rate of convergence with similar arguments. Finally, in the one dimensional setting, we extend the convergence result to the diffusion control case. △ Less

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: In this version we added a convergence result for the 1-dimensional model with diffusion controls

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10661 [pdf, other]

A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking

Authors: Jun Zhang, Wenxuan Ao, Junbo Yan, Depeng **, Yong Li

Abstract: With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods relying on expert experience to simulation and learning-based decision optimization methods. Learning-based optimization methods require extensive interaction with highly realistic microscopic traffic simulators for optimization. However, existing microscopic traffic… ▽ More With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods relying on expert experience to simulation and learning-based decision optimization methods. Learning-based optimization methods require extensive interaction with highly realistic microscopic traffic simulators for optimization. However, existing microscopic traffic simulators are computationally inefficient in large-scale scenarios and therefore significantly reduce the efficiency of the data sampling process of optimization algorithms. In addition, the optimization scenarios supported by existing simulators are limited, mainly focusing on the traffic signal control. To address these challenges and limitations, we propose the first open-source GPU-accelerated large-scale microscopic simulator for transportation system simulation. The simulator is able to iterate at 84.09Hz, which achieves 88.92 times computational acceleration in the large-scale scenario with more than a million vehicles compared to the best baseline. Based on the simulator, we implement a set of microscopic and macroscopic controllable objects and metrics to support most typical transportation system optimization scenarios. These controllable objects and metrics are all provided by Python API for ease of use. We choose five important and representative transportation system optimization scenarios and benchmark classical rule-based algorithms, reinforcement learning, and black-box optimization in four cities. The codes are available at \url{https://github.com/tsinghua-fib-lab/moss-benchmark} with the MIT License. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS 2024 Datasets and Benchmarks Track

arXiv:2406.10563 [pdf, other]

Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data

Authors: Yukai Xu, **gfeng Zhang, Yujie Gu

Abstract: In the realm of healthcare where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge concerns the data privacy leakage when centralizing data with sensitive personal information. While the model-level challenge arises from the heterogeneity of local models, which need to be collaboratively traine… ▽ More In the realm of healthcare where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge concerns the data privacy leakage when centralizing data with sensitive personal information. While the model-level challenge arises from the heterogeneity of local models, which need to be collaboratively trained while ensuring their confidentiality to address intellectual property concerns. To tackle these challenges, we propose a new framework termed Abstention-Aware Federated Voting (AAFV) that can collaboratively and confidentially train heterogeneous local models while simultaneously protecting the data privacy. This is achieved by integrating a novel abstention-aware voting mechanism and a differential privacy mechanism onto local models' predictions. In particular, the proposed abstention-aware voting mechanism exploits a threshold-based abstention method to select high-confidence votes from heterogeneous local models, which not only enhances the learning utility but also protects model confidentiality. Furthermore, we implement AAFV on two practical prediction tasks of diabetes and in-hospital patient mortality. The experiments demonstrate the effectiveness and confidentiality of AAFV in testing accuracy and privacy protection. △ Less

Submitted 4 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted to the 2024 IEEE Conference on Artificial Intelligence (IEEE CAI 2024)

arXiv:2406.10539 [pdf, other]

Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On

Authors: Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang

Abstract: Virtual clothes try-on has emerged as a vital feature in online shop**, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, gene… ▽ More Virtual clothes try-on has emerged as a vital feature in online shop**, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts. Techniques such as conditional guidance and focus on key regions have been integrated into our approach. These combined strategies empower the diffusion model to reproduce clothing details with increased clarity and realism. The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences, significantly surpassing the capabilities of existing technologies. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10537 [pdf, other]

Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Authors: **chuan Ma, Rui Ding, Qiang Fu, Jiaru Zhang, Shuai Wang, Shi Han, Dongmei Zhang

Abstract: Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to lar… ▽ More Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to larger ones (e.g., with more than 50 variables). The key insight in this paper is that the causal skeleton, which is the undirected version of the causal graph, has potential for improving accuracy and reducing the search space of the optimization procedure, thereby enhancing the performance of differentiable causal discovery. Therefore, we seek to address a two-fold challenge to harness the potential of the causal skeleton for differentiable causal discovery in the presence of latent confounders: (1) scalable and accurate estimation of skeleton and (2) universal integration of skeleton estimation with differentiable causal discovery. To this end, we propose SPOT (Skeleton Posterior-guided OpTimization), a two-phase framework that harnesses skeleton posterior for differentiable causal discovery in the presence of latent confounders. On the contrary to a ``point-estimation'', SPOT seeks to estimate the posterior distribution of skeletons given the dataset. It first formulates the posterior inference as an instance of amortized inference problem and concretizes it with a supervised causal learning (SCL)-enabled solution to estimate the skeleton posterior. To incorporate the skeleton posterior with differentiable causal discovery, SPOT then features a skeleton posterior-guided stochastic optimization procedure to guide the optimization of MAGs. [abridged due to length limit] △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10522 [pdf, other]

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Authors: Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak

Abstract: We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning… ▽ More We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10508 [pdf, other]

Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis

Authors: Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren

Abstract: Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model fo… ▽ More Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model for specific tasks. In this work, we introduce a simplified approach called Adapt foundation models with a low-rank adaptation (LoRA) technique for easier customization. Our method, inspired by the DINOv2 foundation model, applies low-rank adaptation learning to tailor foundation models for capsule endoscopy diagnosis effectively. Unlike traditional fine-tuning methods, our strategy includes LoRA layers designed to absorb specific surgical domain knowledge. During the training process, we keep the main model (the backbone encoder) fixed and focus on optimizing the LoRA layers and the disease classification component. We tested our method on two publicly available datasets for capsule endoscopy disease classification. The results were impressive, with our model achieving 97.75% accuracy on the Kvasir-Capsule dataset and 98.81% on the Kvasirv2 dataset. Our solution demonstrates that foundation models can be adeptly adapted for capsule endoscopy diagnosis, highlighting that mere reliance on straightforward fine-tuning or pre-trained models from general computer vision tasks is inadequate for such specific applications. △ Less

Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: To appear in ICBIR 2024

arXiv:2406.10502 [pdf, other]

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Authors: Jiahan Zhang, Qi Wei, Feng Liu, Lei Feng

Abstract: Fine-tuning vision-language models (VLMs) with abundant unlabeled data recently has attracted increasing attention. Existing methods that resort to the pseudolabeling strategy would suffer from heavily incorrect hard pseudolabels when VLMs exhibit low zero-shot performance in downstream tasks. To alleviate this issue, we propose a Candidate Pseudolabel Learning method, termed CPL, to fine-tune VLM… ▽ More Fine-tuning vision-language models (VLMs) with abundant unlabeled data recently has attracted increasing attention. Existing methods that resort to the pseudolabeling strategy would suffer from heavily incorrect hard pseudolabels when VLMs exhibit low zero-shot performance in downstream tasks. To alleviate this issue, we propose a Candidate Pseudolabel Learning method, termed CPL, to fine-tune VLMs with suitable candidate pseudolabels of unlabeled data in downstream tasks. The core of our method lies in the generation strategy of candidate pseudolabels, which progressively generates refined candidate pseudolabels by both intra- and inter-instance label selection, based on a confidence score matrix for all unlabeled data. This strategy can result in better performance in true label inclusion and class-balanced instance selection. In this way, we can directly apply existing loss functions to learn with generated candidate psueudolabels. Extensive experiments on nine benchmark datasets with three learning paradigms demonstrate the effectiveness of our method. Our code can be found at https://github.com/vanillaer/CPL-ICML2024. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted by ICML2024

arXiv:2406.10457 [pdf, other]

Noise-induced quantum synchronization and maximally entangled mixed states in superconducting circuits

Authors: Ziyu Tao, Finn Schmolke, Chang-Kang Hu, Wenhui Huang, Yuxuan Zhou, Jiawei Zhang, Ji Chu, Libo Zhang, Xuandong Sun, Zecheng Guo, **g**g Niu, Wenle Weng, Song Liu, Youpeng Zhong, Dian Tan, Dapeng Yu, Eric Lutz

Abstract: Random fluctuations can lead to cooperative effects in complex systems. We here report the experimental observation of noise-induced quantum synchronization in a chain of superconducting transmon qubits with nearest-neighbor interactions. The application of Gaussian white noise to a single site leads to synchronous oscillations in the entire chain. We show that the two synchronized end qubits are… ▽ More Random fluctuations can lead to cooperative effects in complex systems. We here report the experimental observation of noise-induced quantum synchronization in a chain of superconducting transmon qubits with nearest-neighbor interactions. The application of Gaussian white noise to a single site leads to synchronous oscillations in the entire chain. We show that the two synchronized end qubits are entangled, with nonzero concurrence, and that they belong to a class of generalized Bell states known as maximally entangled mixed states, whose entanglement cannot be increased by any global unitary. We further demonstrate the stability against frequency detuning of both synchronization and entanglement by determining the corresponding generalized Arnold tongue diagrams. Our results highlight the constructive influence of noise in a quantum many-body system and uncover the potential role of synchronization for mixed-state quantum information science. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understand… ▽ More The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.10157 [pdf, other]

RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model

Authors: Hantao Zhou, Tianying Ji, Jianwei Zhang, Fuchun Sun, Huazhe Xu

Abstract: Minigolf, a game with countless court layouts, and complex ball motion, constitutes a compelling real-world testbed for the study of embodied intelligence. As it not only challenges spatial and kinodynamic reasoning but also requires reflective and corrective capacities to address erroneously designed courses. We introduce RoboGolf, a framework that perceives dual-camera visual inputs with nested… ▽ More Minigolf, a game with countless court layouts, and complex ball motion, constitutes a compelling real-world testbed for the study of embodied intelligence. As it not only challenges spatial and kinodynamic reasoning but also requires reflective and corrective capacities to address erroneously designed courses. We introduce RoboGolf, a framework that perceives dual-camera visual inputs with nested VLM-empowered closed-loop control and reflective equilibrium loop. Extensive experiments demonstrate the effectiveness of RoboGolf on challenging minigolf courts including those that are impossible to finish. △ Less

Submitted 23 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: Project page: https://jity16.github.io/RoboGolf/

arXiv:2406.10146 [pdf]

Multimodal Radiomics Model for Predicting Gold Nanoparticles Accumulation in Mouse Tumors

Authors: Jiajia Tang, Jie Zhang, Jiulou Zhang, Yuxia Tang, Hao Ni, Shouju Wang

Abstract: Background: Nanoparticles can accumulate in solid tumors, serving as diagnostic or therapeutic agents for cancer. Clinical translation is challenging due to low accumulation in tumors and heterogeneity between tumor types and individuals. Tools to identify this heterogeneity and predict nanoparticle accumulation are needed. Advanced imaging techniques combined with radiomics and AI may offer a sol… ▽ More Background: Nanoparticles can accumulate in solid tumors, serving as diagnostic or therapeutic agents for cancer. Clinical translation is challenging due to low accumulation in tumors and heterogeneity between tumor types and individuals. Tools to identify this heterogeneity and predict nanoparticle accumulation are needed. Advanced imaging techniques combined with radiomics and AI may offer a solution. Methods: 183 mice were used to create seven subcutaneous tumor models, with three sizes (15nm, 40nm, 70nm) of gold nanoparticles injected via the tail vein. Accumulation was measured using ICP-OES. Data were divided into training and test sets (7:3). Tumors were categorized into high and low uptake groups based on the median value of the training set. Before injection, multimodal imaging data (CT, B-mode ultrasound, SWE, CEUS) were acquired, and radiomics features extracted. LASSO and RFE algorithms built a radiomics signature. This, along with tumor type and mean values from CT and SWE, constructed the best model using SVM. For each tumor in the test set, the radiomics signature predicted gold nanoparticle uptake. Model performance was evaluated by AUC. Results: Significant variability in gold nanoparticle accumulation was observed among tumors (P < 0.001). The median accumulation in the training set was 3.37% ID/g. Nanoparticle size was not a main determinant of uptake (P > 0.05). The composite model based on radiomics signature outperformed the basic model in both training (AUC 0.93 vs. 0.68) and testing (0.78 vs. 0.61) datasets. Conclusion: The composite model identifies tumor heterogeneity and predicts high uptake of gold nanoparticles, improving patient stratification and supporting nanomedicine's clinical application. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09867 [pdf, other]

Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox

Authors: Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

Abstract: Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data. However, some marginal OOD samples actually have close semantic contents to the in-distribution (ID) sample, which makes determining the OOD sample a Sorites Paradox. In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide th… ▽ More Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data. However, some marginal OOD samples actually have close semantic contents to the in-distribution (ID) sample, which makes determining the OOD sample a Sorites Paradox. In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide the test samples into subsets with different semantic and covariate shift degrees relative to the ID dataset. The data division is achieved through a shift measuring method based on our proposed Language Aligned Image feature Decomposition (LAID). Moreover, we construct a Synthetic Incremental Shift (Syn-IS) dataset that contains high-quality generated images with more diverse covariate contents to complement the IS-OOD benchmark. We evaluate current OOD detection methods on our benchmark and find several important insights: (1) The performance of most OOD detection methods significantly improves as the semantic shift increases; (2) Some methods like GradNorm may have different OOD detection mechanisms as they rely less on semantic shifts to make decisions; (3) Excessive covariate shifts in the image are also likely to be considered as OOD for some methods. Our code and data are released in https://github.com/qqwsad5/IS-OOD. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: v1

arXiv:2406.09843 [pdf, other]

An Exploratory Study on Using Large Language Models for Mutation Testing

Authors: Bo Wang, Mingda Chen, Youfang Lin, Mike Papadakis, Jie M. Zhang

Abstract: The question of how to generate high-utility mutations, to be used for testing purposes, forms a key challenge in mutation testing literature. %Existing approaches rely either on human-specified syntactic rules or learning-based approaches, all of which produce large numbers of redundant mutants. Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mut… ▽ More The question of how to generate high-utility mutations, to be used for testing purposes, forms a key challenge in mutation testing literature. %Existing approaches rely either on human-specified syntactic rules or learning-based approaches, all of which produce large numbers of redundant mutants. Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mutation testing remains unexplored. To this end, we systematically investigate the performance of LLMs in generating effective mutations w.r.t. to their usability, fault detection potential, and relationship with real bugs. In particular, we perform a large-scale empirical study involving 4 LLMs, including both open- and closed-source models, and 440 real bugs on two Java benchmarks. We find that compared to existing approaches, LLMs generate more diverse mutations that are behaviorally closer to real bugs, which leads to approximately 18% higher fault detection than current approaches (i.e., 87% vs. 69%) in a newly collected set of bugs, purposely selected for evaluating learning-based approaches, i.e., mitigating potential data leakage concerns. Additionally, we explore alternative prompt engineering strategies and the root causes of uncompilable mutations, produced by the LLMs, and provide valuable insights for the use of LLMs in the context of mutation testing. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 13 pages, 3 figures

ACM Class: D.2.5

arXiv:2406.09834 [pdf, other]

How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs,… ▽ More Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs, the specific problem of deprecated API usage in LLM-based code completion has not been thoroughly investigated. To address this gap, we conducted the first evaluation study on deprecated API usage in LLM-based code completion. This study involved seven advanced LLMs, 145 API map**s from eight popular Python libraries, and 28,125 completion prompts. The study results reveal the \textit{status quo} and \textit{root causes} of deprecated API usage in LLM-based code completion from the perspectives of \textit{model}, \textit{prompt}, and \textit{library}. Based on these findings, we propose two lightweight fixing approaches, \textsc{ReplaceAPI} and \textsc{InsertPrompt}, which can serve as baseline approaches for future research on mitigating deprecated API usage in LLM-based completion. Additionally, we provide implications for future research on integrating library evolution with LLM-driven software development. △ Less

Submitted 3 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09794 [pdf, other]

SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

Authors: Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

Abstract: SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add… ▽ More SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To address this issue, we propose SuperSVG, a superpixel-based vectorization model that achieves fast and high-precision image vectorization. Specifically, we decompose the input image into superpixels to help the model focus on areas with similar colors and textures. Then, we propose a two-stage self-training framework, where a coarse-stage model is employed to reconstruct the main structure and a refinement-stage model is used for enriching the details. Moreover, we propose a novel dynamic path war** loss to help the refinement-stage model to inherit knowledge from the coarse-stage model. Extensive qualitative and quantitative experiments demonstrate the superior performance of our method in terms of reconstruction accuracy and inference time compared to state-of-the-art approaches. The code is available in \url{https://github.com/sjtuplayer/SuperSVG}. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: CVPR 2024

arXiv:2406.09773 [pdf]

Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

Authors: Haowei Yang, Liyang Wang, **gyu Zhang, Yu Cheng, Ao Xiang

Abstract: With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain map**, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this… ▽ More With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain map**, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this study proposes an edge detection method for LiDAR images based on artificial intelligence technology. This paper first reviews the current state of research on LiDAR technology and image edge detection, introducing common edge detection algorithms and their applications in LiDAR image processing. Subsequently, a deep learning-based edge detection model is designed and implemented, optimizing the model training process through preprocessing and enhancement of the LiDAR image dataset. Experimental results indicate that the proposed method outperforms traditional methods in terms of detection accuracy and computational efficiency, showing significant practical application value. Finally, improvement strategies are proposed for the current method's shortcomings, and the improvements are validated through experiments. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09765 [pdf]

Application of Natural Language Processing in Financial Risk Detection

Authors: Liyang Wang, Yu Cheng, Ao Xiang, **gyu Zhang, Haowei Yang

Abstract: This paper explores the application of Natural Language Processing (NLP) in financial risk detection. By constructing an NLP-based financial risk detection model, this study aims to identify and predict potential risks in financial documents and communications. First, the fundamental concepts of NLP and its theoretical foundation, including text mining methods, NLP model design principles, and mac… ▽ More This paper explores the application of Natural Language Processing (NLP) in financial risk detection. By constructing an NLP-based financial risk detection model, this study aims to identify and predict potential risks in financial documents and communications. First, the fundamental concepts of NLP and its theoretical foundation, including text mining methods, NLP model design principles, and machine learning algorithms, are introduced. Second, the process of text data preprocessing and feature extraction is described. Finally, the effectiveness and predictive performance of the model are validated through empirical research. The results show that the NLP-based financial risk detection model performs excellently in risk identification and prediction, providing effective risk management tools for financial institutions. This study offers valuable references for the field of financial risk management, utilizing advanced NLP techniques to improve the accuracy and efficiency of financial risk detection. △ Less

Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09710 [pdf, other]

Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

Authors: Shilu Yuan, Dongfeng Li, Wei Liu, Xinxin Zhang, Meng Chen, Junjie Zhang, Yongshun Gong

Abstract: Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most of existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between differe… ▽ More Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most of existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between different-scale regions within the city. Different-scale geographical features can capture redundant information from the same spatial areas. In order to effectively learn multi-scale information across time and space, we propose an effective fine-grained urban flow inference model called UrbanMSR, which uses self-supervised contrastive learning to obtain dynamic multi-scale representations of neighborhood-level and city-level geographic information, and fuses multi-scale representations to improve fine-grained accuracy. The fusion of multi-scale representations enhances fine-grained. We validate the performance through extensive experiments on three real-world datasets. The resutls compared with state-of-the-art methods demonstrate the superiority of the proposed model. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09683 [pdf, other]

Interstellar Nitrogen Isotope Ratios: Measurements on tracers of C$^{14}$N and C$^{15}$N

Authors: J. L. Chen, J. S. Zhang, C. Henkel, Y. T. Yan, H. Z. Yu, Y. X. Wang, Y. P. Zou, J. Y. Zhao, X. Y. Wang

Abstract: The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and constraining Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios a… ▽ More The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and constraining Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios also including 12C/13C, which introduces additional uncertainties. Here we therefore present observations of C14N and its rare isotopologue, C15N, toward a sample of star forming regions, measured by the IRAM 30 m and/or the ARO 12 m telescope at $λ$ ~3 mm wavelength. For those 35 sources detected in both isotopologues, physical parameters are determined. Furthermore we have obtained nitrogen isotope ratios using the strongest hyperfine components of CN and C15N. For those sources showing small deviations from Local Thermodynamical Equilibrium and/or self-absorption, the weakest hyperfine component, likely free of the latter effect, was used to obtain reliable 14N/15N values. Our measured 14N/15N isotope ratios from C14N and C15N measurements are compatible with those from our earlier measurements of NH3 and 15NH3 (Paper I), i.e., increasing ratios to a Galacticentric distance of ~9 kpc. The unweighted second order polynomial fit yields $\frac{{\rm C^{14}N}}{{\rm C^{15}N}} = (-4.85 \pm 1.89)\;{\rm kpc^{-2}} \times R_{\rm GC}^{2} + (82.11 \pm 31.93) \;{\rm kpc^{-1}} \times R_{\rm GC} - (28.12 \pm 126.62)$. Toward the outer galaxy, the isotope ratio tends to decrease, supporting an earlier finding by H13CN/HC15N. Galactic chemical evolution models are consistent with our measurements of the 14N/15N isotope ratio, i.e. a rising trend from the Galactic center region to approximately 9 kpc, followed by a decreasing trend with increasing $R_{\rm GC}$ toward the outer Galaxy. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 34 pages, 9 figures, 6 tables

Journal ref: The Astrophysical Journal (2004)

arXiv:2406.09534 [pdf, other]

FeatNavigator: Automatic Feature Augmentation on Tabular Data

Authors: Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and gene… ▽ More Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality features in relational tables for ML models. FeatNavigator evaluates a feature from two aspects: (1) the intrinsic value of a feature towards an ML task (i.e., feature importance) and (2) the efficacy of a join path connecting the feature to the base table (i.e., integration quality). FeatNavigator strategically selects a small set of available features and their corresponding join paths to train a feature importance estimation model and an integration quality prediction model. Furthermore, FeatNavigator's search algorithm exploits both estimated feature importance and integration quality to identify the optimized feature augmentation plan. Our experimental results show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 15 pages, 41 figures

arXiv:2406.09485 [pdf, other]

Integrated Modeling, Verification, and Code Generation for Unmanned Aerial Systems

Authors: Jianyu Zhang, Long Zhang, Yixuan Wu, Linru Ma, Feng Yang

Abstract: Unmanned Aerial Systems (UAS) are currently widely used in safety-critical fields such as industrial production, military operations, and disaster relief. Due to the diversity and complexity of application scenarios, UAS have become increasingly intricate. The challenge of designing and implementing highly reliable UAS while effectively controlling development costs and enhancing efficiency is a p… ▽ More Unmanned Aerial Systems (UAS) are currently widely used in safety-critical fields such as industrial production, military operations, and disaster relief. Due to the diversity and complexity of application scenarios, UAS have become increasingly intricate. The challenge of designing and implementing highly reliable UAS while effectively controlling development costs and enhancing efficiency is a pressing issue faced by both academia and industry. Addressing this challenge, this paper aims to investigate an integrated approach to modeling, verification, and code generation for UAS. The paper begins by utilizing Architecture Analysis and Design Language (AADL) to model the UAS, proposing a set of generic UAS models. Based on these models, formal specifications are written to describe the system's safety properties and functions. Finally, the paper introduces a method for generating flight controller code for UAS based on the verified models. Experiments conducted with the proposed method demonstrate its effectiveness in identifying potential vulnerabilities in the UAS during the early design phase and in generating viable flight controller code from the verified models. This approach can enhance the efficiency of designing and verifying high-reliability UAS. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the… ▽ More Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09313 [pdf, other]

doi 10.1145/3660803

Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

Authors: Shuqing Li, Cuiyun Gao, Jian** Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of… ▽ More The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of the user's brain, leading to user discomfort and even adverse health effects. Such issues commonly exist but remain underexplored. We conduct an empirical analysis on 282 SVI bug reports from 15 VR platforms, summarizing 15 types of manifestations. The empirical analysis reveals that automatically detecting SVI issues is challenging, mainly because: (1) lack of training data; (2) the manifestations of SVI issues are diverse, complicated, and often application-specific; (3) most accessible VR apps are closed-source commercial software. Existing pattern-based supervised classification approaches may be inapplicable or ineffective in detecting the SVI issues. To counter these challenges, we propose an unsupervised black-box testing framework named StereoID to identify the stereoscopic visual inconsistencies, based only on the rendered GUI states. StereoID generates a synthetic right-eye image based on the actual left-eye image and computes distances between the synthetic right-eye image and the actual right-eye image to detect SVI issues. We propose a depth-aware conditional stereo image translator to power the image generation process, which captures the expected perspective shifts between left-eye and right-eye images. We build a large-scale unlabeled VR stereo screenshot dataset with larger than 171K images from 288 real-world VR apps for experiments. After substantial experiments, StereoID demonstrates superior performance for detecting SVI issues in both user reports and wild VR apps. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

arXiv:2406.09304 [pdf]

Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute… ▽ More Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has been currently omitted, but it is highly desired for artificial nociceptors. Inspired by these shortcomings, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor, capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.09288 [pdf, other]

Zero-Shot Learning Over Large Output Spaces : Utilizing Indirect Knowledge Extraction from Large Language Models

Authors: **bin Zhang, Nasib Ullah, Rohit Babbar

Abstract: Extreme Multi-label Learning (XMC) is a task that allocates the most relevant labels for an instance from a predefined label set. Extreme Zero-shot XMC (EZ-XMC) is a special setting of XMC wherein no supervision is provided; only the instances (raw text of the document) and the predetermined label set are given. The scenario is designed to address cold-start problems in categorization and recommen… ▽ More Extreme Multi-label Learning (XMC) is a task that allocates the most relevant labels for an instance from a predefined label set. Extreme Zero-shot XMC (EZ-XMC) is a special setting of XMC wherein no supervision is provided; only the instances (raw text of the document) and the predetermined label set are given. The scenario is designed to address cold-start problems in categorization and recommendation. Traditional state-of-the-art methods extract pseudo labels from the document title or segments. These labels from the document are used to train a zero-shot bi-encoder model. The main issue with these generated labels is their misalignment with the tagging task. In this work, we propose a framework to train a small bi-encoder model via the feedback from the large language model (LLM), the bi-encoder model encodes the document and labels into embeddings for retrieval. Our approach leverages the zero-shot ability of LLM to assess the correlation between labels and the document instead of using the low-quality labels extracted from the document itself. Our method also guarantees fast inference without the involvement of LLM. The performance of our approach outperforms the SOTA methods on various datasets while retaining a similar training time for large datasets. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09270 [pdf, other]

Discovery and Extensive Follow-Up of SN 2024ggi, a nearby type IIP supernova in NGC 3621

Authors: Ting-Wan Chen, Sheng Yang, Shubham Srivastav, Takashi J. Moriya, Stephen J. Smartt, Sofia Rest, Armin Rest, Hsing Wen Lin, Hao-Yu Miao, Yu-Chi Cheng, Amar Aryan, Chia-Yu Cheng, Morgan Fraser, Li-Ching Huang, Meng-Han Lee, Cheng-Han Lai, Yu Hsuan Liu, Aiswarya Sankar. K, Ken W. Smith, Heloise F. Stevance, Ze-Ning Wang, Joseph P. Anderson, Charlotte R. Angus, Thomas de Boer, Kenneth Chambers , et al. (23 additional authors not shown)

Abstract: We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o… ▽ More We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o- and r-band light curves show a rapid rise of 3.3 magnitudes in 13.7 hours, much faster than SN 2023ixf (another recent, nearby, and well-observed SN II). Between 13.8 and 18.8 hours after explosion SN 2024ggi became bluer, with u-g colour drop** from 0.53 to 0.15 mag. The rapid blueward evolution indicates a wind shock breakout (SBO) scenario. No hour-long brightening expected for the SBO from a bare stellar surface was detected during our observations. The classification spectrum, taken 17 hours after the SN explosion, shows flash features of high-ionization species such as Balmer lines, He I, C III, and N III. Detailed light curve modeling reveals critical insights into the properties of the circumstellar material (CSM). Our favoured model has an explosion energy of 2 x 10^51 erg, a mass-loss rate of 10^-3 solar_mass/yr (with an assumed 10 km/s wind), and a confined CSM radius of 6 x 10^14 cm. The corresponding CSM mass is 0.4 solar_mass. Comparisons with SN 2023ixf highlight that SN 2024ggi has a smaller CSM density, resulting in a faster rise and fainter UV flux. The extensive dataset and the involvement of citizen astronomers underscore that a collaborative network is essential for SBO searches, leading to more precise and comprehensive SN characterizations. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures in manuscript, 6 pages in appendix, submitted to ApJL

arXiv:2406.09190 [pdf, other]

Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation

Authors: Zhiqiang Xiao, Xianda Liu, Yong Zeng, J. Andrew Zhang, Shi **, Rui Zhang

Abstract: Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design… ▽ More Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of highfrequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09187 [pdf, other]

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li

Abstract: The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the f… ▽ More The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the first LLM agent as a guardrail to other LLM agents. Specifically, GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of given guard requests defined by the users. GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines. In both steps, an LLM is utilized as the core reasoning component, supplemented by in-context demonstrations retrieved from a memory module. Such knowledge-enabled reasoning allows GuardAgent to understand various textual guard requests and accurately "translate" them into executable code that provides reliable guardrails. Furthermore, GuardAgent is equipped with an extendable toolbox containing functions and APIs and requires no additional LLM training, which underscores its generalization capabilities and low operational overhead. Additionally, we propose two novel benchmarks: an EICU-AC benchmark for assessing privacy-related access control for healthcare agents and a Mind2Web-SC benchmark for safety evaluation for web agents. We show the effectiveness of GuardAgent on these two benchmarks with 98.7% and 90.0% accuracy in moderating invalid inputs and outputs for the two types of agents, respectively. We also show that GuardAgent is able to define novel functions in adaption to emergent LLM agents and guard requests, which underscores its strong generalization capabilities. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09095 [pdf, other]

Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

Authors: Yuhao Dan, Junfeng Tian, Jie Zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

Abstract: Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is… ▽ More Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is quite common in our daily life. In this paper, we introduce a new D2T task named comparative logical relation generation (CLRG). Additionally, we propose a Comparative Logic (CoLo) based text generation method, which generates texts following specific comparative logical relations with contrastive learning. Specifically, we first construct various positive and negative samples by fine-grained perturbations in entities, aspects and opinions. Then, we perform contrastive learning in the encoder layer to have a better understanding of the comparative logical relations, and integrate it in the decoder layer to guide the model to correctly generate the relations. Noting the data scarcity problem, we construct a Chinese Comparative Logical Relation Dataset (CLRD), which is a high-quality human-annotated dataset and challenging for text generation with descriptions of multiple entities and annotations on their comparative logical relations. Extensive experiments show that our method achieves impressive performance in both automatic and human evaluations. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09072 [pdf, other]

Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

Authors: Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min zhang

Abstract: Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world. Current temporal reasoning datasets are limited to questions about single or isolated events, falling short in mirroring the realistic temporal characteristics involving concurrent nature and intricate temporal interconnections. In this paper, we introduce CoTempQA, a comprehensive co-temporal Question Answ… ▽ More Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world. Current temporal reasoning datasets are limited to questions about single or isolated events, falling short in mirroring the realistic temporal characteristics involving concurrent nature and intricate temporal interconnections. In this paper, we introduce CoTempQA, a comprehensive co-temporal Question Answering (QA) benchmark containing four co-temporal scenarios (Equal, Overlap, During, Mix) with 4,748 samples for evaluating the co-temporal comprehension and reasoning abilities of LLMs. Our extensive experiments reveal a significant gap between the performance of current LLMs and human-level reasoning on CoTempQA tasks. Even when enhanced with Chain of Thought (CoT) methodologies, models consistently struggle with our task. In our preliminary exploration, we discovered that mathematical reasoning plays a significant role in handling co-temporal events and proposed a strategy to boost LLMs' co-temporal reasoning from a mathematical perspective. We hope that our CoTempQA datasets will encourage further advancements in improving the co-temporal reasoning capabilities of LLMs. Our code is available at https://github.com/zhaochen0110/Cotempqa. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: This paper has been accepted to the ACL 2024 main conference

arXiv:2406.09016 [pdf, other]

Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

Authors: Gaochang Wu, Yapeng Zhang, Lan Deng, **gxin Zhang, Tianyou Chai

Abstract: Fused Magnesium Furnace (FMF) is a crucial industrial equipment in the production of magnesia, and anomaly detection plays a pivotal role in ensuring its efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while… ▽ More Fused Magnesium Furnace (FMF) is a crucial industrial equipment in the production of magnesia, and anomaly detection plays a pivotal role in ensuring its efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while overlooking the intrinsic correlation of cross-modal information. This paper proposes a cross-modal Transformer (dubbed FmFormer), designed to facilitate anomaly detection in fused magnesium smelting processes by exploring the correlation between visual features (video) and process variables (current). Our approach introduces a novel tokenization paradigm to effectively bridge the substantial dimensionality gap between the 3D video modality and the 1D current modality in a multiscale manner, enabling a hierarchical reconstruction of pixel-level anomaly detection. Subsequently, the FmFormer leverages self-attention to learn internal features within each modality and bidirectional cross-attention to capture correlations across modalities. To validate the effectiveness of the proposed method, we also present a pioneering cross-modal benchmark of the fused magnesium smelting process, featuring synchronously acquired video and current data for over 2.2 million samples. Leveraging cross-modal learning, the proposed FmFormer achieves state-of-the-art performance in detecting anomalies, particularly under extreme interferences such as current fluctuations and visual occlusion caused by heavy water mist. The presented methodology and benchmark may be applicable to other industrial applications with some amendments. The benchmark will be released at https://github.com/GaochangWu/FMF-Benchmark. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 figures, 5 tables. Submitted to IEEE

arXiv:2406.08757 [pdf, other]

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Authors: Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

Abstract: Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents,… ▽ More Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese, making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: NeurIPS 2024 Track on Datasets and Benchmarks under review

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes… ▽ More In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08673 [pdf, ps, other]

HelpSteer2: Open-source dataset for training top-performing reward models

Authors: Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

Abstract: High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer need to be updated to remain effective for reward modeling. Methods… ▽ More High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer need to be updated to remain effective for reward modeling. Methods that distil preference data from proprietary LLMs such as GPT-4 have restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful internal base model trained on HelpSteer2, we are able to achieve the SOTA score (92.0%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models, as of June 12th, 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. In particular, we propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute score predicted by our reward models. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08646 [pdf, other]

PETSc/TAO Developments for Early Exascale Systems

Authors: Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang

Abstract: The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascal… ▽ More The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, submitted to IJHPCA

MSC Class: 00A69

arXiv:2406.08465 [pdf, other]

Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data

Authors: Jiaojiao Zhang, Jiang Hu, Anthony Man-Cho So, Mikael Johansson

Abstract: Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated l… ▽ More Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated learning over a compact smooth submanifold in the setting of heterogeneous client data. We propose an algorithm that leverages stochastic Riemannian gradients and a manifold projection operator to improve computational efficiency, uses local updates to improve communication efficiency, and avoids client drift. Theoretically, we show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis that jointly exploits the manifold structure and properties of the loss functions. Numerical experiments demonstrate that our algorithm has significantly smaller computational and communication overhead than existing methods. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08266 [pdf, other]

Refining Self-Supervised Learnt Speech Representation using Brain Activations

Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Li** Chen, Jie Zhang, Zhenhua Ling

Abstract: It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work,… ▽ More It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work, we therefore propose to use the brain activations recorded by fMRI to refine the often-used wav2vec2.0 model by aligning model representations toward human neural responses. Experimental results on SUPERB reveal that this operation is beneficial for several downstream tasks, e.g., speaker verification, automatic speech recognition, intent classification.One can then consider the proposed method as a new alternative to improve self-supervised speech models. △ Less

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: accpeted by Interspeech2024

arXiv:2406.08252 [pdf, other]

Optimal Sharding for Scalable Blockchains with Deconstructed SMR

Authors: Jianting Zhang, Zhongtang Luo, Raghavendra Ramesh, Aniket Kate

Abstract: Sharding is proposed to enhance blockchain scalability. However, a size-security dilemma where every shard must be large enough to ensure its security constrains the efficacy of individual shards and the degree of sharding itself. Most existing sharding solutions therefore rely on either weakening the adversary or making stronger assumptions on network links. This paper presents Arete, an optima… ▽ More Sharding is proposed to enhance blockchain scalability. However, a size-security dilemma where every shard must be large enough to ensure its security constrains the efficacy of individual shards and the degree of sharding itself. Most existing sharding solutions therefore rely on either weakening the adversary or making stronger assumptions on network links. This paper presents Arete, an optimally scalable blockchain sharding protocol designed to resolve the dilemma based on an observation that if individual shards can tolerate a higher fraction of (Byzantine) faults, we can securely create smaller shards in a larger quantity. The key idea of Arete, therefore, is to improve the security resilience/threshold of shards by dividing the blockchain's State Machine Replication (SMR) process itself. Similar to modern blockchains, Arete first decouples SMR in three steps: transaction dissemination, ordering, and execution. However, unlike other blockchains, for Arete, a single ordering shard performs the ordering task while multiple processing shards perform the dissemination and execution of blocks. As processing shards do not run consensus, each of those can tolerate up to half compromised nodes. Moreover, the SMR process in the ordering shard is lightweight as it only operates on the block digests. Second, Arete considers safety and liveness against Byzantine failures separately to improve the safety threshold further while tolerating temporary liveness violations in a controlled manner. Apart from the creation of more optimal-size shards, such a deconstructed SMR scheme also empowers us to devise a novel certify-order-execute architecture to fully parallelize transaction handling, thereby improving the performance of sharded blockchain systems. We implement Arete and evaluate it on a geo-distributed AWS environment, showing that Arete outperforms the state-of-the-art sharding protocol. △ Less

Submitted 11 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08225 [pdf, ps, other]

Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (636 additional authors not shown)

Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur… ▽ More Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08160 [pdf, other]

Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments

Authors: Shoujie Li, Yan Huang, Changqing Guo, Tong Wu, Jiawei Zhang, Linrui Zhang, Wenbo Ding

Abstract: The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates exte… ▽ More The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates extensive chemical and robotic knowledge. Chemistry3D not only enables robots to perform chemical experiments but also provides real-time visualization of temperature, color, and pH changes during reactions. Built on the NVIDIA Omniverse platform, Chemistry3D offers interfaces for robot operation, visual inspection, and liquid flow control, facilitating the simulation of special objects such as liquids and transparent entities. Leveraging this toolkit, we have devised RL tasks, object detection, and robot operation scenarios. Additionally, to discern disparities between the rendering engine and the real world, we conducted transparent object detection experiments using Sim2Real, validating the toolkit's exceptional simulation performance. The source code is available at https://github.com/huangyan28/Chemistry3D, and a related tutorial can be found at https://www.omni-chemistry.com. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08118 [pdf, ps, other]

Non-maximal Anosov representations from surface groups to $\mathrm{SO}_0(2,3)$

Authors: Junming Zhang

Abstract: We prove the representation given by a stable $α_1$-cyclic parabolic $\mathrm{SO}_0(2,3)$-Higgs bundle through the non-Abelian Hodge correspondence is $\{α_2\}$-almost dominated. This is a generalization of Filip's result on weight $3$ variation of Hodge structures and answers a question asked by Collier, Tholozan and Toulisse. We prove the representation given by a stable $α_1$-cyclic parabolic $\mathrm{SO}_0(2,3)$-Higgs bundle through the non-Abelian Hodge correspondence is $\{α_2\}$-almost dominated. This is a generalization of Filip's result on weight $3$ variation of Hodge structures and answers a question asked by Collier, Tholozan and Toulisse. △ Less

Submitted 19 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Proofs of Theorem 4.3 and Lemma 4.4 are added for completeness, 31 pages, comments are very welcome

arXiv:2406.08079 [pdf, other]

A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

Authors: Lixian Zhang, Yi Zhao, Runmin Dong, **xiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu

Abstract: Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limita… ▽ More Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limitation persists: the inability to effectively integrate spatial, temporal, and spectral information within a single unified model. To unlock the potential of RS data, we construct a Spatial-Temporal-Spectral Structured Dataset (STSSD) characterized by the incorporation of multiple RS sources, diverse coverage, unified locations within image sets, and heterogeneity within images. Building upon this structured dataset, we propose an Anchor-Aware Masked AutoEncoder method (A$^{2}$-MAE), leveraging intrinsic complementary information from the different kinds of images and geo-information to reconstruct the masked patches during the pre-training phase. A$^{2}$-MAE integrates an anchor-aware masking strategy and a geographic encoding module to comprehensively exploit the properties of RS images. Specifically, the proposed anchor-aware masking strategy dynamically adapts the masking process based on the meta-information of a pre-selected anchor image, thereby facilitating the training on images captured by diverse types of RS sources within one model. Furthermore, we propose a geographic encoding method to leverage accurate spatial patterns, enhancing the model generalization capabilities for downstream applications that are generally location-related. Extensive experiments demonstrate our method achieves comprehensive improvements across various downstream tasks compared with existing RS pre-training methods, including image classification, semantic segmentation, and change detection tasks. △ Less

Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Showing 201–250 of 13,120 results for author: Zhang, J