-
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Authors:
Zhenyu Zhang,
Bingguang Hao,
**peng Li,
Zekai Zhang,
Dongyan Zhao
Abstract:
Most large language models (LLMs) are sensitive to prompts, and another synonymous expression or a typo may lead to unexpected results for the model. Composing an optimal prompt for a specific demand lacks theoretical support and relies entirely on human experimentation, which poses a considerable obstacle to popularizing generative artificial intelligence. However, there is no systematic analysis of the stability of LLMs in resisting prompt perturbations in real-world scenarios. In this work, we propose to evaluate the ease-of-use of LLMs and construct E-Bench, simulating the actual situation of human use from synonymous perturbation (including paraphrasing, simplification, and colloquialism) and typographical perturbation (such as typing). On this basis, we also discuss the combination of these two types of perturbation and analyze the main reasons for performance degradation. Experimental results indicate that although ease-of-use improves significantly as model size increases, there is still a long way to go to build a sufficiently user-friendly model.
Submitted 16 June, 2024;
originally announced June 2024.
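The typographical perturbation mentioned in the E-Bench abstract can be illustrated with a small sketch. This is not the benchmark's actual procedure; the function name `swap_typo` and the choice of an adjacent-letter swap are assumptions made purely for illustration.

```python
import random


def swap_typo(text, rng):
    """Return `text` with one random pair of adjacent, distinct letters swapped.

    Hypothetical helper: a simple stand-in for a typing error, not E-Bench's method.
    """
    # Candidate positions: two neighbouring letters that differ, so the swap is visible.
    positions = [
        i
        for i in range(len(text) - 1)
        if text[i].isalpha() and text[i + 1].isalpha() and text[i] != text[i + 1]
    ]
    if not positions:
        return text  # nothing to perturb
    i = rng.choice(positions)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is reproducible
    prompt = "Summarize the following article in two sentences."
    print(swap_typo(prompt, rng))
```

Comparing a model's answers on the original and perturbed prompts would then give one rough signal of how sensitive it is to typos.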
-
TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding
Authors:
Zhejun Zhang,
Christos Sakaridis,
Luc Van Gool
Abstract:
In this technical report we present TrafficBots V1.5, a baseline method for the closed-loop simulation of traffic agents. TrafficBots V1.5 achieves baseline-level performance and a 3rd place ranking in the Waymo Open Sim Agents Challenge (WOSAC) 2024. It is a simple baseline that combines TrafficBots, a CVAE-based multi-agent policy conditioned on each agent's individual destination and personality, and HPTR, the heterogeneous polyline transformer with relative pose encoding. To improve the performance on the WOSAC leaderboard, we apply scheduled teacher-forcing at the training time and we filter the sampled scenarios at the inference time. The code is available at https://github.com/zhejz/TrafficBotsV1.5.
Submitted 16 June, 2024;
originally announced June 2024.
-
FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
Authors:
Zhikai Zhang,
Yitang Li,
Haofeng Huang,
Mingxian Lin,
Li Yi
Abstract:
Human motion synthesis is a fundamental task in computer animation. Despite recent progress in this field utilizing deep learning and motion capture data, existing methods are always limited to specific motion categories, environments, and styles. This poor generalizability can be partially attributed to the difficulty and expense of collecting large-scale and high-quality motion data. At the same time, foundation models trained with internet-scale image and text data have demonstrated surprising world knowledge and reasoning ability for various downstream tasks. Utilizing these foundation models may help with human motion synthesis, which some recent works have superficially explored. However, these methods didn't fully unveil the foundation models' potential for this task and only support several simple actions and environments. In this paper, we for the first time, without any motion data, explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs across any motion task and environment. Our framework can be split into two stages: 1) sequential keyframe generation by utilizing MLLMs as a keyframe designer and animator; 2) motion filling between keyframes through interpolation and motion tracking. Our method can achieve general human motion synthesis for many downstream tasks. The promising results demonstrate the worth of mocap-free human motion synthesis aided by MLLMs and pave the way for future research.
Submitted 21 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
Authors:
Guowen Zhang,
Lue Fan,
Chenhang He,
Zhen Lei,
Zhaoxiang Zhang,
Lei Zhang
Abstract:
Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before inputting to Transformers, have demonstrated their effectiveness in 3D object detection. However, serializing 3D voxels into 1D sequences will inevitably sacrifice the voxel spatial proximity. Such an issue is hard to be addressed by enlarging the group size with existing serialization-based methods due to the quadratic complexity of Transformers with feature sizes. Inspired by the recent advances of state space models (SSMs), we present a Voxel SSM, termed as Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence. The linear complexity of SSMs encourages our group-free design, alleviating the loss of spatial proximity of voxels. To further enhance the spatial proximity, we propose a Dual-scale SSM Block to establish a hierarchical structure, enabling a larger receptive field in the 1D serialization curve, as well as more complete local regions in 3D space. Moreover, we implicitly apply window partition under the group-free framework by positional encoding, which further enhances spatial proximity by encoding voxel positional information. Our experiments on Waymo Open Dataset and nuScenes dataset show that Voxel Mamba not only achieves higher accuracy than state-of-the-art methods, but also demonstrates significant advantages in computational efficiency.
Submitted 18 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models
Authors:
Chengzhengxu Li,
Xiaoming Liu,
Zhaohan Zhang,
Yichen Wang,
Chen Liu,
Yu Lan,
Chao Shen
Abstract:
Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts on domain generalization has been under-explored. To explore the nature of prompt generalization on unknown domains, we conduct pilot experiments and find that (i) Prompts gaining more attention weight from PLMs' deep layers are more generalizable and (ii) Prompts with more stable attention distributions in PLMs' deep layers are more generalizable. Thus, we offer a fresh objective towards domain-generalizable prompts optimization named "Concentration", which represents the "lookback" attention from the current decoding token to the prompt tokens, to increase the attention strength on prompts and reduce the fluctuation of attention distribution. We adapt this new objective to popular soft prompt and hard prompt optimization methods, respectively. Extensive experiments demonstrate that our idea improves comparison prompt optimization methods by 1.42% for soft prompt generalization and 2.16% for hard prompt generalization in accuracy on the multi-source domain generalization setting, while maintaining satisfying in-domain performance. The promising results validate the effectiveness of our proposed prompt optimization objective and provide key insights into domain-generalizable prompts.
Submitted 27 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
A simple and fast finite difference method for the integral fractional Laplacian of variable order
Authors:
Zhaopeng Hao,
Siyuan Shi,
Zhongqiang Zhang,
Rui Du
Abstract:
For the fractional Laplacian of variable order, an efficient and accurate numerical evaluation in multi-dimension is a challenge for the nature of a singular integral. We propose a simple and easy-to-implement finite difference scheme for the multi-dimensional variable-order fractional Laplacian defined by a hypersingular integral. We prove that the scheme is of second-order convergence and apply the developed finite difference scheme to solve various equations with the variable-order fractional Laplacian. We present a fast solver with quasi-linear complexity of the scheme for computing variable-order fractional Laplacian and corresponding PDEs. Several numerical examples demonstrate the accuracy and efficiency of our algorithm and verify our theory.
Submitted 15 June, 2024;
originally announced June 2024.
-
A Unified Graph Selective Prompt Learning for Graph Neural Networks
Authors:
Bo Jiang,
Hao Wu,
Ziyan Zhang,
Beibei Wang,
** Tang
Abstract:
In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to modify the input graph data by adding some (learnable) prompt vectors into graph node features to better align with the downstream tasks on the smaller dataset. However, existing GPFs generally suffer from two main limitations. First, GPFs generally focus on node prompt learning, which ignores the prompting for graph edges. Second, existing GPFs generally conduct the prompt learning on all nodes equally, which fails to capture the importance of different nodes and may be sensitive to noisy nodes in aligning with the downstream tasks. To address these issues, in this paper, we propose a new unified Graph Selective Prompt Feature learning (GSPF) for GNN fine-tuning. The proposed GSPF integrates the prompt learning on both graph nodes and edges, which thus provides a unified prompt model for the graph data. Moreover, it conducts prompt learning selectively on nodes and edges by concentrating on the important nodes and edges for prompting, which makes our model more reliable and compact. Experimental results on many benchmark datasets demonstrate the effectiveness and advantages of the proposed GSPF method.
Submitted 15 June, 2024;
originally announced June 2024.
-
Learning Flexible Time-windowed Granger Causality Integrating Heterogeneous Interventional Time Series Data
Authors:
Ziyi Zhang,
Shaogang Ren,
Xiaoning Qian,
Nick Duffield
Abstract:
Granger causality, commonly used for inferring causal structures from time series data, has been adopted in widespread applications across various fields due to its intuitive explainability and high compatibility with emerging deep neural network prediction models. To alleviate challenges in better deciphering causal structures unambiguously from time series, the use of interventional data has become a practical approach. However, existing methods have yet to be explored in the context of imperfect interventions with unknown targets, which are more common and often more beneficial in a wide range of real-world applications. Additionally, the identifiability issues of Granger causality with unknown interventional targets in complex network models remain unsolved. Our work presents a theoretically-grounded method that infers Granger causal structure and identifies unknown targets by leveraging heterogeneous interventional time series data. We further illustrate that learning Granger causal structure and recovering interventional targets can mutually promote each other. Comparative experiments demonstrate that our method outperforms several robust baseline methods in learning Granger causal structure from interventional time series data.
Submitted 14 June, 2024;
originally announced June 2024.
-
Byzantine-Robust Decentralized Federated Learning
Authors:
Minghong Fang,
Zifan Zhang,
Hairi,
Prashant Khanduri,
Jia Liu,
Songtao Lu,
Yuchen Liu,
Neil Gong
Abstract:
Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and trust dependency issues. To address challenges, decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.
Submitted 20 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
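The core idea in the BALANCE abstract, where each client uses its own local model as a similarity reference for received models, can be sketched roughly as follows. The fixed relative distance threshold `gamma`, the mixing weight `alpha`, and the function names are assumptions for illustration; the abstract does not specify the actual acceptance rule or aggregation weights.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two flat parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_and_aggregate(own, received, gamma=0.5, alpha=0.5):
    """Accept neighbour models close to the client's own model, then mix them in.

    Assumed rule (illustrative only): a received model w is accepted when
    ||own - w|| <= gamma * ||own||. Returns (new_model, number_accepted).
    """
    threshold = gamma * math.sqrt(sum(x * x for x in own))
    accepted = [w for w in received if l2_distance(own, w) <= threshold]
    if not accepted:
        return list(own), 0  # no trusted neighbour: keep the local model
    # Coordinate-wise average of the accepted neighbour models.
    avg = [sum(col) / len(accepted) for col in zip(*accepted)]
    mixed = [alpha * o + (1 - alpha) * a for o, a in zip(own, avg)]
    return mixed, len(accepted)


if __name__ == "__main__":
    own_model = [1.0, 0.0]
    neighbours = [[1.1, 0.0], [0.9, 0.1], [-5.0, 5.0]]  # the last one mimics a poisoned model
    print(filter_and_aggregate(own_model, neighbours))
```

In this toy run the third neighbour lies far from the local model and is excluded before averaging, which is the behaviour the similarity check is meant to provide.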
-
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs
Authors:
Zhuofeng Li,
Zixing Gou,
Xiangnan Zhang,
Zhongyuan Liu,
Sirui Li,
Yuntong Hu,
Chen Ling,
Zheng Zhang,
Liang Zhao
Abstract:
Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly available as an open-source repository on GitHub at https://github.com/Zhuofeng-Li/TEG-Benchmark.
Submitted 14 June, 2024;
originally announced June 2024.
-
Hardware-based stack buffer overflow attack detection on RISC-V architectures
Authors:
Cristiano Pegoraro Chenet,
Ziteng Zhang,
Alessandro Savino,
Stefano Di Carlo
Abstract:
This work evaluates how well hardware-based approaches detect stack buffer overflow (SBO) attacks in RISC-V systems. We conducted simulations on the PULP platform and examined micro-architecture events using semi-supervised anomaly detection techniques. The findings showed the challenge of detection performance. Thus, a potential solution combines software and hardware-based detectors concurrently, with hardware as the primary defense. The hardware-based approaches present compelling benefits that could enhance RISC-V-based architectures.
Submitted 12 June, 2024;
originally announced June 2024.
-
On the Worst Prompt Performance of Large Language Models
Authors:
Bowen Cao,
Deng Cai,
Zhisong Zhang,
Yuexian Zou,
Wai Lam
Abstract:
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited. These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.
Submitted 21 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
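The notion of worst prompt performance in the abstract above can be made concrete with a small sketch. It assumes, as an illustration only, that each case has one score per semantically equivalent paraphrase and that the worst, best, and average figures are means over cases of the per-case minimum, maximum, and mean; the function name `summarize` and the data layout are hypothetical.

```python
from statistics import mean


def summarize(scores_by_case):
    """Aggregate per-paraphrase scores into worst/best/average performance.

    scores_by_case maps a case id to the list of scores obtained on its
    semantically equivalent paraphrases (assumed layout, for illustration).
    """
    worst = mean(min(s) for s in scores_by_case.values())
    best = mean(max(s) for s in scores_by_case.values())
    average = mean(mean(s) for s in scores_by_case.values())
    return {"worst": worst, "best": best, "average": average, "gap": best - worst}


if __name__ == "__main__":
    toy_scores = {
        "q1": [1.0, 0.0, 1.0],  # one paraphrase fails completely
        "q2": [0.5, 0.5, 1.0],
    }
    print(summarize(toy_scores))
```

The gap between the best and worst figures is the kind of spread the abstract reports, while the worst figure serves as a lower bound on what a user may experience.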
-
TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data
Authors:
Ziyang Zhang,
Hejie Cui,
Ran Xu,
Yuzhang Xie,
Joyce C. Ho,
Carl Yang
Abstract:
The growing availability of well-organized Electronic Health Records (EHR) data has enabled the development of various machine learning models towards disease risk prediction. However, existing risk prediction methods overlook the heterogeneity of complex diseases, failing to model the potential disease subtypes regarding their corresponding patient visits and clinical concept subgroups. In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data. Specifically, we develop a novel self-supervised co-clustering framework that can be guided by the risk prediction task of specific diseases. Furthermore, we enhance the hypergraph model of EHR data with textual embeddings and enforce the alignment between the clusters of clinical concepts and patient visits through a contrastive objective. Comprehensive experiments conducted on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction demonstrate an average 31.25% performance improvement compared to traditional ML baselines and a 5.26% improvement on top of the vanilla hypergraph model without our co-clustering mechanism. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO. Code is available at https://github.com/PericlesHat/TACCO.
Submitted 14 June, 2024;
originally announced June 2024.
-
Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System
Authors:
Ziheng Zhang,
Wen Chen,
Qingqing Wu,
Zhendong Li,
Xusheng Zhu,
**gfeng Chen,
Nan Cheng
Abstract:
This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multiple targets and process echo signals, respectively. Based on the above model, we derive the Fisher information matrix of the echo signal with respect to the time delay. By employing the chain rule and exploiting the geometric relationship between time delay and position, the Cramer-Rao bound (CRB) for estimating the target's Cartesian coordinate position is derived. Then, we propose a two-stage algorithmic framework to minimize CRB in single- and multi-target localization systems by jointly optimizing active beamforming at BS, passive beamforming at multiple IRSs and IRS selection. For the single-target case, we derive the optimal closed-form solution for multiple IRSs coefficients design and propose a low-complexity algorithm based on alternating direction method of multipliers to obtain the optimal solution for active beaming design. For the multi-target case, alternating optimization is used to transform the original problem into two subproblems where semi-definite relaxation and successive convex approximation are applied to tackle the quadraticity and indefiniteness in the CRB expression, respectively. Finally, numerical simulation results validate the effectiveness of the proposed algorithm for multiple IRSs collaborative localization system compared to other benchmark schemes as well as the significant performance gains.
Submitted 17 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks
Authors:
Zhiwei Zhang,
Minhua Lin,
Junjie Xu,
Zongyu Wu,
Enyan Dai,
Suhang Wang
Abstract:
Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor attacks where generated triggers have different properties. Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. Furthermore, we introduce a novel robust training strategy to efficiently counteract the impact of the triggers. Extensive experiments on real-world datasets show that our framework can effectively identify poisoned nodes, significantly degrade the attack success rate, and maintain clean accuracy when defending against various types of graph backdoor attacks with different properties.
Submitted 14 June, 2024;
originally announced June 2024.
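The indicator described in the abstract above, prediction variance under random edge dropping, can be illustrated with a toy sketch. The "model" here is a single mean-aggregation step over scalar node features, which is an assumption standing in for a trained GNN; the function names, drop probability, and trial count are likewise hypothetical.

```python
import random
from statistics import pvariance


def predict(features, edges):
    """Toy stand-in for a GNN: each node outputs the mean feature of itself and its neighbours."""
    neighbours = {v: [v] for v in range(len(features))}  # include the node itself
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [
        sum(features[n] for n in neighbours[v]) / len(neighbours[v])
        for v in range(len(features))
    ]


def prediction_variance(features, edges, drop_prob=0.5, trials=200, seed=0):
    """Per-node variance of the toy prediction when each edge is dropped independently."""
    rng = random.Random(seed)
    runs = []
    for _ in range(trials):
        kept = [e for e in edges if rng.random() >= drop_prob]
        runs.append(predict(features, kept))
    # zip(*runs) groups the predictions of each node across all trials.
    return [pvariance(node_preds) for node_preds in zip(*runs)]


if __name__ == "__main__":
    # Path graph 0-1-2-3-4; node 4 plays the role of a trigger with an outlying feature.
    features = [1.0, 1.0, 1.0, 1.0, 10.0]
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(prediction_variance(features, edges))
```

In this toy graph, nodes whose neighbourhoods are homogeneous keep the same output however edges are dropped, whereas the node attached to the trigger changes its output whenever that edge disappears, so its variance stands out.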
-
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst
Authors:
**gtao Cao,
Zheng Zhang,
Hongru Wang,
Bin Liang,
Hao Wang,
Kam-Fai Wong
Abstract:
Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill the understanding capability of GPT-4V for harmful memes to our system. Our framework achieves top-1 at the public leaderboard of the Online Safety Prize Challenge hosted by AI Singapore, with the AUROC as 0.7749 and accuracy as 0.7087, significantly ahead of the other teams. Notably, our approach outperforms previous benchmarks, with FLAVA achieving an AUROC of 0.5695 and VisualBERT an AUROC of 0.5561.
Submitted 14 June, 2024;
originally announced June 2024.
-
Automated Molecular Concept Generation and Labeling with Large Language Models
Authors:
Shichang Zhang,
Botao Xia,
Zimin Zhang,
Qianli Wu,
Fang Sun,
Ziniu Hu,
Yizhou Sun
Abstract:
Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are less common than black-box models like Graph Neural Networks (GNNs), primarily due to their requirement for predefined concepts and manual labels for each instance, which demand domain knowledge and can be labor-intensive. This paper introduces a novel framework for Automated Molecular Concept (AutoMolCo) generation and labeling. AutoMolCo leverages the knowledge in Large Language Models (LLMs) to automatically generate predictive molecular concepts and label them for each molecule. Such procedures are repeated through iterative interactions with LLMs to refine concepts, enabling simple linear models on the refined concepts to outperform GNNs and LLM in-context learning on several benchmarks. The whole AutoMolCo framework is automated without any human knowledge inputs in concept generation, labeling, or refinement, thereby surpassing the limitations of extant CMs while maintaining their explainability and allowing easy intervention. Through systematic experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets, we demonstrate that the AutoMolCo-induced explainable CMs are beneficial and promising for molecular science research.
Submitted 13 June, 2024;
originally announced June 2024.
-
Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors
Authors:
Tingyu Cheng,
Zhihan Zhang,
Han Huang,
Yingting Gao,
Wei Sun,
Gregory D. Abowd,
HyunJoo Oh,
Josiah Hester
Abstract:
For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materials, specifically polyvinyl alcohol (PVA) and liquid metal (LM), alongside accessible manufacturing techniques to produce electronic components and systems with versatile form factors. Our work centers on the development of recyclable electronics through three methods: 1) creating sheet electronics by screen printing LM traces on PVA substrates; 2) developing foam-based electronics by immersing mechanically stirred PVA foam into an LM solution; and 3) fabricating recyclable electronic tubes by injecting LM into mold-cast PVA tubes, which can then be woven into various structures. To further assess the sustainability of our proposed methods, we conducted a life cycle assessment (LCA) to evaluate the environmental impact of our recyclable electronics in comparison to their conventional counterparts.
Submitted 13 June, 2024;
originally announced June 2024.
-
Molecular gas excitation in the circumgalactic medium of MACS1931-26
Authors:
L. Ghodsi,
J. Zhou,
P. Andreani,
C. De Breuck,
A. W. S. Man,
Y. Miyamoto,
T. G. Bisbas,
A. Lundgren,
Z. -Y. Zhang
Abstract:
The evolution of galaxies is largely affected by exchanging material with their close environment, the circumgalactic medium (CGM). In this work, we investigate the CGM and the interstellar medium (ISM) of the bright central galaxy (BCG) of the galaxy cluster, MACS1931-26 at z~0.35. We detected [CI](2-1), CO(1-0), and CO(7-6) emission lines with the APEX 12-m and NRO 45-m telescopes. We complemented these single-dish observations with CO(1-0), CO(3-2), and CO(4-3) ALMA interferometric data and inferred the cold molecular hydrogen physical properties. Using a modified large velocity gradient (LVG) model, we modelled the CO and CI emission of the CGM and BCG to extract the gas thermodynamical properties, including the kinetic temperature, the density, and the virialisation factor. Our study shows that the gas in the BCG is highly excited, comparable to the gas in local ultra luminous infrared galaxies (ULIRGs), while the CGM is likely less excited, colder, less dense, and less bound compared to the ISM of the BCG. The molecular hydrogen mass of the whole system derived using [CI](2-1) is larger than the mass derived from CO(1-0) in literature, showing that part of the gas in this system is CO-poor. Additional spatially resolved CI observations in both transitions, CO(1-0) and [CI](2-1), and the completion of the CO SLED with higher CO transitions are crucial to trace the different phases of the gas in such systems and constrain their properties.
Submitted 13 June, 2024;
originally announced June 2024.
-
JWST/NIRCam 4-5 $μ$m Imaging of the Giant Planet AF Lep b
Authors:
Kyle Franson,
William O. Balmer,
Brendan P. Bowler,
Laurent Pueyo,
Yifan Zhou,
Emily Rickman,
Zhoujian Zhang,
Sagnick Mukherjee,
Tim D. Pearce,
Daniella C. Bardalez Gagliuffi,
Lauren I. Biddle,
Timothy D. Brandt,
Rachel Bowens-Rubin,
Justin R. Crepp,
James W. Davidson, Jr.,
Jacqueline Faherty,
Christian Ginski,
Elliott P. Horch,
Marvin Morgan,
Caroline V. Morley,
Marshall D. Perrin,
Aniket Sanghi,
Maissa Salama,
Christopher A. Theissen,
Quang H. Tran
, et al. (1 additional authors not shown)
Abstract:
With a dynamical mass of $3 \, M_\mathrm{Jup}$, the recently discovered giant planet AF Lep b is the lowest-mass imaged planet with a direct mass measurement. Its youth and spectral type near the L/T transition make it a promising target to study the impact of clouds and atmospheric chemistry at low surface gravities. In this work, we present JWST/NIRCam imaging of AF Lep b. Across two epochs, we detect AF Lep b in F444W ($4.4 \, \mathrm{μm}$) with S/N ratios of 9.6 and 8.7, respectively. At the planet's separation of $320 \, \mathrm{mas}$ during the observations, the coronagraphic throughput is ${\approx}7\%$, demonstrating that NIRCam's excellent sensitivity persists down to small separations. The F444W photometry of AF Lep b affirms the presence of disequilibrium carbon chemistry and enhanced atmospheric metallicity. These observations also place deep limits on wider-separation planets in the system, ruling out $1.1 \, M_\mathrm{Jup}$ planets beyond $15.6 \, \mathrm{au}$ (0.58 arcsec), $1.1 \, M_\mathrm{Sat}$ planets beyond $27 \, \mathrm{au}$ (1 arcsec), and $2.8 \, M_\mathrm{Nep}$ planets beyond $67 \, \mathrm{au}$ (2.5 arcsec). We also present new Keck/NIRC2 $L'$ imaging of AF Lep b; combining this with the two epochs of F444W photometry and previous Keck $L'$ photometry provides limits on the long-term 3-$5 \, \mathrm{μm}$ variability of AF Lep b on months-to-years timescales. AF Lep b is the closest-separation planet imaged with JWST to date, demonstrating that planets can be recovered well inside the nominal (50% throughput) NIRCam coronagraph inner working angle.
Submitted 13 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $J/ψ\to ωX(1870) \to ωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\to ωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.
-
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Authors:
Miaosen Zhang,
Yixuan Wei,
Zhen Xing,
Yifei Ma,
Zuxuan Wu,
Ji Li,
Zheng Zhang,
Qi Dai,
Chong Luo,
Xin Geng,
Baining Guo
Abstract:
Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system. Advanced retrieval systems usually adopt a cascade of aesthetic models as re-rankers or filters, which are limited to low-level features like saturation and perform poorly when stylistic, cultural or knowledge contexts are involved. We find that utilizing the reasoning ability of large language models (LLMs) to rephrase the search query and extend the aesthetic expectations can make up for this shortcoming. Based on the above findings, we propose a preference-based reinforcement learning method that fine-tunes the vision models to distill the knowledge from both LLM reasoning and the aesthetic models to better align the vision models with human aesthetics. Meanwhile, because benchmarks designed for evaluating retrieval systems are rare, we leverage large multi-modality models (LMMs) and their strong abilities to evaluate aesthetic performance. As aesthetic assessment is one of the most subjective tasks, to validate the robustness of LMMs, we further propose a novel dataset named HPIR to benchmark the alignment with human aesthetics. Experiments demonstrate that our method significantly enhances the aesthetic behaviors of the vision models under several metrics. We believe the proposed algorithm can be a general practice for aligning vision models with human values.
Submitted 13 June, 2024;
originally announced June 2024.
-
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
Authors:
Baiang Li,
Sizhuo Ma,
Yanhong Zeng,
Xiaogang Xu,
Youqing Fang,
Zhao Zhang,
Jian Wang,
Kai Chen
Abstract:
Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io
Submitted 13 June, 2024;
originally announced June 2024.
-
CMC-Bench: Towards a New Paradigm of Visual Signal Compression
Authors:
Chunyi Li,
Xiele Wu,
Haoning Wu,
Donghui Feng,
Zicheng Zhang,
Guo Lu,
Xiongkuo Min,
Xiaohong Liu,
Guangtao Zhai,
Weisi Lin
Abstract:
Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
Submitted 13 June, 2024;
originally announced June 2024.
-
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Authors:
Ke Fan,
Zechen Bai,
Tianjun Xiao,
Tong He,
Max Horn,
Yanwei Fu,
Francesco Locatello,
Zheng Zhang
Abstract:
Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also overlooks the inherent variability in the number of objects present in each instance. To overcome this fundamental limitation, we present a novel complexity-aware object auto-encoder framework. Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots based on the content of the data. This is achieved by proposing a discrete slot sampling module that is responsible for selecting an appropriate number of slots from a candidate list. Furthermore, we introduce a masked slot decoder that suppresses unselected slots during the decoding process. Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models. Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research. Project will be available at https://kfan21.github.io/AdaSlot/
Submitted 13 June, 2024;
originally announced June 2024.
-
Beyond Recommendations: From Backward to Forward AI Support of Pilots' Decision-Making Process
Authors:
Zelun Tony Zhang,
Sebastian S. Feger,
Lucas Dullenkopf,
Rulu Liao,
Lukas Süsslin,
Yuanting Liu,
Andreas Butz
Abstract:
AI is anticipated to enhance human decision-making in high-stakes domains like aviation, but adoption is often hindered by challenges such as inappropriate reliance and poor alignment with users' decision-making. Recent research suggests that a core underlying issue is the recommendation-centric design of many AI systems, i.e., they give end-to-end recommendations and ignore the rest of the decision-making process. Alternative support paradigms are rare, and it remains unclear how the few that do exist compare to recommendation-centric support. In this work, we aimed to empirically compare recommendation-centric support to an alternative paradigm, continuous support, in the context of diversions in aviation. We conducted a mixed-methods study with 32 professional pilots in a realistic setting. To ensure the quality of our study scenarios, we conducted a focus group with four additional pilots prior to the study. We found that continuous support can support pilots' decision-making in a forward direction, allowing them to think more beyond the limits of the system and make faster decisions when combined with recommendations, though the forward support can be disrupted. Participants' statements further suggest a shift in design goal away from providing recommendations, to supporting quick information gathering. Our results show ways to design more helpful and effective AI decision support that goes beyond end-to-end recommendations.
Submitted 13 June, 2024;
originally announced June 2024.
-
Hölder Continuity for Fully Fractional Parabolic Equations with Space-time Nonlocal Operators
Authors:
Lingwei Ma,
Qi Xiong,
Zhenqiu Zhang
Abstract:
We study the local Hölder regularity of weak solutions to the fully fractional parabolic equations involving spatial fractional diffusion and fractional time derivatives of the Marchaud type. It is worth noting that we do not impose boundedness assumptions on the weak solutions and nonhomogeneous terms. Within the space-time nonlocal framework, it is crucial to consider both space-dependent nonlocal tail terms and the first introduced time-dependent nonlocal tail term. By adapting a nonlocal variant of the parabolic De Giorgi iterative technique, we initially establish a priori local boundedness with tail terms for weak solutions and then prove the local Hölder continuity.
Submitted 13 June, 2024;
originally announced June 2024.
-
Revealing hidden medium-range order in silicate glass-formers using many-body correlation functions
Authors:
Zhen Zhang,
Walter Kob
Abstract:
The medium range order (MRO) in amorphous systems has been linked to complex features such as the dynamic heterogeneity of supercooled liquids or the plastic deformation of glasses. However, the nature of the MRO in these materials has remained elusive, primarily due to the lack of methods capable of characterizing this order. Here, we leverage standard two-body structural correlators and advanced many-body correlation functions to probe numerically the MRO in prototypical network glassformers, i.e., silica and sodium silicates, systems that are of importance in natural as well as industrial settings. With increasing Na concentration, one finds that the local environment of Na becomes more structured and the spatial distribution of Na on intermediate length scales changes from blob-like to channel-like, indicating a growing inhomogeneity in the spatial Na arrangement. In parallel, we find that the Si-O network becomes increasingly depolymerized, resulting in a ring size distribution that broadens. The radius of gyration of the rings is well described by a power-law with an exponent around 0.75, indicating that the rings are progressively more crumbled with increasing size. Using a recently proposed four-point correlation function, we reveal that the relative orientation of the tetrahedra shows a transition at a distance around 4 Angstroms, a structural modification that is not seen in standard two-point correlation functions. Furthermore, we find that the length scale characterizing the MRO is non-monotonic as a function of temperature, caused by the competition between energetic and entropic terms. Finally, we demonstrate that the structural correlation lengths as obtained from the correlation functions that quantify the MRO are correlated with macroscopic observables such as the kinetic fragility of the liquids and the elastic properties of the glasses.
Submitted 12 June, 2024;
originally announced June 2024.
-
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
Authors:
Da Mu,
Zhicheng Zhang,
Haobo Yue
Abstract:
Sound Event Localization and Detection (SELD) involves detecting and localizing sound events using multichannel sound recordings. The previously proposed Event-Independent Network V2 (EINV2) has achieved outstanding performance on SELD. However, it still faces challenges in effectively extracting features across spectral, spatial, and temporal domains. This paper proposes a three-stage network structure named the Multi-scale Feature Fusion (MFF) module to fully extract multi-scale features across spectral, spatial, and temporal domains. The MFF module uses a parallel subnetwork architecture to generate multi-scale spectral and spatial features. The TF-Convolution Module is employed to provide multi-scale temporal features. We incorporate MFF into EINV2 and term the proposed method MFF-EINV2. Experimental results on the 2022 and 2023 DCASE challenge Task 3 datasets show the effectiveness of our MFF-EINV2, which achieves state-of-the-art (SOTA) performance compared to published methods.
Submitted 15 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Authors:
Jiefeng Ma,
Yan Wang,
Chenyu Liu,
Jun Du,
Yu Hu,
Zhenrong Zhang,
Pengfei Hu,
Qing Wang,
Jianshu Zhang
Abstract:
Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset covers eight languages (English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese), making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND
Submitted 12 June, 2024;
originally announced June 2024.
-
A Novel Diamond-like Carbon based photocathode for PICOSEC Micromegas detectors
Authors:
X. Wang,
R. Aleksan,
Y. Angelis,
J. Bortfeldt,
F. Brunbauer,
M. Brunoldi,
E. Chatzianagnostou,
J. Datta,
K. Degmelt,
G. Fanourakis,
D. Fiorina,
K. J. Floethner,
M. Gallinaro,
F. Garcia,
I. Giomataris,
K. Gnanvo,
F. J. Iguaz,
D. Janssens,
A. Kallitsopoulou,
M. Kovacic,
B. Kross,
P. Legou,
M. Lisowska,
J. Liu,
I. Maniatis
, et al. (26 additional authors not shown)
Abstract:
The PICOSEC Micromegas (MM) detector is a precise timing gaseous detector based on a MM detector operating in a two-stage amplification mode and a Cherenkov radiator. Prototypes equipped with cesium iodide (CsI) photocathodes have shown promising time resolutions as precise as 24 picoseconds (ps) for Minimum Ionizing Particles. However, due to the high hygroscopicity and susceptibility to ion bombardment of the CsI photocathodes, alternative photocathode materials are needed to improve the robustness of PICOSEC MM. Diamond-like Carbon (DLC) films have been introduced as a novel robust photocathode material and have shown promising results. A batch of DLC photocathodes with different thicknesses was produced and evaluated using ultraviolet light. The quantum efficiency measurements indicate that the optimized thickness of the DLC photocathode is approximately 3 nm. Furthermore, DLC photocathodes show good resistance to ion bombardment in aging tests compared to the CsI photocathode. Finally, a PICOSEC MM prototype equipped with DLC photocathodes was tested in muon beams. A time resolution of around 42 ps with a detection efficiency of 97% for 150 GeV/c muons was obtained. These results indicate the great potential of DLC as a photocathode for the PICOSEC MM detector.
Submitted 25 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Authors:
Yi-Fan Zhang,
Qingsong Wen,
Chaoyou Fu,
Xue Wang,
Zhang Zhang,
Liang Wang,
Rong Jin
Abstract:
Seeing clearly with high resolution is a foundation of Large Multimodal Models (LMMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, with the latter being the sliced image patches but resized to the same resolution as the former. This means that higher resolution requires more local patches, resulting in exorbitant computational expenses, and meanwhile, the dominance of local image tokens may diminish the global context. In this paper, we dive into the problems and propose a new framework as well as an elaborate optimization strategy. Specifically, we extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks. With regard to local patches, learnable query embeddings are introduced to reduce image tokens, and the tokens most relevant to the user question are further selected by a similarity-based selector. Our empirical results demonstrate a `less is more' pattern, where \textit{utilizing fewer but more informative local image tokens leads to improved performance}. Besides, a significant challenge lies in the training strategy, as simultaneous end-to-end training of the global mining block and local compression block does not yield optimal results. We thus advocate for an alternating training strategy, ensuring balanced learning between global and local aspects. Finally, we also introduce a challenging dataset with high requirements for image detail, enhancing the training of the local compression layer. The proposed method, termed LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME), achieves leading performance across various benchmarks with only 2 million training data.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Enhancing End-to-End Autonomous Driving with Latent World Model
Authors:
Yingyan Li,
Lue Fan,
Jiawei He,
Yuqi Wang,
Yuntao Chen,
Zhaoxiang Zhang,
Tieniu Tan
Abstract:
End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the actually observed features in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
Submitted 12 June, 2024;
originally announced June 2024.
-
HiFAST: An HI Data Calibration and Imaging Pipeline for FAST II. Flux Density Calibration
Authors:
Ziming Liu,
Jie Wang,
Yingjie Jing,
Zhi-Yu Zhang,
Chen Xu,
Tiantian Liang,
Qingze Chen,
Ningyu Tang,
Qingliang Yang
Abstract:
Accurate flux density calibration is essential for precise analysis and interpretation of observations across different observation modes and instruments. In this research, we first introduce the flux calibration model incorporated in the HIFAST pipeline, designed for processing HI 21-cm spectra. Furthermore, we investigate different calibration techniques and assess the dependence of the gain parameter on time and environmental factors. A comparison is carried out in various observation modes (e.g., tracking and scanning modes) to determine the flux density gain ($G$), revealing insignificant discrepancies in $G$ among different methods. Long-term monitoring data show a linear correlation between $G$ and atmospheric temperature. After subtracting the $G$--temperature dependence, the dispersion of $G$ is reduced to $<3\%$ over a one-year time scale. The stability of the receiver response of FAST is considered sufficient to facilitate HI observations that can accommodate a moderate error in flux calibration (e.g., $\gtrsim 5\%$) when utilizing a constant $G$ for calibration purposes. Our study will serve as a useful addition to the results provided by Jiang et al. (2020). Detailed measurements of $G$ for the 19 beams of FAST, covering the frequency range 1000 MHz -- 1500 MHz, can be found on the HIFAST homepage: https://hifast.readthedocs.io/fluxgain.
Submitted 12 June, 2024;
originally announced June 2024.
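The temperature correction described in the abstract above (fit a linear relation between $G$ and temperature, subtract it, then look at the remaining dispersion) can be illustrated with a minimal sketch. The numbers below are synthetic and chosen only for illustration; they are not measurements from the paper, and the names `fit_line`, `temps`, and `gains` are hypothetical. The sign and size of the slope are likewise assumptions.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic monitoring data (illustrative only, arbitrary gain units).
temps = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
gains = [16.30, 16.18, 16.12, 15.98, 15.93, 15.79, 15.72, 15.60]

slope, intercept = fit_line(temps, gains)
mean_t = statistics.fmean(temps)

# Remove the fitted temperature trend while keeping the mean gain unchanged.
corrected = [g - slope * (t - mean_t) for g, t in zip(gains, temps)]

# Relative dispersion (population standard deviation over mean), before and after.
raw_dispersion = statistics.pstdev(gains) / statistics.fmean(gains)
corrected_dispersion = statistics.pstdev(corrected) / statistics.fmean(corrected)

print(f"slope = {slope:.4f} per degree")
print(f"relative dispersion: raw {raw_dispersion:.4%}, corrected {corrected_dispersion:.4%}")
```

Subtracting the trend about the mean temperature keeps the average gain fixed, so the two dispersion values are directly comparable.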
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
Submitted 12 June, 2024;
originally announced June 2024.
-
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Authors:
Duanyu Feng,
Bowen Qin,
Chen Huang,
Youcheng Huang,
Zheng Zhang,
Wenqiang Lei
Abstract:
The success of the reward model in distinguishing between responses with subtle safety differences depends critically on a high-quality preference dataset, which should capture the fine-grained nuances of harmful and harmless responses. This motivates the need to develop a dataset involving preference margins, which accurately quantify how harmless one response is compared to another. In this paper, we take the first step to propose an effective and cost-efficient framework to promote the development of margin-enhanced preference datasets. Our framework, Legend, Leverages representation engineering to annotate preference datasets. It constructs the specific direction within the LLM's embedding space that represents safety. Legend then uses the semantic distances of paired responses along this safety direction to annotate margins automatically. We experimentally demonstrate its effectiveness in both reward modeling and harmless alignment for LLMs. Legend also stands out for its efficiency, requiring only inference time rather than additional training. This efficiency allows for easier implementation and scalability, making Legend particularly valuable for practical applications in aligning LLMs with safe conversations.
Submitted 12 June, 2024;
originally announced June 2024.
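The idea of annotating a margin from distances along a safety direction, as described in the abstract above, can be sketched as follows. This is not the authors' implementation: the direction here is built as a normalized difference of mean embeddings, a common representation-engineering recipe that is assumed for illustration, and the 3-dimensional vectors are toy values rather than real LLM embeddings. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def safety_direction(harmless, harmful):
    """Unit vector pointing from the harmful mean to the harmless mean (assumed recipe)."""
    diff = [a - b for a, b in zip(mean_vector(harmless), mean_vector(harmful))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def margin(chosen, rejected, direction):
    """Signed distance between two response embeddings along the safety direction."""
    return sum((c - r) * d for c, r, d in zip(chosen, rejected, direction))


# Toy embeddings (illustrative only).
harmless_examples = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]
harmful_examples = [[-1.0, 0.1, 0.0], [-0.8, 0.3, 0.1]]
direction = safety_direction(harmless_examples, harmful_examples)

chosen_response = [0.7, 0.1, 0.0]
rejected_response = [-0.5, 0.2, 0.0]
pair_margin = margin(chosen_response, rejected_response, direction)
print(f"annotated margin: {pair_margin:.3f}")
```

A positive value indicates that the chosen response lies further along the assumed safety direction than the rejected one; swapping the pair flips the sign.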
-
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey
Authors:
Feng Liang,
Zhen Zhang,
Haifeng Lu,
Chengming Li,
Victor C. M. Leung,
Yanyi Guo,
Xiping Hu
Abstract:
With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for resource allocation and workload scheduling in distributed deep learning, such as scheduling complexity, resource and workload heterogeneity, and fault tolerance. To uncover these challenges and corresponding solutions, this survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We explore these strategies by focusing on various resource types, scheduling granularity levels, and performance goals during distributed training and inference processes. We highlight critical challenges for each topic and discuss key insights of existing technologies. To illustrate practical large-scale resource allocation and workload scheduling in real distributed deep learning scenarios, we use a case study of training large language models. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances and explore future research directions for efficient framework strategies for large-scale distributed deep learning.
Submitted 12 June, 2024;
originally announced June 2024.
-
From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization
Authors:
Ziran Zhang,
Yongrui Ma,
Yueting Chen,
Feng Zhang,
Jinwei Gu,
Tianfan Xue,
Shi Guo
Abstract:
Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably trailing artifacts and signal latency, which hinder their direct applicability and generalization. To address these issues, we propose a novel per-scene optimization strategy tailored for low-light conditions. This approach utilizes the internal statistics of a sequence to handle degraded event data under low-light conditions, improving the generalizability to different lighting and camera settings. To evaluate its robustness in low-light conditions, we further introduce EVFI-LL, a unique RGB+Event dataset captured under low-light conditions. Our results demonstrate state-of-the-art performance in low-light environments. Both the dataset and the source code will be made publicly available upon publication. Project page: https://naturezhanghn.github.io/sim2real.
Submitted 12 June, 2024;
originally announced June 2024.
-
On the density patch problem for the 2-D inhomogeneous Navier-Stokes equations
Authors:
Tiantian Hao,
Feng Shao,
Dongyi Wei,
Zhifei Zhang
Abstract:
In this paper, we first construct a class of global strong solutions for the 2-D inhomogeneous Navier-Stokes equations under the very general assumption that the initial density is only bounded and the initial velocity is in $H^1(\mathbb{R}^2)$. With suitable assumptions on the initial density, which include the case of density patch and vacuum bubbles, we prove that Lions's weak solution is the same as the strong solution with the same initial data. In particular, this gives a complete resolution of the density patch problem proposed by Lions: for the density patch data $ρ_0=1_{D}$ with a smooth bounded domain $D\subset\mathbb{R}^2$, the regularity of $D$ is preserved by the time evolution of Lions's weak solution.
Submitted 12 June, 2024;
originally announced June 2024.
-
Micro-expression recognition based on depth map to point cloud
Authors:
Ren Zhang,
Jianqin Yin,
Chao Qi,
Zehao Wang,
Zhicheng Zhang,
Yonghao Dang
Abstract:
Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, which has drawn widespread attention to the micro-expression recognition task. However, the task is challenging due to the subtle facial motion and brief duration of micro-expressions. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but these approaches are restricted by facial texture information and are susceptible to environmental factors, such as lighting. Conversely, depth information can effectively represent motion information related to facial structure changes and is not affected by lighting. Motion information derived from facial structures can describe motion features that pixel textures cannot delineate. We propose a network for micro-expression recognition based on facial depth information, and our experiments demonstrate the crucial role of depth maps in the micro-expression recognition task. First, we transform the depth map into a point cloud and obtain the motion information for each point by aligning the initiating frame with the apex frame and performing a differential operation. Subsequently, we adjust the input dimensions of all point cloud motion features and use them as inputs for multiple point cloud networks to assess the efficacy of this representation. PointNet++ was ultimately chosen for micro-expression recognition due to its superior performance. Our experiments show that our proposed method significantly outperforms the existing deep learning methods, including the baseline, on the CAS(ME)$^3$ dataset, which includes depth information.
Submitted 12 June, 2024;
originally announced June 2024.
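The depth-map-to-point-cloud step and the frame difference mentioned in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' pipeline: it assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), assumes the two frames are already aligned pixel by pixel, and uses tiny toy depth maps.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of depths) into 3D points keyed by pixel (v, u)."""
    points = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels without a valid depth reading
                points[(v, u)] = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    return points


def point_motion(onset_points, apex_points):
    """Per-point displacement (apex minus onset) for pixels valid in both frames."""
    return {
        key: tuple(a - o for o, a in zip(onset_points[key], apex_points[key]))
        for key in onset_points
        if key in apex_points
    }


# Toy 2x2 depth maps; only one pixel moves slightly toward the camera.
onset_depth = [[1.0, 1.0], [1.0, 1.0]]
apex_depth = [[1.0, 1.0], [1.0, 0.98]]

# Hypothetical pinhole intrinsics, for illustration only.
fx, fy, cx, cy = 500.0, 500.0, 0.5, 0.5

onset_cloud = depth_to_points(onset_depth, fx, fy, cx, cy)
apex_cloud = depth_to_points(apex_depth, fx, fy, cx, cy)
motion = point_motion(onset_cloud, apex_cloud)

for pixel, delta in sorted(motion.items()):
    print(pixel, delta)
```

Each point then carries a 3D displacement vector, which is the kind of per-point motion feature a point cloud network could take as input.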
-
IceCube Search for Neutrino Emission from X-ray Bright Seyfert Galaxies
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (400 additional authors not shown)
Abstract:
The recent IceCube detection of TeV neutrino emission from the nearby active galaxy NGC 1068 suggests that active galactic nuclei (AGN) could make a sizable contribution to the diffuse flux of astrophysical neutrinos. The absence of TeV $γ$-rays from NGC 1068 indicates neutrino production in the vicinity of the supermassive black hole, where the high radiation density leads to $γ$-ray attenuation. Therefore, any potential neutrino emission from similar sources is not expected to correlate with high-energy $γ$-rays. Disk-corona models predict neutrino emission from Seyfert galaxies to correlate with keV X-rays, as they are tracers of coronal activity. Using through-going track events from the Northern Sky recorded by IceCube between 2011 and 2021, we report results from a search for individual and aggregated neutrino signals from 27 additional Seyfert galaxies that are contained in the BAT AGN Spectroscopic Survey (BASS). Besides the generic single power-law, we evaluate the spectra predicted by the disk-corona model. Assuming all sources to be intrinsically similar to NGC 1068, our findings constrain the collective neutrino emission from X-ray bright Seyfert galaxies in the Northern Hemisphere, but, at the same time, show excesses of neutrinos that could be associated with the objects NGC 4151 and CGCG 420-015. These excesses result in a 2.7$σ$ significance with respect to background expectations.
Submitted 11 June, 2024;
originally announced June 2024.
-
Trim 3D Gaussian Splatting for Accurate Geometry Representation
Authors:
Lue Fan,
Yuxue Yang,
Minxing Li,
Hongsheng Li,
Zhaoxiang Zhang
Abstract:
In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Prior work on geometry reconstruction from 3D Gaussians mainly focuses on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore, the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in prior work. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. Our project page is https://trimgs.github.io
Submitted 11 June, 2024;
originally announced June 2024.
-
AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database
Authors:
Wanling Gao,
Yuan Liu,
Zhuoming Yu,
Dandan Cui,
Wenjing Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Fan Huang,
Gangyuan Zhao,
Chongrong Jiang,
Tianyi Wei,
Zhifei Zhang,
Yunyou Huang,
Jianfeng Zhan
Abstract:
Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain unexplored and thus hinder AI from being translated into medical practice. To address this gap, we have curated a database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency -- Sepsis -- from 14 medical centers across China. For the patient cohorts selected from the MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset at present and within the next three hours. The clinician-related information includes the viewed examination data and sequence, viewed time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.
Submitted 15 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
A directional total variation minimization algorithm for isotropic resolution in digital breast tomosynthesis
Authors:
Emil Y. Sidky,
Xiangyi Wu,
Xiaoyu Duan,
Hailing Huang,
Wei Zhao,
Leo Y. Zhang,
John Paul Phillips,
Zheng Zhang,
Buxin Chen,
Dan Xia,
Ingrid S. Reiser,
Xiaochuan Pan
Abstract:
An optimization-based image reconstruction algorithm is developed for contrast-enhanced digital breast tomosynthesis (DBT) using dual-energy scanning. The algorithm minimizes directional total variation (TV) subject to data-discrepancy and non-negativity constraints. Iodinated contrast agent (ICA) imaging is performed by reconstructing images from dual-energy DBT data followed by weighted subtraction. Physical DBT data of a structured breast phantom with ICA inserts are acquired with a Siemens Mammomat scanner. Results are shown for both directional TV minimization and filtered back-projection for reference. Directional TV is seen to substantially reduce depth blur for the ICA objects.
Submitted 11 June, 2024;
originally announced June 2024.
-
Structures and Superconductivity of Hydrogen and Hydrides under Extreme Pressure
Authors:
Zihan Zhang,
Wendi Zhao,
Defang Duan,
Tian Cui
Abstract:
Metallic hydrogen, existing in remarkably extreme environments, was predicted to exhibit long-sought room-temperature superconductivity. Although the superconductivity of metallic hydrogen has not been confirmed experimentally, superconductivity of hydrogen in hydrides was recently discovered with remarkably high critical temperature, as theoretically predicted. In recent years, theoretical simulations have become a new paradigm for materials science, especially for the exploration of materials at extreme pressure. As the typical high-pressure material, metallic hydrogen has long provided a fertile playground for advanced simulations. Simulations have not only provided a substitute for experiments on hydrogen at high pressure, but have also encouraged the discovery of almost all the experimentally discovered superconducting hydrides with record-high superconducting transition temperatures. This work reviews recent progress in hydrogen and hydrides under extreme pressure, focusing on phase diagrams, structures, and the long-sought goal of high-temperature superconductivity. In the end, we highlight structural features of hydrides for the realization of hydrogen-driven superconducting hydrides near ambient pressure.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Tool for Test Case Scenarios Generation Using Large Language Models
Authors:
Abdul Malik Sami,
Zeeshan Rasheed,
Muhammad Waseem,
Zheying Zhang,
Herda Tomas,
Pekka Abrahamsson
Abstract:
Large Language Models (LLMs) are widely used in Software Engineering (SE) for various tasks, including generating code, designing and documenting software, adding code comments, reviewing code, and writing test scripts. However, creating test scripts or automating test cases demands test suite documentation that comprehensively covers functional requirements. Such documentation must enable thorough testing within a constrained scope and timeframe, particularly as requirements and user demands evolve. This article centers on generating user requirements as epics and high-level user stories and crafting test case scenarios based on these stories. It introduces a web-based software tool that employs an LLM-based agent and prompt engineering to automate the generation of test case scenarios against user requirements.
Submitted 11 June, 2024;
originally announced June 2024.
-
Personalized Binomial DAGs Learning with Network Structured Covariates
Authors:
Boxin Zhao,
Weishi Wang,
Dingyuan Zhu,
Ziqi Liu,
Dong Wang,
Zhiqiang Zhang,
Jun Zhou,
Mladen Kolar
Abstract:
The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.
Submitted 10 June, 2024;
originally announced June 2024.
-
Search for neutrino emission from hard X-ray AGN with IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (401 additional authors not shown)
Abstract:
Active Galactic Nuclei (AGN) are promising candidate sources of high-energy astrophysical neutrinos since they provide environments rich in matter and photon targets where cosmic ray interactions may lead to the production of gamma rays and neutrinos. We searched for high-energy neutrino emission from AGN using the $\textit{Swift}$-BAT Spectroscopic Survey (BASS) catalog of hard X-ray sources and 12 years of IceCube muon track data. First, upon performing a stacked search, no significant emission was found. Second, we searched for neutrinos from a list of 43 candidate sources and found an excess from the direction of two sources, Seyfert galaxies NGC 1068 and NGC 4151. We observed NGC 1068 at flux $φ_{ν_μ+\barν_μ}$ = $4.02_{-1.52}^{+1.58} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV, with power-law spectral index, $γ$ = 3.10$^{+0.26}_{-0.22}$, consistent with previous IceCube results. The observation of a neutrino excess from the direction of NGC 4151 is at a post-trial significance of 2.9$σ$. If interpreted as an astrophysical signal, the excess observed from NGC 4151 corresponds to a flux $φ_{ν_μ+\barν_μ}$ = $1.51_{-0.81}^{+0.99} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV and $γ$ = 2.83$^{+0.35}_{-0.28}$.
Submitted 12 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications
Authors:
Zhou Zhou,
Guohang He,
Zheng Zhang,
Luziwei Leng,
Qinghai Guo,
Jianxing Liao,
Xuan Song,
Ran Cheng
Abstract:
Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday usage. Implementing these decoding processes on edge devices, such as wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks to identify an optimal neural decoding backbone that offers robust performance and swift inference capabilities suitable for edge deployment. We executed a series of neural decoding experiments involving nonhuman primates engaged in random reaching tasks, evaluating four prospective models, Gated Recurrent Unit (GRU), Transformer, Receptance Weighted Key Value (RWKV), and Selective State Space model (Mamba), across several metrics: single-session decoding, multi-session decoding, new session fine-tuning, inference speed, calibration speed, and scalability. The findings indicate that although the GRU model delivers sufficient accuracy, the RWKV and Mamba models are preferable due to their superior inference and calibration speeds. Additionally, RWKV and Mamba comply with the scaling law, demonstrating improved performance with larger data sets and increased model sizes, whereas GRU shows less pronounced scalability, and the Transformer model requires computational resources that scale prohibitively. This paper presents a thorough comparative analysis of the four models in various scenarios. The results are pivotal in pinpointing an optimal backbone that can handle increasing data volumes and is viable for edge implementation. This analysis provides essential insights for ongoing research and practical applications in the field.
Submitted 7 June, 2024;
originally announced June 2024.
-
Differentiable Combinatorial Scheduling at Scale
Authors:
Mingju Liu,
Yingjie Li,
Jiaqi Yin,
Zhiru Zhang,
Cunxi Yu
Abstract:
This paper addresses the complex issue of resource-constrained scheduling, an NP-hard problem that spans critical areas including chip design and high-performance computing. Traditional scheduling methods often stumble over scalability and applicability challenges. We propose a novel approach using a differentiable combinatorial scheduling framework, utilizing the Gumbel-Softmax differentiable sampling technique. This new technique allows for a fully differentiable formulation of linear programming (LP) based scheduling, extending its application to a broader range of LP formulations. To encode inequality constraints for scheduling tasks, we introduce the constrained Gumbel Trick, which adeptly encodes arbitrary inequality constraints. Consequently, our method facilitates efficient and scalable scheduling via gradient descent without the need for training data. Comparative evaluations on both synthetic and real-world benchmarks highlight our capability to significantly improve the optimization efficiency of scheduling, surpassing state-of-the-art solutions offered by commercial and open-source solvers such as CPLEX, Gurobi, and CP-SAT in the majority of the designs.
Submitted 5 June, 2024;
originally announced June 2024.
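For readers unfamiliar with the Gumbel-Softmax sampling mentioned in the abstract above, a minimal sketch of the standard (unconstrained) relaxation follows. It does not reproduce the paper's constrained Gumbel Trick or its scheduling formulation; the logits are read here, as an assumption, as scores for placing one operation in each of four time slots, and the function name is hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return a relaxed one-hot sample over len(logits) choices at temperature tau."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u in (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
slot_logits = [0.1, 2.0, -1.0, 0.5]  # illustrative scores for four time slots
sample = gumbel_softmax(slot_logits, tau=0.5, rng=rng)

print([round(p, 3) for p in sample])
print("hard choice:", max(range(len(sample)), key=sample.__getitem__))
```

Lower temperatures push the sample closer to a one-hot vector, while the softmax keeps it differentiable with respect to the logits, which is what makes gradient descent over discrete choices possible in an autodiff framework.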