-
GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets
Authors:
Qiming Wu,
Zichen Chen,
Will Corcoran,
Misha Sra,
Ambuj K. Singh
Abstract:
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph da…
▽ More
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data structure problems along with 2000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000, revealing that LLMs exhibit a better understanding of directed graphs compared to undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with an increase of 11.11\%, 33.37\%, and 33.37\%, respectively.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
Repeater-Like Asynchronous Measurement-Device-Independent Quantum Conference Key Agreement
Authors:
Yu-Shuo Lu,
Yuan-Mei Xie,
Yao Fu,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum conference key agreement facilitates secure communication among multiple parties through multipartite entanglement and is anticipated to be an important cryptographic primitive for future quantum networks. However, the experimental complexity and low efficiency associated with the synchronous detection of multipartite entangled states have significantly hindered their practical application…
▽ More
Quantum conference key agreement facilitates secure communication among multiple parties through multipartite entanglement and is anticipated to be an important cryptographic primitive for future quantum networks. However, the experimental complexity and low efficiency associated with the synchronous detection of multipartite entangled states have significantly hindered their practical application. In this work, we propose a measurement-device-independent conference key agreement protocol that utilizes asynchronous Greenberger-Horne-Zeilinger state measurement.This approach achieves a linear scaling of the conference key rate among multiple parties, exhibiting performance similar to that of the single-repeater scheme in quantum networks. The asynchronous measurement strategy bypasses the need for complex global phase locking technologies, concurrently extending the intercity transmission distance with composable security in the finite key regime. Additionally, our work also showcases the advantages of the asynchronous pairing concept in multiparty quantum entanglement.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow
Authors:
Zhichao Chen,
Haoxuan Li,
Fangyikang Wang,
Odin Zhang,
Hu Xu,
Xiaoyu Jiang,
Zhihuan Song,
Eric H. Wang
Abstract:
Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within…
▽ More
Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within the realm of numerical tabular datasets, we introduce a novel principled approach termed Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation (KnewImp). Specifically, based on Wasserstein gradient flow (WGF) framework, we first prove that issue (1) stems from the cost functionals implicitly maximized in DM-based MDI are equivalent to the MDI's objective plus diversification-promoting non-negative terms. Based on this, we then design a novel cost functional with diversification-discouraging negative entropy and derive our KnewImp approach within WGF framework and reproducing kernel Hilbert space. After that, we prove that the imputation procedure of KnewImp can be derived from another cost functional related to the joint distribution, eliminating the need for the mask matrix and hence naturally addressing issue (2). Extensive experiments demonstrate that our proposed KnewImp approach significantly outperforms existing state-of-the-art methods.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Authors:
Yakun Song,
Zhuo Chen,
Xiaofei Wang,
Ziyang Ma,
Guanrou Yang,
Xie Chen
Abstract:
Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, T…
▽ More
Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, TacoLM introduces a gated attention mechanism to improve the training and inference efficiency and reduce the model size. Meanwhile, an additional gated cross-attention layer is included for each decoder layer, which improves the efficiency and content accuracy of the synthesized speech. In the evaluation of the Librispeech corpus, the proposed TacoLM achieves a better word error rate, speaker similarity, and mean opinion score, with 90% fewer parameters and 5.2 times speed up, compared with VALL-E. Demo and code is available at https://ereboas.github.io/TacoLM/.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory
Authors:
Yang Li,
Yujie Luo,
Yichen Zhang,
Ao Sun,
Wei Huang,
Shuai Zhang,
Tao Zhang,
Chuang Zhou,
Li Ma,
Jie Yang,
Mei Wu,
Heng Wang,
Yan Pan,
Yun Shao,
Xing Chen,
Ziyang Chen,
Song Yu,
Hong Guo,
Bingjie Xu
Abstract:
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still…
▽ More
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still missed for precision time synchronization. In this paper, a secure combination algorithm based on Dempster-Shafer theory is proposed for multiple paths method. Special optimizations are done for the combination algorithm to solve the potential problems due to untrusted evidence. Theoretical simulation shows that the proposed algorithm works much better than Fault Tolerant Algorithm (FTA) and the attack detection method based on single path. And experimental demonstration proves the feasibility and superiority of the proposed algorithm, where the time stability with 27.97 ps, 1.57 ps, and 1.12 ps at average time 1s, 10s, 100s is achieved under TDA and local clock jump. The proposed algorithm can be used to improve the security and resilience of many importance synchronization protocol, such as NTP, PTP, and TWFTT.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction…
▽ More
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging
Authors:
Zixuan Chen,
Lingxiao Yang,
Jian-Huang Lai,
Xiaohua Xie
Abstract:
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of develo** cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based…
▽ More
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of develo** cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based methods may leave considerable ``holes'' (i.e., unmodeled spaces) in their fields, leading to sub-optimal results. In this paper, we propose the Coordinate-based Continuous Projection Field (CoCPF), which aims to build hole-free representation fields for SVCT reconstruction, achieving better reconstruction quality. Specifically, to fill the holes, CoCPF first employs the stripe-based volume sampling module to broaden the sampling regions of Radon transformation from rays (1D space) to stripes (2D space), which can well cover the internal regions between SV projections. Then, by feeding the sampling regions into the proposed differentiable rendering modules, the holes can be jointly optimized during training, reducing the ill-posed levels. As a result, CoCPF can accurately estimate the internal measurements between SV projections (i.e., DV sinograms), producing high-quality CT images after re-projection. Extensive experiments on simulated and real projection datasets demonstrate that CoCPF outperforms state-of-the-art methods for 2D and 3D SVCT reconstructions under various projection numbers and geometries, yielding fine-grained details and fewer artifacts. Our code will be publicly available.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation
Authors:
Zixuan Chen,
Ruijie Su,
Jiahao Zhu,
Lingxiao Yang,
Jian-Huang Lai,
Xiaohua Xie
Abstract:
Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bi…
▽ More
Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bias brings inconsistent updating direction, resulting in implausible 3D generation e.g., color deviation, Janus problem, and semantically inconsistent details). In this work, we propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks. Specifically, PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing to approximate true gradients through minimal sampling steps (1-3). Compared to SDS, PCDS can acquire a more accurate updating direction with the same sampling time (1 sampling step), while enabling few-step (2-3) sampling to trade compute for higher generation quality. For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details. Extensive experiments demonstrate that our approach outperforms the state-of-the-art in generation quality and training efficiency, conspicuously alleviating the implausible 3D generation issues caused by the deviated updating direction. Moreover, it can be simply applied to many 3D generative applications to yield impressive 3D assets, please see our project page: https://narcissusex.github.io/VividDreamer.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Authors:
Jia Syuen Lim,
Zhuoxiao Chen,
Mahsa Baktashmotlagh,
Zhi Chen,
Xin Yu,
Zi Huang,
Yadan Luo
Abstract:
Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we in…
▽ More
Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlap** hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering
Authors:
Zhengliang Shi,
Shuo Zhang,
Weiwei Sun,
Shen Gao,
Pengjie Ren,
Zhumin Chen,
Zhaochun Ren
Abstract:
Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable no…
▽ More
Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method into smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Authors:
Zhuoxiao Chen,
Junjie Meng,
Mahsa Baktashmotlagh,
Zi Huang,
Yadan Luo
Abstract:
LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assem…
▽ More
LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assembles best-fit supermodels for each test batch from a bank of historical checkpoints, leveraging long-term knowledge to guide model updates without forgetting. The model assembly is directed by the proposed synergy weights (SW), employed for weighted averaging of the selected checkpoints to minimize redundancy in the composite supermodel. These weights are calculated by evaluating the similarity of predicted bounding boxes on test data and the feature independence among model pairs in the bank. To maintain an informative yet compact model bank, we pop out checkpoints with the lowest average SW scores and insert newly updated model weights. Our method was rigorously tested against prior test-time domain adaptation strategies on three datasets and under eight types of corruptions, demonstrating its superior adaptability to changing scenes and conditions. Remarkably, our approach achieved a 67.3% increase in performance in a complex "cross-corruption" scenario, which involves cross-dataset inconsistencies and real-world scene corruptions, providing a more realistic testbed of adaptation capabilities. The code is available at https://github.com/zhuoxiao-chen/MOS.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video
Authors:
Zhengbang Yang,
Haotian Xia,
**gxi Li,
Zezhi Chen,
Zhuangdi Zhu,
Weining Shen
Abstract:
Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluate…
▽ More
Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluated mainstream large language models for various sports tasks. Our evaluation spans from simple queries on basic rules and historical facts to complex, context-specific reasoning, leveraging strategies from zero-shot to few-shot learning, and chain-of-thought techniques. In addition to unimodal analysis, we further assessed the sports reasoning capabilities of mainstream video language models to bridge the gap in multimodal sports understanding benchmarking. Our findings highlighted the critical challenges of sports understanding for NLP. We proposed a new benchmark based on a comprehensive overview of existing sports datasets and provided extensive error analysis which we hope can help identify future research priorities in this field.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark
Authors:
Tao Han,
Song Guo,
Zhenghao Chen,
Wanghan Xu,
Lei Bai
Abstract:
Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from signific…
▽ More
Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from significant limitations, such as small sizes, limited temporal coverage, and a lack of comprehensive variables. These shortcomings prevent them from effectively reflecting the benchmarks of current forecasting methods and fail to support the real needs of operational weather forecasting. To address these challenges, we present the WEATHER-5K dataset. This dataset comprises a comprehensive collection of data from 5,672 weather stations worldwide, spanning a 10-year period with one-hour intervals. It includes multiple crucial weather elements, providing a more reliable and interpretable resource for forecasting. Furthermore, our WEATHER-5K dataset can serve as a benchmark for comprehensively evaluating existing well-known forecasting models, extending beyond GSWF methods to support future time-series research challenges and opportunities. The dataset and benchmark implementation are publicly available at: https://github.com/taohan10200/WEATHER-5K.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Temporal Knowledge Graph Question Answering: A Survey
Authors:
Miao Su,
ZiXuan Li,
Zhuo Chen,
Long Bai,
Xiaolong **,
Jiafeng Guo
Abstract:
Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic…
▽ More
Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic categorization of existing methods for TKGQA. In response, this paper provides a thorough survey from two perspectives: the taxonomy of temporal questions and the methodological categorization for TKGQA. Specifically, we first establish a detailed taxonomy of temporal questions engaged in prior studies. Subsequently, we provide a comprehensive review of TKGQA techniques of two categories: semantic parsing-based and TKG embedding-based. Building on this review, the paper outlines potential research directions aimed at advancing the field of TKGQA. This work aims to serve as a comprehensive reference for TKGQA and to stimulate further research.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Discrete-Modulated Continuous-Variable Quantum Key Distribution in Satellite-to-Ground Communication
Authors:
Shi-Gen Li,
Chen-Long Li,
Wen-Bo Liu,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Satellite-to-ground quantum communication constitutes the cornerstone of the global quantum network, heralding the advent of the future of quantum information. Continuous-variable quantum key distribution is a strong candidate for space-ground quantum communication due to its simplicity, stability, and ease of implementation, especially for the robustness of space background light noise. Recently,…
▽ More
Satellite-to-ground quantum communication constitutes the cornerstone of the global quantum network, heralding the advent of the future of quantum information. Continuous-variable quantum key distribution is a strong candidate for space-ground quantum communication due to its simplicity, stability, and ease of implementation, especially for the robustness of space background light noise. Recently, the discrete-modulated continuous-variable protocol has garnered increased attention, owing to its lower implementation requirements, acceptable security key rate, and pronounced compatibility with extant infrastructures. Here, we derive key rates for discrete-modulated continuous-variable quantum key distribution protocols in free-space channel environments across various conditions through numerical simulation, revealing the viability of its application in satellite-to-ground communication.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
Authors:
Zhuoxiao Chen,
Zixin Wang,
Sen Wang,
Zi Huang,
Yadan Luo
Abstract:
LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizab…
▽ More
LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizability of pre-trained models, which creates a sharp loss landscape during training. Such sharpness, when encountered during testing, can precipitate significant performance declines, even with minor data variations. To address the aforementioned challenges, we propose \textbf{dual-perturbation optimization (DPO)} for \textbf{\underline{T}est-\underline{t}ime \underline{A}daptation in \underline{3}D \underline{O}bject \underline{D}etection (TTA-3OD)}. We minimize the sharpness to cultivate a flat loss landscape to ensure model resiliency to minor data variations, thereby enhancing the generalization of the adaptation process. To fully capture the inherent variability of the test point clouds, we further introduce adversarial perturbation to the input BEV features to better simulate the noisy test environment. As the dual perturbation strategy relies on trustworthy supervision signals, we utilize a reliable Hungarian matcher to filter out pseudo-labels sensitive to perturbations. Additionally, we introduce early Hungarian cutoff to avoid error accumulation from incorrect pseudo-labels by halting the adaptation process. Extensive experiments across three types of transfer tasks demonstrate that the proposed DPO significantly surpasses previous state-of-the-art approaches, specifically on Waymo $\rightarrow$ KITTI, outperforming the most competitive baseline by 57.72\% in $\text{AP}_\text{3D}$ and reaching 91\% of the fully supervised upper bound.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
A Pure Transformer Pretraining Framework on Text-attributed Graphs
Authors:
Yu Song,
Haitao Mao,
Jiachen Xiao,
**gzhe Liu,
Zhikai Chen,
Wei **,
Carl Yang,
Jiliang Tang,
Hui Liu
Abstract:
Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Lan…
▽ More
Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Language Models (LLMs) on text-attributed graphs (TAGs), demonstrating superiority to traditional bag-of-words or word2vec techniques. These high-quality node features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalizes across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, our framework achieves significantly better transferability among graphs within the same domain. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Authors:
Zhawnen Chen,
Tianchun Wang,
Yizhou Wang,
Michal Kosinski,
Xiang Zhang,
Yun Fu,
Sheng Li
Abstract:
Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention…
▽ More
Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention). However, human reasoning in the wild is often grounded in dynamic scenes across time. Thus, we consider videos a new medium for examining spatio-temporal ToM reasoning ability. Specifically, we ask explicit probing questions about videos with abundant social and emotional reasoning content. We develop a pipeline for multimodal LLM for ToM reasoning using video and text. We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question, which reveals how multimodal LLMs reason about ToM.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy
Authors:
Long Bai,
Qiaozhi Tan,
Tong Chen,
Wan Jun Nah,
Yanheng Li,
Zhicheng He,
Sishen Yuan,
Zhen Chen,
**lin Wu,
Mobarakol Islam,
Zhen Li,
Hongbin Liu,
Hongliang Ren
Abstract:
Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…
▽ More
Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DFT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DFT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
In-Context Former: Lightning-fast Compressing Context for Large Language Model
Authors:
Xiangfeng Wang,
Zaiyi Chen,
Zheyong Xie,
Tong Xu,
Yongyi He,
Enhong Chen
Abstract:
With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods typically leverage the self-attention mechanism of the LLM itself for context compression. While these methods have achieved notable results, the compression process…
▽ More
With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods typically leverage the self-attention mechanism of the LLM itself for context compression. While these methods have achieved notable results, the compression process still involves quadratic time complexity, which limits their applicability. To mitigate this limitation, we propose the In-Context Former (IC-Former). Unlike previous methods, IC-Former does not depend on the target LLMs. Instead, it leverages the cross-attention mechanism and a small number of learnable digest tokens to directly condense information from the contextual word embeddings. This approach significantly reduces inference time, which achieves linear growth in time complexity within the compression range. Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times while achieving over 90% of the baseline performance on evaluation metrics. Overall, our model effectively reduces compression costs and makes real-time compression scenarios feasible.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Dual-Phase Accelerated Prompt Optimization
Authors:
Muchen Yang,
Moxin Li,
Yongle Li,
Zijun Chen,
Chongming Gao,
Junqi Zhang,
Yangyang Li,
Fuli Feng
Abstract:
Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfa…
▽ More
Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfactory performance. In this light, we aim to accelerate prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach which starts with generating high-quality initial prompts by adopting a well-designed meta-instruction to delve into task-specific information, and iteratively optimize the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with less than five optimization steps.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching
Authors:
Zhuoran Li,
Chunming Hu,
Junfan Chen,
Zhijun Chen,
Xiaohui Guo,
Richong Zhang
Abstract:
Code-switching is a data augmentation scheme mixing words from multiple languages into source lingual text. It has achieved considerable generalization performance of cross-lingual transfer tasks by aligning cross-lingual contextual word representations. However, uncontrolled and over-replaced code-switching would augment dirty samples to model training. In other words, the excessive code-switchin…
▽ More
Code-switching is a data augmentation scheme mixing words from multiple languages into source lingual text. It has achieved considerable generalization performance of cross-lingual transfer tasks by aligning cross-lingual contextual word representations. However, uncontrolled and over-replaced code-switching would augment dirty samples to model training. In other words, the excessive code-switching text samples will negatively hurt the models' cross-lingual transferability. To this end, we propose a Progressive Code-Switching (PCS) method to gradually generate moderately difficult code-switching examples for the model to discriminate from easy to hard. The idea is to incorporate progressively the preceding learned multilingual knowledge using easier code-switching data to guide model optimization on succeeding harder code-switching data. Specifically, we first design a difficulty measurer to measure the impact of replacing each word in a sentence based on the word relevance score. Then a code-switcher generates the code-switching data of increasing difficulty via a controllable temperature variable. In addition, a training scheduler decides when to sample harder code-switching data for model training. Experiments show our model achieves state-of-the-art results on three different zero-shot cross-lingual transfer tasks across ten languages.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Authors:
Vahid Noroozi,
Zhehuai Chen,
Somshubra Majumdar,
Steve Huang,
Jagadeesh Balam,
Boris Ginsburg
Abstract:
In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both modalities, synthetic data generation emerges as a crucial strategy to enhance the performance of such systems and facilitate the modeling of cross-modal relationships be…
▽ More
In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both modalities, synthetic data generation emerges as a crucial strategy to enhance the performance of such systems and facilitate the modeling of cross-modal relationships between the speech and text domains. Our process employs large language models to generate textual components and text-to-speech systems to generate speech components. The proposed methods offer a practical and effective means to expand the training dataset for these models. Experimental results show progress in achieving an integrated understanding of text and speech. We also highlight the potential of using unlabeled speech data to generate synthetic samples comparable in quality to those with available transcriptions, enabling the expansion of these models to more languages.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Burn-in Test and Thermal Performance Evaluation of Silicon Photomultipliers for the JUNO-TAO Experiment
Authors:
X. Chen,
G. F. Cao,
M. H. Qu,
H. W. Wang,
N. Anfimov,
A. Rybnikov,
J. Y. Xu,
A. Q. Su,
Z. L. Chen,
J. Cao,
Y. C. Li,
M. Qi
Abstract:
This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultipier (SiPM), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energ…
▽ More
This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultipier (SiPM), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energy resolution. All SiPM tiles underwent a room temperature burn-in test in the dark for two weeks, while cryogenic testing analyzed the thermal dependence of parameters for some sampled SiPMs. Results from these comprehensive tests provide crucial insights into the long-term performance and stability of the 10 square meters of SiPMs operating at -50°C to detect scintillation photons in the TAO detector. Despite some anomalies awaiting further inspection, all SiPMs successfully passed the burn-in test over two weeks at room temperature, which is equivalent to 6.7 years at -50°C. Results are also used to guide optimal SiPM selection, configuration, and operation, ensuring reliability and sustainability in reactor neutrino measurements. This work also provides insights for a rapid and robust quality assessment in future experiments that employ large-scale SiPMs as detection systems.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Can AI Beat Undergraduates in Entry-level Java Assignments? Benchmarking Large Language Models on JavaBench
Authors:
Jialun Cao,
Zhiyong Chen,
Jiarong Wu,
Shing-chi Cheung,
Chang Xu
Abstract:
Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.…
▽ More
Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.3% of benchmarks. Only a mere handful extends to class-/project-levels, and all are limited to Python. Third, lacking advanced features. Existing benchmarks primarily assess basic coding skills, while overlooking advanced Object-Oriented Programming (OOP) features (i.e., encapsulation, inheritance, and polymorphism).
To fill these gaps, we propose JavaBench, a project-level Java benchmark that exercises OOP features. It comprises four Java projects with 389 methods in 106 Java classes. The test coverage is up to 92%, and JavaBench is attested by 282 undergraduate students, reaching a 90.93/100 average score (i.e., pass rate against the test suite), ensuring the quality of documentation, code skeleton, and tests. To better evaluate LLM's capability against JavaBench, we introduce a systematic evaluation design covering three context settings and five synthesis strategies at two granularities using three hierarchical metrics. Our extensive experiment yields several interesting findings. First, we noticed that regarding project-level Java programming, LLMs are far behind undergraduate students (no project can be correctly completed by any studied LLMs, and at most 41.17% Pass@5 in a more relaxed evaluation). Second, using method signature as prompt context may strike an ideal balance for project-level code generation. JavaBench is publicly available at https://github.com/java-bench/JavaBench.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics
Authors:
Huan Xu,
**lin Wu,
Guanglin Cao,
Zhen Chen,
Zhen Lei,
Hongbin Liu
Abstract:
Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a no…
▽ More
Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a novel Ultrasound Embodied Intelligence system that synergistically combines ultrasound robots with large language models (LLMs) and domain-specific knowledge augmentation, enhancing ultrasound robots' intelligence and operational efficiency. Our approach employs a dual strategy: firstly, integrating LLMs with ultrasound robots to interpret doctors' verbal instructions into precise motion planning through a comprehensive understanding of ultrasound domain knowledge, including APIs and operational manuals; secondly, incorporating a dynamic execution mechanism, allowing for real-time adjustments to scanning plans based on patient movements or procedural errors. We demonstrate the effectiveness of our system through extensive experiments, including ablation studies and comparisons across various models, showcasing significant improvements in executing medical procedures from verbal commands. Our findings suggest that the proposed system improves the efficiency and quality of ultrasound scans and paves the way for further advancements in autonomous medical scanning technologies, with the potential to transform non-invasive diagnostics and streamline medical workflows.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Low-Redundant Optimization for Large Language Model Alignment
Authors:
Zhipeng Chen,
Kun Zhou,
Wayne Xin Zhao,
**gyuan Wang,
Ji-Rong Wen
Abstract:
Large language models (LLMs) are still struggling in aligning with human preference in complex tasks and scenarios. They are prone to overfit into the unexpected patterns or superficial styles in the training data. We conduct an empirical study that only selects the top-10\% most updated parameters in LLMs for alignment training, and see improvements in the convergence process and final performanc…
▽ More
Large language models (LLMs) are still struggling in aligning with human preference in complex tasks and scenarios. They are prone to overfit into the unexpected patterns or superficial styles in the training data. We conduct an empirical study that only selects the top-10\% most updated parameters in LLMs for alignment training, and see improvements in the convergence process and final performance. It indicates the existence of redundant neurons in LLMs for alignment training. To reduce its influence, we propose a low-redundant alignment method named \textbf{ALLO}, focusing on optimizing the most related neurons with the most useful supervised signals. Concretely, we first identify the neurons that are related to the human preference data by a gradient-based strategy, then identify the alignment-related key tokens by reward models for computing loss. Besides, we also decompose the alignment process into the forgetting and learning stages, where we first forget the tokens with unaligned knowledge and then learn aligned knowledge, by updating different ratios of neurons, respectively. Experimental results on 10 datasets have shown the effectiveness of ALLO. Our code and data are available at \url{https://github.com/RUCAIBox/ALLO}.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback
Authors:
Zhirui Chen,
Vincent Y. F. Tan
Abstract:
We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the {\em simple regret}. We propose an algorithm, \underline{RL} with \underline{L}ocally \underline{O}ptimal \unde…
▽ More
We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the {\em simple regret}. We propose an algorithm, \underline{RL} with \underline{L}ocally \underline{O}ptimal \underline{W}eights or {\sc RL-LOW}, which yields a simple regret of $\exp ( - Ω(n/H) )$ where $n$ is the number of data samples and $H$ denotes an instance-dependent hardness quantity that depends explicitly on the suboptimality gap of each action. Furthermore, we derive a first-of-its-kind instance-dependent lower bound in offline RL with preference feedback. Interestingly, we observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of {\sc RL-LOW}. In view of privacy considerations in practical applications, we also extend {\sc RL-LOW} to the setting of $(\varepsilon,δ)$-differential privacy and show, somewhat surprisingly, that the hardness parameter $H$ is unchanged in the asymptotic regime as $n$ tends to infinity; this underscores the inherent efficiency of {\sc RL-LOW} in terms of preserving the privacy of the observed rewards. Given our focus on establishing instance-dependent bounds, our work stands in stark contrast to previous works that focus on establishing worst-case regrets for offline RL with preference feedback.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
The Panchromatic Hubble Andromeda Treasury: Triangulum Extended Region (PHATTER). VI. The High-Mass Stellar Initial Mass Function of M33
Authors:
Tobin M. Wainer,
Benjamin F. Williams,
L. Clifton Johnson,
Daniel R. Weisz,
Julianne J. Dalcanton,
Anil C. Seth,
Andrew Dolphin,
Meredith J. Durbin,
Eric F. Bell,
Zhuo Chen,
Puragra Guhathakurta,
Eric W. Koch,
Christina W. Lindberg,
Erik Rosolowsky,
Karin M. Sandstrom,
Evan D. Skillman,
Adam Smercina,
Estephani E. TorresVillanueva
Abstract:
We measure the high-mass stellar initial mass function (IMF) from resolved stars in M33 young stellar clusters. Leveraging \textit{Hubble Space Telescope's} high resolving power, we fully model the IMF probabilistically. We first model the optical CMD of each cluster to constrain its power-law slope $Γ$, marginalized over other cluster parameters in the fit (e.g., cluster age, mass, and radius). W…
▽ More
We measure the high-mass stellar initial mass function (IMF) from resolved stars in M33 young stellar clusters. Leveraging \textit{Hubble Space Telescope's} high resolving power, we fully model the IMF probabilistically. We first model the optical CMD of each cluster to constrain its power-law slope $Γ$, marginalized over other cluster parameters in the fit (e.g., cluster age, mass, and radius). We then probabilistically model the distribution of MF slopes for a highly strict cluster sample of 9 clusters more massive than log(Mass/M$_{\odot}$)=3.6; above this mass, all clusters have well-populated main sequences of massive stars and should have accurate recovery of their MF slopes, based on extensive tests with artificial clusters. We find the ensemble IMF is best described by a mean high-mass slope of $\overlineΓ = 1.49\pm0.18$, with an intrinsic scatter of $σ^{2}_Γ = 0.02^{+0.16}_{0.00}$, consistent with a universal IMF. We find no dependence of the IMF on environmental impacts such as the local star formation rate or galactocentric radius within M33, which serves as a proxy for metallicity. This $\overlineΓ$ measurement is consistent with similar measurements in M31, despite M33 having a much higher star formation rate intensity. While this measurement is formally consistent with the canonical Kroupa ($Γ= 1.30$) IMF, as well as the Salpeter ($Γ= 1.35)$) value, it is the second Local Group cluster sample to show evidence for a somewhat steeper high-mass IMF slope. We explore the impacts a steeper IMF slope has on a number of astronomical sub-fields.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Precision measurement of the $Ξ^-_b$ baryon lifetime
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1064 additional authors not shown)
Abstract:
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second sys…
▽ More
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Finding Linear Explanations for a Given Ranking
Authors:
Zixuan Chen,
Panagiotis Manolios,
Mirek Riedewald
Abstract:
Given a relation and a ranking of its tuples, but no information about the ranking function, we propose RankExplain to solve 2 types of problems: SAT asks if any linear scoring function can exactly reproduce the given ranking. OPT identifies the linear scoring function that minimizes position-based error, i.e., the total of the ranking-position differences over all tuples in the top-k. Our solutio…
▽ More
Given a relation and a ranking of its tuples, but no information about the ranking function, we propose RankExplain to solve 2 types of problems: SAT asks if any linear scoring function can exactly reproduce the given ranking. OPT identifies the linear scoring function that minimizes position-based error, i.e., the total of the ranking-position differences over all tuples in the top-k. Our solution consists of linear programs that solve the problems exactly and can be implemented using MILP solvers. These solvers also support additional constraints on the scoring function, allowing the user to explore competing hypotheses through alternative scoring functions. We also discuss techniques for dealing with numerical imprecision and for improving performance and scalability. Experiments demonstrate that RankExplain can solve realistic problems. It is the first technique to provide exact solutions for OPT; and it is the fastest in producing exact solutions for SAT.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Authors:
Jiaqi Wang,
Yuhang Zang,
Pan Zhang,
Tao Chu,
Yuhang Cao,
Zeyi Sun,
Ziyu Liu,
Xiaoyi Dong,
Tong Wu,
Dahua Lin,
Zeming Chen,
Zhi Wang,
Lingchen Meng,
Wenhao Yao,
Jianwei Yang,
Sihong Wu,
Zhineng Chen,
Zuxuan Wu,
Yu-Gang Jiang,
Peixi Wu,
Bosong Chai,
Xuan Nie,
Longquan Yan,
Zeyu Wang,
Qifan Zhou
, et al. (9 additional authors not shown)
Abstract:
Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…
▽ More
Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: This track focuses on detecting objects from a large set of 13204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we will provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Tunable Fano and Dicke resonant tunneling of double quantum dots sandwiched between topological insulators
Authors:
Yuan Hong,
Zhen-Guo Fu,
Zhou-Wei-Yu Chen,
Feng Chi,
Zhigang Wang,
Wei Zhang,
** Zhang
Abstract:
We study the resonant tunneling in double quantum dots (DQD) sandwiched between surfaces of topological insulator (TI) Bi$_2$Te$_3$, which possess strong spin-orbit coupling (SOC) and $^{d}C_{3v}$ double group symmetry. Distinct from the spin-conserved case with two-dimensional electron gas (2DEG) electrodes, the conductance displays an asymmetrical double-peak Fano-type lineshape rather than Dick…
▽ More
We study the resonant tunneling in double quantum dots (DQD) sandwiched between surfaces of topological insulator (TI) Bi$_2$Te$_3$, which possess strong spin-orbit coupling (SOC) and $^{d}C_{3v}$ double group symmetry. Distinct from the spin-conserved case with two-dimensional electron gas (2DEG) electrodes, the conductance displays an asymmetrical double-peak Fano-type lineshape rather than Dicke-type lineshape in the zero-field cases. While a Landau-Zener-like lineshape trajectory, which is identified as a signal of competition effect, could be developed by increasing the strength of interdot hop**. Furthermore, when applying an in-plane Zeeman field, we find that the conductance lineshape crossover between Fano and Dicke type could be driven by tilting the field orientation. Moreover, the rotational symmetry of the system could also be revealed from the lineshape trajectory. Our findings will contribute to a better understanding of the resonant tunneling in the presence of electrode SOC and may be confirmed experimentally in the future.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Boosting Medical Image Classification with Segmentation Foundation Model
Authors:
Pengfei Gu,
Zihan Zhao,
Hongxiao Wang,
Yaopeng Peng,
Yizhe Zhang,
Nishchal Sapkota,
Chaoli Wang,
Danny Z. Chen
Abstract:
The Segment Anything Model (SAM) exhibits impressive capabilities in zero-shot segmentation for natural images. Recently, SAM has gained a great deal of attention for its applications in medical image segmentation. However, to our best knowledge, no studies have shown how to harness the power of SAM for medical image classification. To fill this gap and make SAM a true ``foundation model'' for med…
▽ More
The Segment Anything Model (SAM) exhibits impressive capabilities in zero-shot segmentation for natural images. Recently, SAM has gained a great deal of attention for its applications in medical image segmentation. However, to our best knowledge, no studies have shown how to harness the power of SAM for medical image classification. To fill this gap and make SAM a true ``foundation model'' for medical image analysis, it is highly desirable to customize SAM specifically for medical image classification. In this paper, we introduce SAMAug-C, an innovative augmentation method based on SAM for augmenting classification datasets by generating variants of the original images. The augmented datasets can be used to train a deep learning classification model, thereby boosting the classification performance. Furthermore, we propose a novel framework that simultaneously processes raw and SAMAug-C augmented image input, capitalizing on the complementary information that is offered by both. Experiments on three public datasets validate the effectiveness of our new approach.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
LEO Satellite Networks Assisted Geo-distributed Data Processing
Authors:
Zhiyuan Zhao,
Zhe Chen,
Zheng Lin,
Wenjun Zhu,
Kun Qiu,
Chaoqun You,
Yue Gao
Abstract:
Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti…
▽ More
Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective solution for increasing edge clouds, and hence large volumes of data in edge clouds can be transferred to a core cloud via those networks for time-sensitive big data tasks processing, such as attack detection. However, the state-of-the-art satellite selection algorithms bring poor performance for those processing via our measurements. Therefore, we propose a novel data volume aware satellite selection algorithm, named DVA, to support such big data processing tasks. DVA first takes into account both the data size in edge clouds and satellite capacity to finalize the selection, thereby preventing congestion in the access network and reducing transmitting duration. Extensive simulations validate that DVA has a significantly lower average access network duration than the state-of-the-art satellite selection algorithms in a LEO satellite emulation platform.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights
Authors:
Zhikai Chen,
Haitao Mao,
**gzhe Liu,
Yu Song,
Bingheng Li,
Wei **,
Bahare Fatemi,
Anton Tsitsulin,
Bryan Perozzi,
Hui Liu,
Jiliang Tang
Abstract:
Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interests. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models t…
▽ More
Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interests. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models that align different modalities with natural language, the text has recently been adopted to provide a unified feature space for diverse graphs. Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods' full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings. Empirical results provide new insights and inspire future research directions. Our code and data are publicly available from \url{https://github.com/CurryTang/TSGFM}.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Well-posedness and large deviations of fractional McKean-Vlasov stochastic reaction-diffusion equations on unbounded domains
Authors:
Zhang Chen,
Bixiang Wang
Abstract:
This paper is mainly concerned with the large deviation principle of the fractional McKean-Vlasov stochastic reaction-diffusion equation defined on R^n with polynomial drift of any degree. We first prove the well-posedness of the underlying equation under a dissipative condition, and then show the strong convergence of solutions of the corresponding controlled equation with respect to the weak top…
▽ More
This paper is mainly concerned with the large deviation principle of the fractional McKean-Vlasov stochastic reaction-diffusion equation defined on R^n with polynomial drift of any degree. We first prove the well-posedness of the underlying equation under a dissipative condition, and then show the strong convergence of solutions of the corresponding controlled equation with respect to the weak topology of controls, by employing the idea of uniform tail-ends estimates of solutions in order to circumvent the non-compactness of Sobolev embeddings on unbounded domains. We finally establish the large deviation principle of the fractional McKean-Vlasov equation by the weak convergence method without assuming the time Holder continuity of the non-autonomous diffusion coefficients.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels
Authors:
Tong Wu,
Zhiyong Chen,
Meixia Tao,
Bin Xia,
Wenjun Zhang
Abstract:
Degraded broadcast channels (DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuse…
▽ More
Degraded broadcast channels (DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuses these semantic features for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance the performance between the two users. Considering the different channel state information (CSI) of both users over DBC, a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel. Experimental results show that the proposed system outperforms the traditional broadcasting schemes.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Universal materials model of deep-learning density functional theory Hamiltonian
Authors:
Yuxiang Wang,
Yang Li,
Zechen Tang,
He Li,
Zilong Yuan,
Honggeng Tao,
Nianlong Zou,
Ting Bao,
Xinghao Liang,
Zezhou Chen,
Shanghua Xu,
Ce Bian,
Zhiming Xu,
Chong Wang,
Chen Si,
Wenhui Duan,
Yong Xu
Abstract:
Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by develo** universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling compu…
▽ More
Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by develo** universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling computational modeling of the complicated structure-property relationship of materials in general. By constructing a large materials database and substantially improving the DeepH method, we obtain a universal materials model of DeepH capable of handling diverse elemental compositions and material structures, achieving remarkable accuracy in predicting material properties. We further showcase a promising application of fine-tuning universal materials models for enhancing specific materials models. This work not only demonstrates the concept of DeepH's universal materials model but also lays the groundwork for develo** large materials models, opening up significant opportunities for advancing artificial intelligence-driven materials discovery.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
Authors:
Pengfei Gu,
Yejia Zhang,
Huimin Li,
Hongxiao Wang,
Yizhe Zhang,
Chaoli Wang,
Danny Z. Chen
Abstract:
Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack…
▽ More
Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information, which is critical for medical image segmentation tasks. In this paper, we propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation. (1) We propose a new topological loss to preserve geometric shape information by computing topological signatures of both the input and reconstructed volumes, learning geometric shape information. (2) We introduce a pre-text task that predicts the positions of the centers and eight corners of 3D crops, enabling the MAE to aggregate spatial information. (3) We extend the MAE pre-training strategy to a hybrid state-of-the-art (SOTA) medical image segmentation architecture and co-pretrain it alongside the ViT. (4) We develop a fine-tuned model for downstream segmentation tasks by complementing the pre-trained ViT encoder with our pre-trained SOTA model. Extensive experiments on five public 3D segmentation datasets show the effectiveness of our new approach.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
Authors:
Yuchen Ren,
Zhiyuan Chen,
Lifeng Qiao,
Hongtai **g,
Yuchen Cai,
Sheng Xu,
Peng Ye,
Xinzhu Ma,
Siqi Sun,
Hongliang Yan,
Dong Yuan,
Wanli Ouyang,
Xihui Liu
Abstract:
RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we i…
▽ More
RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark BEACON (\textbf{BE}nchm\textbf{A}rk for \textbf{CO}mprehensive R\textbf{N}A Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, a simple yet strong baseline called BEACON-B is proposed, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Training-free Camera Control for Video Generation
Authors:
Chen Hou,
Guoqiang Wei,
Yan Zeng,
Zhibo Chen
Abstract:
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation. Instead, it can be plugged and played with most pretrained video diffusion models and generate camera controllable videos…
▽ More
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation. Instead, it can be plugged and played with most pretrained video diffusion models and generate camera controllable videos with a single image or text prompt as input. The inspiration of our work comes from the layout prior that intermediate latents hold towards generated results, thus rearranging noisy pixels in them will make output content reallocated as well. As camera move could also be seen as a kind of pixel rearrangement caused by perspective change, videos could be reorganized following specific camera motion if their noisy latents change accordingly. Established on this, we propose our method CamTrol, which enables robust camera control for video diffusion models. It is achieved by a two-stage process. First, we model image layout rearrangement through explicit camera movement in 3D point cloud space. Second, we generate videos with camera motion using layout prior of noisy latents formed by a series of rearranged images. Extensive experiments have demonstrated the robustness our method holds in controlling camera motion of generated videos. Furthermore, we show that our method can produce impressive results in generating 3D rotation videos with dynamic content. Project page at https://lifedecoder.github.io/CamTrol/.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
Authors:
Xiqian Yu,
Hanxin Zhu,
Tianyu He,
Zhibo Chen
Abstract:
Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-qual…
▽ More
Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling the 2D knowledge into 3D with Score Distillation Sampling (SDS). Nevertheless, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, due to the randomness brought by generative priors. To mitigate this issue, we introduce two simple yet effective techniques to reduce stochastic disturbances introduced by SDS. Specifically, we 1) shrink the range of diffusion timestep in SDS with an annealing strategy; 2) randomly discard redundant Gaussian primitives during densification. Extensive experiments have demonstrated that our proposed GaussainSR can attain high-quality results for HRNVS with only low-resolution inputs on both synthetic and real-world datasets. Project page: https://chchnii.github.io/GaussianSR/
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis
Authors:
Zhang Chen,
Luca Demetrio,
Srishti Gupta,
Xiaoyi Feng,
Zhaoqiang Xia,
Antonio Emanuele Cinà,
Maura Pintor,
Luca Oneto,
Ambra Demontis,
Battista Biggio,
Fabio Roli
Abstract:
Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in…
▽ More
Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in support of and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. However, unlike the previous works, we also evaluate the considered attack's reliability to support the results' veracity. Our results show that over-parameterized networks are robust against adversarial attacks as opposed to their under-parameterized counterparts.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
Authors:
Yuzhong Huang,
Zhong Li,
Zhang Chen,
Zhiyuan Ren,
Guosheng Lin,
Fred Morstatter,
Yi Xu
Abstract:
In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus i…
▽ More
In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus issue and exhibits a relatively slow optimization process. To circumvent these challenges, we introduce OrientDream, a camera orientation conditioned framework designed for efficient and multi-view consistent 3D generation from textual prompts. Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module. This feature effectively utilizes data from MVImgNet, an extensive external multi-view dataset, to refine and bolster its functionality. Subsequently, we utilize the pre-conditioned 2D images as a basis for optimizing a randomly initialized implicit representation (NeRF). This process is significantly expedited by a decoupled back-propagation technique, allowing for multiple updates of implicit parameters per optimization cycle. Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods, as quantified by comparative metrics.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment
Authors:
Fengbin Guan,
Xin Li,
Zihao Yu,
Yiting Lu,
Zhibo Chen
Abstract:
In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been und…
▽ More
In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been under-explored. Consequently, we propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals that the Mamba model has obvious advantages compared with existing foundational models, e.g., Swin Transformer, ViT, and CNNs, in terms of perception and computational cost for IQA. To increase the transferability of Q-Mamba, we propose the StylePrompt tuning paradigm, where the basic lightweight mean and variance prompts are injected to assist the task-adaptive transfer learning of pre-trained Q-Mamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, our proposed StylePrompt enables better perception transfer capability with less computational cost. Extensive experiments on multiple synthetic, authentic IQA datasets, and cross IQA datasets have demonstrated the effectiveness of our proposed Q-Mamba.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets
Authors:
Shenghua Wan,
Ziyuan Chen,
Le Gan,
Shuai Feng,
De-Chuan Zhan
Abstract:
Model-based offline reinforcement Learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods heavily rely on the uncertainty of learned dynamics. However, the model uncertainty estimatio…
▽ More
Model-based offline reinforcement Learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods heavily rely on the uncertainty of learned dynamics. However, the model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach - \emph{Separated Model-based Offline Policy Optimization} (SeMOPO) - decomposing latent states into endogenous and exogenous parts via conservative sampling and estimating model uncertainty on the endogenous states only. We provide a theoretical guarantee of model uncertainty and performance bound of SeMOPO. To assess the efficacy, we construct the Low-Quality Vision Deep Data-Driven Datasets for RL (LQV-D4RL), where the data are collected by non-expert policy and the observations include moving distractors. Experimental results show that our method substantially outperforms all baseline methods, and further analytical experiments validate the critical designs in our method. The project website is \href{https://sites.google.com/view/semopo}{https://sites.google.com/view/semopo}.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…
▽ More
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Advancing High Resolution Vision-Language Models in Biomedicine
Authors:
Zekai Chen,
Arda Pekis,
Kevin Brown
Abstract:
Multi-modal learning has significantly advanced generative AI, especially in vision-language modeling. Innovations like GPT-4V and open-source projects such as LLaVA have enabled robust conversational agents capable of zero-shot task completions. However, applying these technologies in the biomedical field presents unique challenges. Recent initiatives like LLaVA-Med have started to adapt instruct…
▽ More
Multi-modal learning has significantly advanced generative AI, especially in vision-language modeling. Innovations like GPT-4V and open-source projects such as LLaVA have enabled robust conversational agents capable of zero-shot task completions. However, applying these technologies in the biomedical field presents unique challenges. Recent initiatives like LLaVA-Med have started to adapt instruction-tuning for biomedical contexts using large datasets such as PMC-15M. Our research offers three key contributions: (i) we present a new instruct dataset enriched with medical image-text pairs from Claude3-Opus and LLaMA3 70B, (ii) we propose a novel image encoding strategy using hierarchical representations to improve fine-grained biomedical visual comprehension, and (iii) we develop the Llama3-Med model, which achieves state-of-the-art zero-shot performance on biomedical visual question answering benchmarks, with an average performance improvement of over 10% compared to previous methods. These advancements provide more accurate and reliable tools for medical professionals, bridging gaps in current multi-modal conversational assistants and promoting further innovations in medical AI.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Fredformer: Frequency Debiased Transformer for Time Series Forecasting
Authors:
Xihao Piao,
Zheng Chen,
Taichi Murayama,
Yasuko Matsubara,
Yasushi Sakurai
Abstract:
The Transformer model has shown leading performance in time series forecasting. Nevertheless, in some complex scenarios, it tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias. This bias prevents the model from accurately capturing important high-frequency data features. In this paper, we undertook empirical analyses to understand this b…
▽ More
The Transformer model has shown leading performance in time series forecasting. Nevertheless, in some complex scenarios, it tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias. This bias prevents the model from accurately capturing important high-frequency data features. In this paper, we undertook empirical analyses to understand this bias and discovered that frequency bias results from the model disproportionately focusing on frequency features with higher energy. Based on our analysis, we formulate this bias and propose Fredformer, a Transformer-based framework designed to mitigate frequency bias by learning features equally across different frequency bands. This approach prevents the model from overlooking lower amplitude features important for accurate forecasting. Extensive experiments show the effectiveness of our proposed approach, which can outperform other baselines in different real-world time-series datasets. Furthermore, we introduce a lightweight variant of the Fredformer with an attention matrix approximation, which achieves comparable performance but with much fewer parameters and lower computation costs. The code is available at: https://github.com/chenzRG/Fredformer
△ Less
Submitted 19 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.