-
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Authors:
Huiqiang Jiang,
Yucheng Li,
Chengruidong Zhang,
Qianhui Wu,
Xufang Luo,
Surin Ahn,
Zhenhua Han,
Amir H. Abdi,
Dongsheng Li,
Chin-Yew Lin,
Yuqing Yang,
Lili Qiu
Abstract:
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefi…
▽ More
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefilling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference (Milliontokens Inference), a sparse calculation method designed to accelerate pre-filling of long-sequence processing. Specifically, we identify three unique patterns in long-context attention matrices-the A-shape, Vertical-Slash, and Block-Sparsethat can be leveraged for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices based on the assigned pattern during inference. With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs. Our proposed technique can be directly applied to existing LLMs without any modifications to the pre-training setup or additional fine-tuning. By evaluating on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, we demonstrate that MInference effectively reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy. Our code is available at https://aka.ms/MInference.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Optimizing Age of Information in Vehicular Edge Computing with Federated Graph Neural Network Multi-Agent Reinforcement Learning
Authors:
Wenhua Wang,
Qiong Wu,
**yi Fan,
Nan Cheng,
Wen Chen,
Jiangzhou Wang,
Khaled B. Letaief
Abstract:
With the rapid development of intelligent vehicles and Intelligent Transport Systems (ITS), the sensors such as cameras and LiDAR installed on intelligent vehicles provides higher capacity of executing computation-intensive and delay-sensitive tasks, thereby raising deployment costs. To address this issue, Vehicular Edge Computing (VEC) has been proposed to process data through Road Side Units (RS…
▽ More
With the rapid development of intelligent vehicles and Intelligent Transport Systems (ITS), the sensors such as cameras and LiDAR installed on intelligent vehicles provides higher capacity of executing computation-intensive and delay-sensitive tasks, thereby raising deployment costs. To address this issue, Vehicular Edge Computing (VEC) has been proposed to process data through Road Side Units (RSUs) to support real-time applications. This paper focuses on the Age of Information (AoI) as a key metric for data freshness and explores task offloading issues for vehicles under RSU communication resource constraints. We adopt a Multi-agent Deep Reinforcement Learning (MADRL) approach, allowing vehicles to autonomously make optimal data offloading decisions. However, MADRL poses risks of vehicle information leakage during communication learning and centralized training. To mitigate this, we employ a Federated Learning (FL) framework that shares model parameters instead of raw data to protect the privacy of vehicle users. Building on this, we propose an innovative distributed federated learning framework combining Graph Neural Networks (GNN), named Federated Graph Neural Network Multi-Agent Reinforcement Learning (FGNN-MADRL), to optimize AoI across the system. For the first time, road scenarios are constructed as graph data structures, and a GNN-based federated learning framework is proposed, effectively combining distributed and centralized federated aggregation. Furthermore, we propose a new MADRL algorithm that simplifies decision making and enhances offloading efficiency, further reducing the decision complexity. Simulation results demonstrate the superiority of our proposed approach to other methods through simulations.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
Authors:
Qiucheng Wu,
Handong Zhao,
Michael Saxon,
Trung Bui,
William Yang Wang,
Yang Zhang,
Shiyu Chang
Abstract:
Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje…
▽ More
Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of objects and devise action plans to achieve desired outcomes in visual scenes. In our study, we introduce VSP, a benchmark that 1) evaluates the spatial planning capability in these models in general, and 2) breaks down the visual planning task into finer-grained sub-tasks, including perception and reasoning, and measure the LMs capabilities in these sub-tasks. Our evaluation shows that both open-source and private VLMs fail to generate effective plans for even simple spatial planning tasks. Evaluations on the fine-grained analytical tasks further reveal fundamental deficiencies in the models' visual perception and bottlenecks in reasoning abilities, explaining their worse performance in the general spatial planning tasks. Our work illuminates future directions for improving VLMs' abilities in spatial planning. Our benchmark is publicly available at https://github.com/UCSB-NLP-Chang/Visual-Spatial-Planning.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Understanding the Broad-line Region of Active Galactic Nuclei with Photoionization. I. the Moderate-Accretion Regime
Authors:
Qiaoya Wu,
Yue Shen,
Hengxiao Guo,
Scott F. Anderson,
W. N. Brandt,
Catherine J. Grier,
Patrick B. Hall,
Luis C. Ho,
Yasaman Homayouni,
Keith Horne,
Jennifer I-Hsiu Li,
Donald P. Schneider
Abstract:
Over three decades of reverberation map** (RM) studies on local broad-line active galactic nuclei (AGNs) have measured reliable black-hole (BH) masses for $> 100$ AGNs. These RM measurements reveal a significant correlation between the Balmer broad-line region size and the AGN optical luminosity (the $R-L$ relation). Recent RM studies for AGN samples with more diverse BH accretion parameters (e.…
▽ More
Over three decades of reverberation map** (RM) studies on local broad-line active galactic nuclei (AGNs) have measured reliable black-hole (BH) masses for $> 100$ AGNs. These RM measurements reveal a significant correlation between the Balmer broad-line region size and the AGN optical luminosity (the $R-L$ relation). Recent RM studies for AGN samples with more diverse BH accretion parameters (e.g., mass and Eddington ratio) reveal a substantial intrinsic dispersion around the average $R-L$ relation, suggesting variations in the overall spectral energy distribution shape as functions of accretion parameters. Here we perform a detailed photoionization investigation of expected broad-line properties as functions of accretion parameters, using the latest models for the AGN continuum implemented in {\tt qsosed}. We compare theoretical predictions with observations of a sample of 67 $z\lesssim0.5$ reverberation-mapped AGNs with both rest-frame optical and UV spectra in the moderate-accretion regime (Eddington ratio $λ_{\rm Edd}\equiv L/L_{\rm Edd}<0.5$). The UV/optical line strengths and their dependences on accretion parameters can be reasonably well reproduced by the locally-optimally-emitting cloud (LOC) photoionization models. We provide quantitative recipes that use optical/UV line flux ratios to infer the ionizing continuum, which is not directly observable. In addition, photoionization models with universal values of ionization parameter ($\log U_{\rm H}=-2$) and hydrogen density ($\log n({\rm H})=12$) can qualitatively reproduce the observed global $R-L$ relation for the current AGN sample. However, such models fail to reproduce the observed trend of decreasing BLR size with $L/L_{\rm Edd}$ at fixed optical luminosity, which may imply that the gas density increases with the accretion rate.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation
Authors:
Yapeng Zhao,
Qingqing Wu,
Wen Chen,
Yong Zeng,
Ruiqi Liu,
Weidong Mei,
Fen Hou,
Shaodan Ma
Abstract:
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target,…
▽ More
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target, and multiple singleantenna communication users. The BS needs to allocate the available resources to efficiently provide sensing, communication, and computation services. Due to the heavy service burden and limited power budget, the BS can partially offload the tasks to the nearby edge server instead of computing them locally. We consider the estimation of the target response matrix, a general problem in radar sensing, and utilize Cramer-Rao bound (CRB) as the corresponding performance metric. To tackle the non-convex optimization problem, we propose both semidefinite relaxation (SDR)-based alternating optimization and SDR-based successive convex approximation (SCA) algorithms to minimize the CRB of radar sensing while meeting the requirement of communication users and the need for task computing. Furthermore, we demonstrate that the optimal rankone solutions of both the alternating and SCA algorithms can be directly obtained via the solver or further constructed even when dealing with multiple functionalities. Simulation results show that the proposed algorithms can provide higher target estimation performance than state-of-the-art benchmarks while satisfying the communication and computation constraints.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Sampling from the Continuous Random Energy Model in Total Variation Distance
Authors:
Holden Lee,
Qiang Wu
Abstract:
The continuous random energy model (CREM) is a toy model of spin glasses on $\{0,1\}^N$ that, in the limit, exhibits an infinitely hierarchical correlation structure. We give two polynomial-time algorithms to approximately sample from the Gibbs distribution of the CREM in the high-temperature regime, based on a Markov chain and a sequential sampler. The running time depends algebraically on the de…
▽ More
The continuous random energy model (CREM) is a toy model of spin glasses on $\{0,1\}^N$ that, in the limit, exhibits an infinitely hierarchical correlation structure. We give two polynomial-time algorithms to approximately sample from the Gibbs distribution of the CREM in the high-temperature regime, based on a Markov chain and a sequential sampler. The running time depends algebraically on the desired TV distance and failure probability and exponentially in $(1/g')^{O(1)}$, where $g'$ is the gap to a certain inverse temperature threshold; this contrasts with previous results which only attain $o(N)$ accuracy in KL divergence. If the covariance function $A$ of the CREM is concave, the algorithms work up to the critical threshold $β_c$, which is the static phase transition point; moreover, for certain $A$, the algorithms work up to the known algorithmic threshold $β_G$ proposed in Addario-Berry and Maillard (2020) for non-trivial sampling guarantees. Our result depends on quantitative bounds for the fluctuation of the partition function and a new contiguity result of the ``tilted" CREM obtained from sampling, which is of independent interest. We also show that the spectral gap is exponentially small with high probability, suggesting that the algebraic dependence is unavoidable with a Markov chain approach.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Standardized Data-Parallel Rendering Using ANARI
Authors:
Ingo Wald,
Stefan Zellmann,
Jefferson Amstutz,
Qi Wu,
Kevin Griffin,
Milan Jaros,
Stefan Wesner
Abstract:
We propose and discuss a paradigm that allows for expressing \emph{data-parallel} rendering with the classically non-parallel ANARI API. We propose this as a new standard for data-parallel sci-vis rendering, describe two different implementations of this paradigm, and use multiple sample integrations into existing apps to show how easy it is to adopt this paradigm, and what can be gained from doin…
▽ More
We propose and discuss a paradigm that allows for expressing \emph{data-parallel} rendering with the classically non-parallel ANARI API. We propose this as a new standard for data-parallel sci-vis rendering, describe two different implementations of this paradigm, and use multiple sample integrations into existing apps to show how easy it is to adopt this paradigm, and what can be gained from doing so.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting
Authors:
Xuanming Zhang,
Anthony Diaz,
Zixun Chen,
Qingyang Wu,
Kun Qian,
Erik Voss,
Zhou Yu
Abstract:
Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeki…
▽ More
Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeking to improve their writing. To bridge this gap, we introduce DECOR, a novel benchmark that includes expert annotations for detecting incoherence in L2 English writing, identifying the underlying reasons, and rewriting the incoherent sentences. To our knowledge, DECOR is the first coherence assessment dataset specifically designed for improving L2 English writing, featuring pairs of original incoherent sentences alongside their expert-rewritten counterparts. Additionally, we fine-tuned models to automatically detect and rewrite incoherence in student essays. We find that incorporating specific reasons for incoherence during fine-tuning consistently improves the quality of the rewrites, achieving a result that is favored in both automatic and human evaluations.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Extended GeV $γ$-ray emission around the star forming region of the W3 complex
Authors:
Qihang Wu,
Xiaona Sun,
Ruizhi Yang,
Tingting Ge,
Yunfeng Liang,
Enwei Liang
Abstract:
We analyze the GeV $γ$-ray emission from the W3 complex using about 14 years of Pass 8 data recorded by the $\it Fermi$ Large Area Telescope (\textit{Fermi}-LAT). We resolve the $γ$-ray emissions around W3 into two components: an elliptical Gaussian overlap** with the molecular gas and a point-like source near the cluster W3 Main. The pion-bump feature of SED for the elliptical Gaussian together…
▽ More
We analyze the GeV $γ$-ray emission from the W3 complex using about 14 years of Pass 8 data recorded by the $\it Fermi$ Large Area Telescope (\textit{Fermi}-LAT). We resolve the $γ$-ray emissions around W3 into two components: an elliptical Gaussian overlap** with the molecular gas and a point-like source near the cluster W3 Main. The pion-bump feature of SED for the elliptical Gaussian together with the better fitting result of pion decay model favor the hadronic origin. We further argue that the cosmic rays (CRs) could originate from the interactions between cluster winds and the shock produced by the SNR HB3. The point-like source positionally coincident with the star cluster W3 Main indicates it may be directly powered by near clusters, while its fainter $γ$-ray emissions below 10 GeV is possibly due to the shelter from dense gas making the low-energy CRs incapable of penetrating the dense materials. Meanwhile, we cannot rule out that the $γ$-ray emissions originate from the interaction of accelerated protons in SNR with the ambient gas.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Authors:
Qingxuan Wu,
Zhiyang Dou,
Sirui Xu,
Soshi Shimada,
Chen Wang,
Zhengming Yu,
Yuan Liu,
Cheng Lin,
Zeyu Cao,
Taku Komura,
Vladislav Golyanik,
Christian Theobalt,
Wen** Wang,
Lingjie Liu
Abstract:
Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand…
▽ More
Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
CAT: Interpretable Concept-based Taylor Additive Models
Authors:
Viet Duong,
Qiong Wu,
Zhengyi Zhou,
Hongjue Zhao,
Chenxiang Luo,
Eric Zavesky,
Huaxiu Yao,
Huajie Shao
Abstract:
As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to…
▽ More
As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with relevant names and their ground-truth values. In response, we propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simply this process. CAT does not have to require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to simply categorize input features into broad groups, which can be easily accomplished through a quick metadata review. Specifically, CAT first embeds each group of input features into one-dimensional high-level concept representation, and then feeds the concept representations into a new white-box Taylor Neural Network (TaylorNet). The TaylorNet aims to learn the non-linear relationship between the inputs and outputs using polynomials. Evaluation results across multiple benchmarks demonstrate that CAT can outperform or compete with the baselines while reducing the need of extensive model parameters. Importantly, it can explain model predictions through high-level concepts that human can understand.
△ Less
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation
Authors:
Bowei Yao,
Yi Zeng,
Haizhao Dai,
Qing Wu,
Youshen Xiao,
Fei Gao,
Yuyao Zhang,
**gyi Yu,
Xiran Cai
Abstract:
Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. in this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving…
▽ More
Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. in this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving the image quality reconstructed from sparse data. Specially, the initial acoustic pressure distribution was modeled as a continuous function of spatial coordinates, and parameterized by a multi-layer perceptron. The weights of multi-layer perceptron were determined by training the network in self-supervised manner. And the total variation regularization term was used to offer the prior knowledge. We compared our result with some ablation studies, and the results show that out method outperforms existing methods on simulation and experimental data. Under the sparse sampling condition, our method can suppress the artifacts and avoid the ill-posed problem effectively, which reconstruct images with higher signal-to-noise ratio and contrast-to-noise ratio than traditional methods. The high-quality results for sparse data make the proposed method hold the potential for further decreasing the hardware cost of photoacoustic tomography system.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets
Authors:
Qiming Wu,
Zichen Chen,
Will Corcoran,
Misha Sra,
Ambuj K. Singh
Abstract:
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph da…
▽ More
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data structure problems along with 2000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000, revealing that LLMs exhibit a better understanding of directed graphs compared to undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with an increase of 11.11\%, 33.37\%, and 33.37\%, respectively.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)
Authors:
Nick Bryan-Kinns,
Corey Ford,
Shuoyang Zheng,
Helen Kennedy,
Alan Chamberlain,
Makayla Lewis,
Drew Hemment,
Zi** Li,
Qiong Wu,
Lanxi Xiao,
Gus Xia,
Jeba Rezwana,
Michael Clemens,
Gabriel Vigliensoni
Abstract:
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.
△ Less
Submitted 1 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling
Authors:
Shuaixin Liu,
Kunqian Li,
Yilin Ding,
Kuangwei Xu,
Qianli Jiang,
Q. M. Jonathan Wu,
Dalei Song
Abstract:
We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra…
▽ More
We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of transformation possibilities. Then, we propose an end-to-end unified framework that combines parametric Bézier curve modeling with the widely used You-Only-Look-Once (YOLO) pipeline, abbreviated as TISC-Net, and incorporates effective funnel activation and efficient multi-scale attention modules to enhance curve feature perception and learning. Furthermore, we propose incorporating trunk endpoint loss as an additional constraint to effectively mitigate the impact of endpoint deviations on the overall curve. Finally, by utilizing the depth information of pixels located along the trunk curve captured by a binocular camera, we propose accurately estimating the in-situ length of sea cucumbers through space curve integration. We established two challenging benchmark datasets for curve-based in-situ sea cucumber trunk identification. These datasets consist of over 1,000 real-world marine environment images of sea cucumbers, accompanied by Bézier format annotations. We conduct evaluation on SC-ISTI, for which our method achieves mAP50 above 0.9 on both object detection and trunk identification tasks. Extensive length measurement experiments demonstrate that the average absolute relative error is around 0.15.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
GUI Action Narrator: Where and When Did That Action Take Place?
Authors:
Qinchen Wu,
Difei Gao,
Kevin Qinghong Lin,
Zhuoyu Wu,
Xiangwu Guo,
Peiran Li,
Weichen Zhang,
Hengxu Wang,
Mike Zheng Shou
Abstract:
The advent of Multimodal LLMs has significantly enhanced image OCR recognition capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks. One fundamental aspect of develo** a GUI automation system is understanding primitive GUI actions. This comprehension is crucial as it enables agents to learn from user demonstrations, an essential element of automation. T…
▽ More
The advent of Multimodal LLMs has significantly enhanced image OCR recognition capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks. One fundamental aspect of develo** a GUI automation system is understanding primitive GUI actions. This comprehension is crucial as it enables agents to learn from user demonstrations, an essential element of automation. To rigorously evaluate such capabilities, we developed a video captioning benchmark for GUI actions, comprising 4,189 diverse video captioning samples. This task presents unique challenges compared to natural scene video captioning: 1) GUI screenshots typically contain denser information than natural scenes, and 2) events within GUIs are subtler and occur more rapidly, requiring precise attention to the appropriate time span and spatial region for accurate understanding. To address these challenges, we introduce our GUI action dataset \textbf{Act2Cap} as well as a simple yet effective framework, \textbf{GUI Narrator}, for GUI video captioning that utilizes the cursor as a visual prompt to enhance the interpretation of high-resolution screenshots. Specifically, a cursor detector is trained on our dataset, and a multimodal LLM model with mechanisms for selecting keyframes and key regions generates the captions. Experimental results indicate that even for today's most advanced multimodal models, such as GPT-4o, the task remains highly challenging. Additionally, our evaluations show that our strategy effectively enhances model performance, whether integrated into the fine-tuning of open-source models or employed as a prompting strategy in closed-source models.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
QoE Maximization for Multiple-UAV-Assisted Multi-Access Edge Computing: An Online Joint Optimization Approach
Authors:
Long He,
Geng Sun,
Zemin Sun,
Qingqing Wu,
Jiawen Kang,
Dusit Niyato,
Zhu Han,
Victor C. M. Leung
Abstract:
In disaster scenarios, conventional terrestrial multi-access edge computing (MEC) paradigms, which rely on fixed infrastructure, may become unavailable due to infrastructure damage. With high-probability line-of-sight (LoS) communication, flexible mobility, and low cost, unmanned aerial vehicle (UAV)-assisted MEC is emerging as a new promising paradigm to provide edge computing services for ground…
▽ More
In disaster scenarios, conventional terrestrial multi-access edge computing (MEC) paradigms, which rely on fixed infrastructure, may become unavailable due to infrastructure damage. With high-probability line-of-sight (LoS) communication, flexible mobility, and low cost, unmanned aerial vehicle (UAV)-assisted MEC is emerging as a new promising paradigm to provide edge computing services for ground user devices (UDs) in disaster-stricken areas. However, the limited battery capacity, computing resources, and spectrum resources also pose serious challenges for UAV-assisted MEC, which can potentially shorten the service time of UAVs and degrade the quality of experience (QoE) of UDs without an effective control approach. To this end, in this work, we first present a hierarchical architecture of multiple-UAV-assisted MEC networks that enables the coordinated provision of edge computing services by multiple UAVs. Then, we formulate a joint task offloading, resource allocation, and UAV trajectory planning optimization problem (JTRTOP) to maximize the QoE of UDs while considering the energy consumption constraints of UAVs. Since the problem is proven to be a future-dependent and NP-hard problem, we propose a novel online joint task offloading, resource allocation, and UAV trajectory planning approach (OJTRTA) to solve the problem. Specifically, the JTRTOP is first transformed into a per-slot real-time optimization problem (PROP) using the Lyapunov optimization framework. Then, a two-stage optimization method based on game theory and convex optimization is proposed to solve the PROP. Simulation results provide empirical evidence supporting the superior system performance of the proposed OJTRTA in comparison to alternative approaches.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
KAOS: Large Model Multi-Agent Operating System
Authors:
Zhao Zhuo,
Rongzhen Li,
Kai Liu,
Huhai Zou,
KaiMao Li,
Jie Yu,
Tianhao Sun,
Qingbo Wu
Abstract:
The intelligent interaction model based on large models reduces the differences in user experience across various system platforms but faces challenges in multi-agent collaboration and resource sharing. To demonstrate a uniform user experience across different foundational software platforms and address resource coordination management challenges, this paper proposes a multi-agent operating system…
▽ More
The intelligent interaction model based on large models reduces the differences in user experience across various system platforms but faces challenges in multi-agent collaboration and resource sharing. To demonstrate a uniform user experience across different foundational software platforms and address resource coordination management challenges, this paper proposes a multi-agent operating system based on the open-source Kylin. The research method involves empowering agents with large models to serve applications. First, by introducing management role agents and vertical multi-agent collaboration to construct or replace typical application software. Second, by studying system-level shared resource scheduling strategies to enhance user experience and optimize resource utilization. And finally, by validating the efficiency and superiority of the large model multi-agent operating system through real applications and scoring intelligence. The feasibility of this system is demonstrated, providing a new perspective for the development of multi-agent operating systems. Experimental results show significant advantages of multi-agent collaboration in various application scenarios.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning
Authors:
Kangwei Qi,
Qiong Wu,
**yi Fan,
Nan Cheng,
Qiang Fan,
Jiangzhou Wang
Abstract:
Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. Reconfigurable intelligent surfaces (RIS)…
▽ More
Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. Reconfigurable intelligent surfaces (RIS) is introduced to support vehicle communication and provide an alternative communication path. The system performance can be improved by flexibly adjusting the phase-shift of the RIS. For RIS-assisted VEC system where tasks arrive randomly, we design a control scheme that considers offloading power, local power allocation and phase-shift optimization. To solve this non-convex problem, we propose a new deep reinforcement learning (DRL) framework that employs modified multi-agent deep deterministic policy gradient (MADDPG) approach to optimize the power allocation for vehicle users (VUs) and block coordinate descent (BCD) algorithm to optimize the phase-shift of the RIS. Simulation results show that our proposed scheme outperforms the centralized deep deterministic policy gradient (DDPG) scheme and random scheme.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks
Authors:
Kangwei Qi,
Qiong Wu,
**yi Fan,
Nan Cheng,
Wen Chen,
Jiangzhou Wang,
Khaled B. Letaief
Abstract:
Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-t…
▽ More
Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-to-infrastructure (V2I) links and the stability of vehicle-to-vehicle (V2V) links, we introduce the age of information (AoI) model and the payload transmission probability model. Therefore, with the objective of minimizing the AoI of V2I links and prioritizing transmission of V2V links payload, we construct this optimization problem as an Markov decision process (MDP) problem in which the BS serves as an agent to allocate resources and control phase-shift for the vehicles using the soft actor-critic (SAC) algorithm, which gradually converges and maintains a high stability. A AoI-aware joint vehicular resource allocation and RIS phase-shift control scheme based on SAC algorithm is proposed and simulation results show that its convergence speed, cumulative reward, AoI performance, and payload transmission probability outperforms those of proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3) and stochastic algorithms.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Joint parameter estimations for spin glasses
Authors:
Wei-Kuo Chen,
Arnab Sen,
Qiang Wu
Abstract:
Spin glass models with quadratic-type Hamiltonians are disordered statistical physics systems with competing ferromagnetic and anti-ferromagnetic spin interactions. The corresponding Gibbs measures belong to the exponential family parametrized by (inverse) temperature $β>0$ and external field $h\in\mathbb{R}$. Given a sample from these Gibbs measures, a statistically fundamental question is to inf…
▽ More
Spin glass models with quadratic-type Hamiltonians are disordered statistical physics systems with competing ferromagnetic and anti-ferromagnetic spin interactions. The corresponding Gibbs measures belong to the exponential family parametrized by (inverse) temperature $β>0$ and external field $h\in\mathbb{R}$. Given a sample from these Gibbs measures, a statistically fundamental question is to infer the temperature and external field parameters. In 2007, Chatterjee (Ann. Statist. 35 (2007), no.5, 1931-1946) first proved that in the absence of external field $h=0$, the maximum pseudolikelihood estimator for $β$ is $\sqrt{N}$-consistent under some mild assumptions on the disorder matrices. It was left open whether the same method can be used to estimate the temperature and external field simultaneously. In this paper, under some easily verifiable conditions, we prove that the bivariate maximum pseudolikelihood estimator is indeed jointly $\sqrt{N}$-consistent for the temperature and external field parameters. The examples cover the classical Sherrington-Kirkpatrick model and its diluted variants.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
HumanPlus: Humanoid Shadowing and Imitation from Humans
Authors:
Zipeng Fu,
Qingqing Zhao,
Qi Wu,
Gordon Wetzstein,
Chelsea Finn
Abstract:
One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn a…
▽ More
One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only a RGB camera, i.e. shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, ty**, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Authors:
Kevin Qinghong Lin,
Linjie Li,
Difei Gao,
Qinchen WU,
Mingyi Yan,
Zhengyuan Yang,
Lijuan Wang,
Mike Zheng Shou
Abstract:
Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-c…
▽ More
Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing for identification of the specific levels at which they may fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on visual state (i.e., screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, ty**, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks, especially for high-level planning.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System
Authors:
Ziheng Zhang,
Wen Chen,
Qingqing Wu,
Zhendong Li,
Xusheng Zhu,
**gfeng Chen,
Nan Cheng
Abstract:
This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl…
▽ More
This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multiple targets and process echo signals, respectively. Based on the above model, we derive the Fisher information matrix of the echo signal with respect to the time delay. By employing the chain rule and exploiting the geometric relationship between time delay and position, the Cramer-Rao bound (CRB) for estimating the target's Cartesian coordinate position is derived. Then, we propose a two-stage algorithmic framework to minimize CRB in single- and multi-target localization systems by joint optimizing active beamforming at BS, passive beamforming at multiple IRSs and IRS selection. For the single-target case, we derive the optimal closed-form solution for multiple IRSs coefficients design and propose a lowcomplexity algorithm based on alternating direction method of multipliers to obtain the optimal solution for active beaming design. For the multi-target case, alternating optimization is used to transform the original problem into two subproblems where semi-definite relaxation and successive convex approximation are applied to tackle the quadraticity and indefiniteness in the CRB expression, respectively. Finally, numerical simulation results validate the effectiveness of the proposed algorithm for multiple IRSs collaborative localization system compared to other benchmark schemes as well as the significant performance gains.
△ Less
Submitted 17 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Automated Molecular Concept Generation and Labeling with Large Language Models
Authors:
Shichang Zhang,
Botao Xia,
Zimin Zhang,
Qianli Wu,
Fang Sun,
Ziniu Hu,
Yizhou Sun
Abstract:
Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like G…
▽ More
Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like Graph Neural Networks (GNNs), primarily due to their requirement for predefined concepts and manual label for each instance, which demand domain knowledge and can be labor-intensive. This paper introduces a novel framework for Automated Molecular Concept (AutoMolCo) generation and labeling. AutoMolCo leverages the knowledge in Large Language Models (LLMs) to automatically generate predictive molecular concepts and label them for each molecule. Such procedures are repeated through iterative interactions with LLMs to refine concepts, enabling simple linear models on the refined concepts to outperform GNNs and LLM in-context learning on several benchmarks. The whole AutoMolCo framework is automated without any human knowledge inputs in either concept generation, labeling, or refinement, thereby surpassing the limitations of extant CMs while maintaining their explainability and allowing easy intervention. Through systematic experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets, we demonstrate that the AutoMolCo-induced explainable CMs are beneficial and promising for molecular science research.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
IMFL-AIGC: Incentive Mechanism Design for Federated Learning Empowered by Artificial Intelligence Generated Content
Authors:
Guang**g Huang,
Qiong Wu,
**gyi Li,
Xu Chen
Abstract:
Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data. To alleviate the heterogeneous data quality among clients, artificial intelligence-generated content (AIGC) can be leveraged as a novel data synthesis technique for FL model performance enhancement. Due to various costs incurred by AIGC…
▽ More
Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data. To alleviate the heterogeneous data quality among clients, artificial intelligence-generated content (AIGC) can be leveraged as a novel data synthesis technique for FL model performance enhancement. Due to various costs incurred by AIGC-empowered FL (e.g., costs of local model computation and data synthesis), however, clients are usually reluctant to participate in FL without adequate economic incentives, which leads to an unexplored critical issue for enabling AIGC-empowered FL. To fill this gap, we first devise a data quality assessment method for data samples generated by AIGC and rigorously analyze the convergence performance of FL model trained using a blend of authentic and AI-generated data samples. We then propose a data quality-aware incentive mechanism to encourage clients' participation. In light of information asymmetry incurred by clients' private multi-dimensional attributes, we investigate clients' behavior patterns and derive the server's optimal incentive strategies to minimize server's cost in terms of both model accuracy loss and incentive payments for both complete and incomplete information scenarios. Numerical results demonstrate that our proposed mechanism exhibits highest training accuracy and reduces up to 53.34% of the server's cost with real-world datasets, compared with existing benchmark mechanisms.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
Authors:
Yuhui Wang,
Qingyuan Wu,
Weida Li,
Dylan R. Ashley,
Francesco Faccio,
Chao Huang,
Jürgen Schmidhuber
Abstract:
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100\times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due…
▽ More
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100\times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introducing an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on both 2D maze navigation environments and the ViZDoom 3D navigation benchmark. We find that our new method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and casually solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in RL environments.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
FSH: 3D Representation via Fibonacci Spherical Harmonics
Authors:
Zikuan Li,
Anyi Huang,
Wenru Jia,
Qiaoyun Wu,
Mingqiang Wei,
Jun Wang
Abstract:
Spherical harmonics are a favorable technique for 3D representation, employing a frequency-based approach through the spherical harmonic transform (SHT). Typically, SHT is performed using equiangular sampling grids. However, these grids are non-uniform on spherical surfaces and exhibit local anisotropy, a common limitation in existing spherical harmonic decomposition methods. This paper proposes a…
▽ More
Spherical harmonics are a favorable technique for 3D representation, employing a frequency-based approach through the spherical harmonic transform (SHT). Typically, SHT is performed using equiangular sampling grids. However, these grids are non-uniform on spherical surfaces and exhibit local anisotropy, a common limitation in existing spherical harmonic decomposition methods. This paper proposes a 3D representation method using Fibonacci Spherical Harmonics (FSH). We introduce a spherical Fibonacci grid (SFG), which is more uniform than equiangular grids for SHT in the frequency domain. Our method employs analytical weights for SHT on SFG, effectively assigning sampling errors to spherical harmonic degrees higher than the recovered band-limited function. This provides a novel solution for spherical harmonic transformation on non-equiangular grids. The key advantages of our FSH method include: 1) With the same number of sampling points, SFG captures more features without bias compared to equiangular grids; 2) The root mean square error of 32-degree spherical harmonic coefficients is reduced by approximately 34.6\% for SFG compared to equiangular grids; and 3) FSH offers more stable frequency domain representations, especially for rotating functions. FSH enhances the stability of frequency domain representations under rotational transformations. Its application in 3D shape reconstruction and 3D shape classification results in more accurate and robust representations.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Interference Analysis for Coexistence of UAVs and Civil Aircrafts Based on Automatic Dependent Surveillance-Broadcast
Authors:
Yiyang Liao,
Ziye Jia,
Chao Dong,
Lei Zhang,
Qihui Wu,
Huiling Hu,
Zhu Han
Abstract:
Due to the advantages of high mobility and easy deployment, unmanned aerial vehicles (UAVs) are widely applied in both military and civilian fields. In order to strengthen the flight surveillance of UAVs and guarantee the airspace safety, UAVs can be equipped with the automatic dependent surveillance-broadcast (ADS-B) system, which periodically sends flight information to other aircrafts and groun…
▽ More
Due to the advantages of high mobility and easy deployment, unmanned aerial vehicles (UAVs) are widely applied in both military and civilian fields. In order to strengthen the flight surveillance of UAVs and guarantee the airspace safety, UAVs can be equipped with the automatic dependent surveillance-broadcast (ADS-B) system, which periodically sends flight information to other aircrafts and ground stations (GSs). However, due to the limited resource of channel capacity, UAVs equipped with ADS-B results in the interference between UAVs and civil aircrafts (CAs), which further impacts the accuracy of received information at GSs. In detail, the channel capacity is mainly affected by the density of aircrafts and the transmitting power of ADS-B. Hence, based on the three-dimensional poisson point process, this work leverages the stochastic geometry theory to build a model of the coexistence of UAVs and CAs and analyze the interference performance of ADS-B monitoring system. From simulation results, we reveal the effects of transmitting power, density, threshold and pathloss on the performance of the ADS-B monitoring system. Besides, we provide the suggested transmitting power and density for the safe coexistence of UAVs and CAs.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets
Authors:
Zhiyu Shao,
Qiong Wu,
**yi Fan,
Nan Cheng,
Qiang Fan,
Jiangzhou Wang
Abstract:
This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking i…
▽ More
This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking in urban environments, we design a semantic communication system and introduce two resource allocation metrics: high-speed semantic transmission rate (HSR) and semantic spectrum efficiency (HSSE). Our main goal is to maximize HSSE. Additionally, we address the coexistence of vehicular users and WiFi users in 5G New Radio Unlicensed (NR-U) networks. To tackle this complex challenge, we propose a novel approach that jointly optimizes flexible DC coexistence mechanism and the allocation of resources and base stations (BSs). Unlike traditional bit transmission methods, our approach integrates the semantic communication paradigm into the communication system. Experimental results demonstrate that our proposed solution outperforms traditional bit transmission methods with traditional DC coexistence mechanism in terms of HSSE and semantic throughput (ST) for both vehicular and WiFi users.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty
Authors:
Qizhen Wu,
Kexin Liu,
Lei Chen,
**hu Lü
Abstract:
In swarm robotics, confrontation including the pursuit-evasion game is a key scenario. High uncertainty caused by unknown opponents' strategies and dynamic obstacles complicates the action space into a hybrid decision process. Although the deep reinforcement learning method is significant for swarm confrontation since it can handle various sizes, as an end-to-end implementation, it cannot deal wit…
▽ More
In swarm robotics, confrontation including the pursuit-evasion game is a key scenario. High uncertainty caused by unknown opponents' strategies and dynamic obstacles complicates the action space into a hybrid decision process. Although the deep reinforcement learning method is significant for swarm confrontation since it can handle various sizes, as an end-to-end implementation, it cannot deal with the hybrid process. Here, we propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and the underlying dynamic interaction mechanism between the two layers, which indicates the quantified uncertainty. It decouples the hybrid process into discrete allocation and continuous planning layers, with a probabilistic ensemble model to quantify the uncertainty and regulate the interaction frequency adaptively. Furthermore, to overcome the unstable training process introduced by the two layers, we design an integration training method including pre-training and cross-training, which enhances the training efficiency and stability. Experiment results in both comparison and ablation studies validate the effectiveness and generalization performance of our proposed approach.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning
Authors:
Zhiyu Shao,
Qiong Wu,
**yi Fan,
Nan Cheng,
Wen Chen,
Jiangzhou Wang,
Khaled B. Letaief
Abstract:
This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement le…
▽ More
This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. Firstly, we delve into the extraction of semantic information. Secondly, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization encompasses the optimal link of V2V and V2I sharing strategies, the transmission power for vehicles sending semantic information and the length of transmitted semantic symbols, aiming at maximizing HSSE of V2I and enhancing success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms other baseline algorithms, including other traditional-communication-based spectrum sharing algorithms and spectrum sharing algorithm using other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.
△ Less
Submitted 17 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Movable Antenna Enhanced NOMA Short-Packet Transmission
Authors:
Xinyuan He,
Wen Chen,
Qingqing Wu,
Xusheng Zhu,
Nan Cheng
Abstract:
This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focuses on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is app…
▽ More
This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focuses on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is approached by decomposing it into two subproblems, utilizing successive convex approximation (SCA) to handle the highly non-concave nature of channel gains. Numerical results confirm that the proposed solution offers substantial improvements in effective throughput compared to NOMA short-packet communication with fixed position antennas (FPAs).
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Instability of Self-Driving Satellite Mega-Constellation: From Theory to Practical Impacts on Network Lifetime and Capacity
Authors:
Yimei Chen,
Yuanjie Li,
Hewu Li,
Lixin Liu,
Li Ouyang,
Jiabo Yang,
Junyi Li,
Jian** Wu,
Qian Wu,
Jun Liu,
Zeqi Lai
Abstract:
Low Earth Orbit (LEO) satellite mega-constellations aim to enable high-speed Internet for numerous users anywhere on Earth. To safeguard their network infrastructure in congested outer space, they perform automatic orbital maneuvers to avoid collisions with external debris and satellites. However, our control-theoretic analysis and empirical validation using Starlink's space situational awareness…
▽ More
Low Earth Orbit (LEO) satellite mega-constellations aim to enable high-speed Internet for numerous users anywhere on Earth. To safeguard their network infrastructure in congested outer space, they perform automatic orbital maneuvers to avoid collisions with external debris and satellites. However, our control-theoretic analysis and empirical validation using Starlink's space situational awareness datasets discover that, these safety-oriented maneuvers themselves can threaten safety and networking via cascaded collision avoidance inside the mega-constellation. This domino effect forces a dilemma between long-term LEO network lifetime and short-term LEO network capacity. Its root cause is that, the decades-old local pairwise maneuver paradigm for standalone satellites is inherently unstable if scaled out to recent mega-constellation networks. We thus propose an alternative bilateral maneuver control that stabilizes self-driving mega-constellations for concurrent network lifetime and capacity boosts. Our operational trace-driven emulation shows a 8$\times$ network lifetime extension in Starlink without limiting its network capacity.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS
Authors:
Ruiqi Liu,
Shuang Zheng,
Qingqing Wu,
Yifan Jiang,
Nan Zhang,
Yuanwei Liu,
Marco Di Renzo,
and George C. Alexandropoulos
Abstract:
Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable to increase the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research inv…
▽ More
Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable to increase the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research investigations and applications. To pave the way to the formal standardization of RISs, the European Telecommunications Standards Institute (ETSI) launched the Industry Specification Group (ISG) on the RIS technology in September 2021. This article provides a comprehensive overview of the status of the work conducted by the ETSI ISG RIS, covering typical deployment scenarios of reconfigurable metasurfaces, use cases and operating applications, requirements, emerging hardware architectures and operating modes, as well as the latest insights regarding future directions of RISs and the resulting smart wireless environments.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Learning Divergence Fields for Shift-Robust Graph Representations
Authors:
Qitian Wu,
Fan Nie,
Chenxiao Yang,
Junchi Yan
Abstract:
Real-world data generation often involves certain geometries (e.g., graphs) that induce instance-level interdependence. This characteristic makes the generalization of learning models more difficult due to the intricate interdependent patterns that impact data-generative distributions and can vary from training to testing. In this work, we propose a geometric diffusion model with learnable diverge…
▽ More
Real-world data generation often involves certain geometries (e.g., graphs) that induce instance-level interdependence. This characteristic makes the generalization of learning models more difficult due to the intricate interdependent patterns that impact data-generative distributions and can vary from training to testing. In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. We generalize the diffusion equation with stochastic diffusivity at each time step, which aims to capture the multi-faceted information flows among interdependent data. Furthermore, we derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains. Regarding practical implementation, we introduce three model instantiations that can be considered as the generalized versions of GCN, GAT, and Transformers, respectively, which possess advanced robustness against distribution shifts. We demonstrate their promising efficacy for out-of-distribution generalization on diverse real-world datasets.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
SMART: Scene-motion-aware human action recognition framework for mental disorder group
Authors:
Zengyuan Lai,
Jiarui Yang,
Songpengcheng Xia,
Qi Wu,
Zhen Sun,
Wenxian Yu,
Ling Pei
Abstract:
Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rising Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this…
▽ More
Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rising Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this paper, we innovatively propose to build a vision-based HAR dataset including abnormal actions often occurring in the mental disorder group and then introduce a novel Scene-Motion-aware Action Recognition Technology framework, named SMART, consisting of two technical modules. First, we propose a scene perception module to extract human motion trajectory and human-scene interaction features, which introduces additional scene information for a supplementary semantic representation of the above actions. Second, the multi-stage fusion module fuses the skeleton motion, motion trajectory, and human-scene interaction features, enhancing the semantic association between the skeleton motion and the above supplementary representation, thus generating a comprehensive representation with both human motion and scene information. The effectiveness of our proposed method has been validated on our self-collected HAR dataset (MentalHAD), achieving 94.9% and 93.1% accuracy in un-seen subjects and scenes and outperforming state-of-the-art approaches by 6.5% and 13.2%, respectively. The demonstrated subject- and scene- generalizability makes it possible for SMART's migration to practical deployment in smart healthcare systems for mental disorder patients in medical settings. The code and dataset will be released publicly for further research: https://github.com/Inowlzy/SMART.git.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
How Far Can We Compress Instant-NGP-Based NeRF?
Authors:
Yihang Chen,
Qianyi Wu,
Mehrtash Harandi,
Jianfei Cai
Abstract:
In recent years, Neural Radiance Field (NeRF) has demonstrated remarkable capabilities in representing 3D scenes. To expedite the rendering process, learnable explicit representations have been introduced for combination with implicit NeRF representation, which however results in a large storage space requirement. In this paper, we introduce the Context-based NeRF Compression (CNC) framework, whic…
▽ More
In recent years, Neural Radiance Field (NeRF) has demonstrated remarkable capabilities in representing 3D scenes. To expedite the rendering process, learnable explicit representations have been introduced for combination with implicit NeRF representation, which however results in a large storage space requirement. In this paper, we introduce the Context-based NeRF Compression (CNC) framework, which leverages highly efficient context models to provide a storage-friendly NeRF representation. Specifically, we excavate both level-wise and dimension-wise context dependencies to enable probability prediction for information entropy reduction. Additionally, we exploit hash collision and occupancy grids as strong prior knowledge for better context modeling. To the best of our knowledge, we are the first to construct and exploit context models for NeRF compression. We achieve a size reduction of 100$\times$ and 70$\times$ with improved fidelity against the baseline Instant-NGP on Synthesic-NeRF and Tanks and Temples datasets, respectively. Additionally, we attain 86.7\% and 82.3\% storage size reduction against the SOTA NeRF compression method BiRF. Our code is available here: https://github.com/YihangChen-ee/CNC.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Entanglement-assist cyclic weak-value-amplification metrology
Authors:
Zi-Rui Zhong,
Xia-lin Su,
Xiang-Ming Hu,
Qing-lin Wu
Abstract:
Weak measurement has garnered widespread interest for its ability to amplify small physical effects at the cost of low detection probabilities. Previous entanglement and recycling techniques enhance postselection efficiency and signal-to-noise ratio (SNR) of weak measurement from distinct perspectives. Here, we incorporate a power recycling cavity into the entanglement-assisted weak measurement sy…
▽ More
Weak measurement has garnered widespread interest for its ability to amplify small physical effects at the cost of low detection probabilities. Previous entanglement and recycling techniques enhance postselection efficiency and signal-to-noise ratio (SNR) of weak measurement from distinct perspectives. Here, we incorporate a power recycling cavity into the entanglement-assisted weak measurement system. We obtain an improvement of both detection efficiency and Fisher information, and find that the improvement from entanglement and recycling occur in different dimensions. Furthermore, we analyze two types of errors, walk-off errors and readout errors. The conclusions suggest that entanglement exacerbates the walk-off effect caused by recycling, but this detriment can be balanced by proper parameter selection. In addition, power-recycling can complement entanglement in suppressing readout noise, thus enhancing the accuracy in the measurement results and recovering the lost Fisher information. This work delves deeper into the metrological advantages of weak measurement.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Highway Value Iteration Networks
Authors:
Yuhui Wang,
Weida Li,
Francesco Faccio,
Qingyuan Wu,
Jürgen Schmidhuber
Abstract:
Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment…
▽ More
Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Methods for Class-Imbalanced Learning with Support Vector Machines: A Review and an Empirical Evaluation
Authors:
Salim Rezvani,
Farhad Pourpanah,
Chee Peng Lim,
Q. M. Jonathan Wu
Abstract:
This paper presents a review on methods for class-imbalanced learning with the Support Vector Machine (SVM) and its variants. We first explain the structure of SVM and its variants and discuss their inefficiency in learning with class-imbalanced data sets. We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning. Specifically, we categorize SVM-based…
▽ More
This paper presents a review on methods for class-imbalanced learning with the Support Vector Machine (SVM) and its variants. We first explain the structure of SVM and its variants and discuss their inefficiency in learning with class-imbalanced data sets. We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning. Specifically, we categorize SVM-based models into re-sampling, algorithmic, and fusion methods, and discuss the principles of the representative models in each category. In addition, we conduct a series of empirical evaluations to compare the performances of various representative SVM-based models in each category using benchmark imbalanced data sets, ranging from low to high imbalanced ratios. Our findings reveal that while algorithmic methods are less time-consuming owing to no data pre-processing requirements, fusion methods, which combine both re-sampling and algorithmic approaches, generally perform the best, but with a higher computational load. A discussion on research gaps and future research directions is provided.
△ Less
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Authors:
Yijiong Yu,
Huiqiang Jiang,
Xufang Luo,
Qianhui Wu,
Chin-Yew Lin,
Dongsheng Li,
Yuqing Yang,
Yongfeng Huang,
Lili Qiu
Abstract:
Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt…
▽ More
Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt can significantly affect accuracy. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling this positional hidden states. Experiments on the NaturalQuestions Multi-document QA, KV retrieval, LongBench and timeline reorder tasks, using various models including RoPE models, context windowextended models, and Alibi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Authors:
Haodong Hong,
Sen Wang,
Zi Huang,
Qi Wu,
Jiajun Liu
Abstract:
Current Vision-and-Language Navigation (VLN) tasks mainly employ textual instructions to guide agents. However, being inherently abstract, the same textual instruction can be associated with different visual signals, causing severe ambiguity and limiting the transfer of prior knowledge in the vision domain from the user to the agent. To fill this gap, we propose Vision-and-Language Navigation with…
▽ More
Current Vision-and-Language Navigation (VLN) tasks mainly employ textual instructions to guide agents. However, being inherently abstract, the same textual instruction can be associated with different visual signals, causing severe ambiguity and limiting the transfer of prior knowledge in the vision domain from the user to the agent. To fill this gap, we propose Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP), a novel task augmenting traditional VLN by integrating both natural language and images in instructions. VLN-MP not only maintains backward compatibility by effectively handling text-only prompts but also consistently shows advantages with different quantities and relevance of visual prompts. Possible forms of visual prompts include both exact and similar object images, providing adaptability and versatility in diverse navigation scenarios. To evaluate VLN-MP under a unified framework, we implement a new benchmark that offers: (1) a training-free pipeline to transform textual instructions into multi-modal forms with landmark images; (2) diverse datasets with multi-modal instructions for different downstream tasks; (3) a novel module designed to process various image prompts for seamless integration with state-of-the-art VLN models. Extensive experiments on four VLN benchmarks (R2R, RxR, REVERIE, CVDN) show that incorporating visual prompts significantly boosts navigation performance. While maintaining efficiency with text-only prompts, VLN-MP enables agents to navigate in the pre-explore setting and outperform text-based models, showing its broader applicability.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Flag-like singular integrals and associated Hardy spaces on a kind of nilpotent Lie groups of step two
Authors:
Wei Wang,
Qingyan Wu
Abstract:
The Cauchy-Szegö singular integral is a fundamental tool in the study of holomorphic $H^p$ Hardy space. But for a kind of Siegel domains, the Cauchy-Szegö kernels are neither product ones nor flag ones on the Shilov boundaries, which have the structure of nilpotent Lie groups $\mathscr N $ of step two. We use the lifting method to investigate flag-like singular integrals on $\mathscr N $, which in…
▽ More
The Cauchy-Szegö singular integral is a fundamental tool in the study of holomorphic $H^p$ Hardy space. But for a kind of Siegel domains, the Cauchy-Szegö kernels are neither product ones nor flag ones on the Shilov boundaries, which have the structure of nilpotent Lie groups $\mathscr N $ of step two. We use the lifting method to investigate flag-like singular integrals on $\mathscr N $, which includes these Cauchy-Szegö ones as a special case. The lifting group is the product $\tilde {\mathscr N }$ of three Heisenberg groups, and naturally geometric or analytical objects on $\mathscr N $ are the projection of those on $\tilde {\mathscr N } $. As in the flag case, we introduce various notions on $\mathscr N $ adapted to geometric feature of these kernels, such as tubes, nontangential regions, tube maximal functions, Littlewood-Paley functions, tents, shards and atoms etc. They have the feature of tri-parameters, although the second step of the group $\mathscr N$ is only $2$-dimensional, i.e. there exists a hidden parameter as in the flag case. We also establish the corresponding Calderón reproducing formula, characterization of $ L ^p (\mathscr N)$ by Littlewood-Paley functions, $ L ^p $-boundedness of tube maximal functions and flag-like singular integrals and atomic decomposition of $H^1$ Hardy space on $ {\mathscr N } $.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Augmented Commonsense Knowledge for Remote Object Grounding
Authors:
Bahram Mohammadi,
Yicong Hong,
Yuankai Qi,
Qi Wu,
Shirui Pan,
Javen Qinfeng Shi
Abstract:
The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task…
▽ More
The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task, which uses concise high-level instructions, such as ''Bring me the blue cushion in the master bedroom''. To address enhancing representation, we propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a spatio-temporal knowledge graph for improving agent navigation. Specifically, the proposed approach involves constructing a knowledge base by retrieving commonsense information from ConceptNet, followed by a refinement module to remove noisy and irrelevant knowledge. We further present ACK which consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment by integrating visible objects, commonsense knowledge, and concept history, which includes object and knowledge temporal information. Moreover, we add a new pipeline for the commonsense-based decision-making process which leads to more accurate local action prediction. Experimental results demonstrate our proposed model noticeably outperforms the baseline and archives the state-of-the-art on the REVERIE benchmark.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…
▽ More
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking
Authors:
Yifan Zeng,
Ojas Tendolkar,
Raymond Baartmans,
Qingyun Wu,
Huazheng Wang,
Lizhong Chen
Abstract:
Ranking passages by prompting a large language model (LLM) can achieve promising performance in modern information retrieval (IR) systems. A common approach is to sort the ranking list by prompting LLMs for pairwise comparison. However, sorting-based methods require consistent comparisons to correctly sort the passages, which we show that LLMs often violate. We identify two kinds of intrinsic inco…
▽ More
Ranking passages by prompting a large language model (LLM) can achieve promising performance in modern information retrieval (IR) systems. A common approach is to sort the ranking list by prompting LLMs for pairwise comparison. However, sorting-based methods require consistent comparisons to correctly sort the passages, which we show that LLMs often violate. We identify two kinds of intrinsic inconsistency in LLM-based pairwise comparisons: order inconsistency which leads to conflicting results when switching the passage order, and transitive inconsistency which leads to non-transitive triads among all preference pairs. In this paper, we propose LLM-RankFusion, an LLM-based ranking framework that mitigates these inconsistencies and produces a robust ranking list. LLM-RankFusion mitigates order inconsistency using in-context learning (ICL) to demonstrate order-agnostic comparisons and calibration to estimate the underlying preference probability between two passages. We then address transitive inconsistency by aggregating the ranking results from multiple rankers. In our experiments, we empirically show that LLM-RankFusion can significantly reduce inconsistent pairwise comparison results, and improve the ranking quality by making the final ranking list more robust.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Dynamic Microgrid Formation Considering Time-dependent Contingency: A Distributionally Robust Approach
Authors:
Ziang Liu,
Sheng Cai,
Qiuwei Wu,
Xinwei Shen,
Xuan Zhang,
Nikos Hatziargyriou
Abstract:
The increasing frequency of extreme weather events has posed significant risks to the operation of power grids. During long-duration extreme weather events, microgrid formation (MF) is an essential solution to enhance the resilience of the distribution systems by proactively partitioning the distribution system into several microgrids to mitigate the impact of contingencies. This paper proposes a…
▽ More
The increasing frequency of extreme weather events has posed significant risks to the operation of power grids. During long-duration extreme weather events, microgrid formation (MF) is an essential solution to enhance the resilience of the distribution systems by proactively partitioning the distribution system into several microgrids to mitigate the impact of contingencies. This paper proposes a distributionally robust dynamic microgrid formation (DR-DMF) approach to fully consider the temporal characteristics of line failure probability during long-duration extreme weather events like typhoons. The boundaries of each microgrid are dynamically adjusted to enhance the resilience of the system. Furthermore, the expected load shedding is minimized by a distributionally robust optimization model considering the uncertainty of line failure probability regarding the worst-case distribution of contingencies. The effectiveness of the proposed model is verified by numerical simulations on a modified IEEE 37-node system.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Pinning and dipole asymptotics of locally deformed striped phases
Authors:
Arnd Scheel,
Qiliang Wu
Abstract:
We investigate the effect of spatial inhomogeneity on perfectly periodic, self-organized striped patterns in spatially extended systems. We demonstrate that inhomogeneities select a specific translate of the striped patterns and induce algebraically decaying, dipole-type farfield deformations. Phase shifts and leading order terms are determined by effective moments of the spatial inhomogeneity. Fa…
▽ More
We investigate the effect of spatial inhomogeneity on perfectly periodic, self-organized striped patterns in spatially extended systems. We demonstrate that inhomogeneities select a specific translate of the striped patterns and induce algebraically decaying, dipole-type farfield deformations. Phase shifts and leading order terms are determined by effective moments of the spatial inhomogeneity. Farfield decay is proportional to the derivatives of the Green's function of an effective Laplacian. Technically, we use mode filters and conjugacies to an effective Laplacian to establish Fredholm properties of the linearization in Kondratiev spaces. Spatial localization in a contraction argument is gained through the use of an explicit deformation ansatz and a subtle cancellation in Bloch wave space.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Streaming Video Diffusion: Online Video Editing with Diffusion Models
Authors:
Feng Chen,
Zhen Yang,
Bohan Zhuang,
Qi Wu
Abstract:
We present a novel task called online video editing, which is designed to edit \textbf{streaming} frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible, online video editing is tailored to real-life applications such as live streaming and online chat, requiring (1) fast continual step inference, (2) long-term tem…
▽ More
We present a novel task called online video editing, which is designed to edit \textbf{streaming} frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible, online video editing is tailored to real-life applications such as live streaming and online chat, requiring (1) fast continual step inference, (2) long-term temporal modeling, and (3) zero-shot video editing capability. To solve these issues, we propose Streaming Video Diffusion (SVDiff), which incorporates the compact spatial-aware temporal recurrence into off-the-shelf Stable Diffusion and is trained with the segment-level scheme on large-scale long videos. This simple yet effective setup allows us to obtain a single model that is capable of executing a broad range of videos and editing each streaming frame with temporal coherence. Our experiments indicate that our model can edit long, high-quality videos with remarkable results, achieving a real-time inference speed of 15.2 FPS at a resolution of 512x512.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.