Search | arXiv e-print repository

RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

Authors: Wenjia Xu, Zijian Yu, Yixu Wang, Jiuniu Wang, Mugen Peng

Abstract: An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in profession… ▽ More An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in professional domains. To address these limitations, we propose a LLM-driven remote sensing intelligent agent named RS-Agent. Firstly, RS-Agent is powered by a large language model (LLM) that acts as its "Central Controller," enabling it to understand and respond to various problems intelligently. Secondly, our RS-Agent integrates many high-performance remote sensing image processing tools, facilitating multi-tool and multi-turn conversations. Thirdly, our RS-Agent can answer professional questions by leveraging robust knowledge documents. We conducted experiments using several datasets, e.g., RSSDIVCS, RSVQA, and DOTAv1. The experimental results demonstrate that our RS-Agent delivers outstanding performance in many tasks, i.e., scene classification, visual question answering, and object counting tasks. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.11936 [pdf, other]

UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization

Authors: Wenjia Xu, Yaxuan Yao, Jiaqi Cao, Zhiwei Wei, Chunbo Liu, Jiuniu Wang, Mugen Peng

Abstract: The application of unmanned aerial vehicles (UAV) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of… ▽ More The application of unmanned aerial vehicles (UAV) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of UAV with the ortho satellite maps. However, collecting UAV ground-down view images across diverse locations is costly, leading to a scarcity of large-scale datasets for real-world scenarios. Existing datasets for UAV visual localization are often limited to small geographic areas or are focused only on urban regions with distinct textures. To address this, we define the UAV visual localization task by determining the UAV's real position coordinates on a large-scale satellite map based on the captured ground-down view. In this paper, we present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task. This dataset comprises images from diverse drones across 11 locations in China, capturing a range of topographical features. The dataset features images from fixed-wing drones and multi-terrain drones, captured at different altitudes and orientations. Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date. Our dataset is tailored to support both the training and testing of models by providing a diverse and extensive data. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.08967 [pdf, other]

Beam Management in Low Earth Orbit Satellite Communication With Handover Frequency Control and Satellite-Terrestrial Spectrum Sharing

Authors: Yaohua Sun, Jianfeng Zhu, Mugen Peng

Abstract: To achieve ubiquitous wireless connectivity, low earth orbit (LEO) satellite networks have drawn much attention. However, effective beam management is challenging due to time-varying cell load, high dynamic network topology, and complex interference situations. In this paper, under inter-satellite handover frequency and satellite-terrestrial/inter-beam interference constraints, we formulate a prac… ▽ More To achieve ubiquitous wireless connectivity, low earth orbit (LEO) satellite networks have drawn much attention. However, effective beam management is challenging due to time-varying cell load, high dynamic network topology, and complex interference situations. In this paper, under inter-satellite handover frequency and satellite-terrestrial/inter-beam interference constraints, we formulate a practical beam management problem, aiming to maximize the long-term service satisfaction of cells. Particularly, Lyapunov framework is leveraged to equivalently transform the primal problem into multiple single epoch optimization problems, where virtual queue stability constraints replace inter-satellite handover frequency constraints. Since each single epoch problem is NP-hard, we further decompose it into three subproblems, including inter-satellite handover decision, beam hop** design and satellite-terrestrial spectrum sharing. First, a proactive inter-satellite handover mechanism is developed to balance handover frequency and satellite loads. Subsequently, a beam hop** design algorithm is presented based on conflict graphs to achieve interference mitigation among beams, and then a flexible satellite-terrestrial spectrum sharing algorithm is designed to satisfy the demands of beam cells and improve spectral efficiency. Simulation results show that our proposal significantly improves service satisfaction compared with baselines, where the average data queue length of beam cells is reduced by over 50% with affordable handover frequency. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08960 [pdf, other]

doi 10.1109/TVT.2023.3325328

Timing Advance Estimation in Low Earth Orbit Satellite Networks

Authors: Jianfeng Zhu, Yaohua Sun, Mugen Peng

Abstract: Low earth orbit (LEO) satellite communication based on 3GPP standard is seen as a promising solution to rolling out communication services in areas without terrestrial base stations. However, due to the fast movement of satellites and large beam footprint size, the existing 5G timing advance (TA) estimation mechanism cannot be directly applied when global navigation satellite system is unavailable… ▽ More Low earth orbit (LEO) satellite communication based on 3GPP standard is seen as a promising solution to rolling out communication services in areas without terrestrial base stations. However, due to the fast movement of satellites and large beam footprint size, the existing 5G timing advance (TA) estimation mechanism cannot be directly applied when global navigation satellite system is unavailable. In this article, an enhanced TA estimation approach is proposed for LEO satellite communication networks. Specifically, a user-side time-frequency pre-compensation method is introduced at first, which leverages frequency offset measurement on synchronization signal blocks broadcasted by satellites in initial cell search phase. For the random access phase, the upper bound of inter-preamble interference incurred by partial-period cross-correlation operations is derived for a preamble format advised by 3GPP, and it is shown that the interference level is closely related to the square of the number of such operations. Inspired by this result, a cyclic prefix free preamble format is further designed, which features extended guard time, differential power allocation and flexible preamble structure. Numerical results show that our proposal can reduce the missed detection rate of preamble within a beam. Particularly, the missed detection rates of preamble under 32, 48, and 64 users are lower than 1% when SNR = -6 dB, which is a significant improvement compared to baselines. In addition, our proposal can limit the TA estimation error of the detected users to the time length of 25 time-domain sampling points when the subcarrier spacing is 30 kHz and operation frequency is 27 GHz. △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: 17 pages, 14 figures

arXiv:2404.08959 [pdf, other]

Beam Management in Low Earth Orbit Satellite Networks with Random Traffic Arrival and Time-varying Topology

Authors: Jianfeng Zhu, Yaohua Sun, Mugen Peng

Abstract: Low earth orbit (LEO) satellite communication networks have been considered as promising solutions to providing high data rate and seamless coverage, where satellite beam management plays a key role. However, due to the limitation of beam resource, dynamic network topology, beam spectrum reuse, time-varying traffic arrival and service continuity requirement, it is challenging to effectively alloca… ▽ More Low earth orbit (LEO) satellite communication networks have been considered as promising solutions to providing high data rate and seamless coverage, where satellite beam management plays a key role. However, due to the limitation of beam resource, dynamic network topology, beam spectrum reuse, time-varying traffic arrival and service continuity requirement, it is challenging to effectively allocate time-frequency resource of satellite beams to multiple cells. In this paper, aiming at reducing time-averaged beam revisit time and mitigate inter-satellite handover, a beam management problem is formulated for dynamic LEO satellite communication networks, under inter-cell interference and network stability constraints. Particularly, inter-cell interference constraints are further simplified into off-axis angle based constraints, which provide tractable rules for spectrum sharing between two beam cells. To deal with the long-term performance optimization, the primal problem is transformed into a series of single epoch problems by adopting Lyapunov optimization framework. Since the transformed problem is NP-hard, it is further divided into three subproblems, including serving beam allocation, beam service time allocation and serving satellite allocation. With the help of conflict graphs built with off-axis angle based constraints, serving beam allocation and beam service time allocation algorithms are developed to reduce beam revisit time and cell packet queue length. Then, we further develop a satellite-cell service relationship optimization algorithm to better adapt to dynamic network topology. Compared with baselines, numerical results show that our proposal can reduce average beam revisit time by 20.8% and keep strong network stability with similar inter-satellite handover frequency. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.02937 [pdf, other]

Towards Responsible and Reliable Traffic Flow Prediction with Large Language Models

Authors: Xusen Guo, Qiming Zhang, Junyue Jiang, Mingxing Peng, Hao, Yang, Meixin Zhu

Abstract: Traffic forecasting is crucial for intelligent transportation systems. It has experienced significant advancements thanks to the power of deep learning in capturing latent patterns of traffic data. However, recent deep-learning architectures require intricate model designs and lack an intuitive understanding of the map** from input data to predicted results. Achieving both accuracy and responsib… ▽ More Traffic forecasting is crucial for intelligent transportation systems. It has experienced significant advancements thanks to the power of deep learning in capturing latent patterns of traffic data. However, recent deep-learning architectures require intricate model designs and lack an intuitive understanding of the map** from input data to predicted results. Achieving both accuracy and responsibility in traffic prediction models remains a challenge due to the complexity of traffic data and the inherent opacity of deep learning models. To tackle these challenges, we propose a Responsible and Reliable Traffic flow forecasting model with Large Language Models (R2T-LLM), which leverages large language models (LLMs) to generate responsible traffic predictions. By transferring multi-modal traffic data into natural language descriptions, R2T-LLM captures complex spatial-temporal patterns and external factors from comprehensive traffic data. The LLM framework is fine-tuned using language-based instructions to align with spatial-temporal traffic flow data. Empirically, R2T-LLM shows competitive accuracy compared with deep learning baselines, while providing an intuitive and reliable explanation for predictions. We discuss the spatial-temporal and input dependencies for conditional future flow forecasting, showcasing R2T-LLM's potential for diverse city prediction tasks. This paper contributes to advancing accountable traffic prediction models and lays a foundation for future exploration of LLM applications in transportation. To the best of our knowledge, this is the first study to use LLM for accountable and reliable prediction of traffic flows. △ Less

Submitted 21 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 27pages, 8 figures

arXiv:2404.01148 [pdf, other]

Joint Beam Scheduling and Beamforming Design for Cooperative Positioning in Multi-beam LEO Satellite Networks

Authors: Hongtao Xv, Yaohua Sun, Yafei Zhao, Mugen Peng, Shijie Zhang

Abstract: Cooperative positioning with multiple low earth orbit (LEO) satellites is promising in providing location-based services and enhancing satellite-terrestrial communication. However, positioning accuracy is greatly affected by inter-beam interference and satellite-terrestrial topology geometry. To select the best combination of satellites from visible ones and suppress inter-beam interference, this… ▽ More Cooperative positioning with multiple low earth orbit (LEO) satellites is promising in providing location-based services and enhancing satellite-terrestrial communication. However, positioning accuracy is greatly affected by inter-beam interference and satellite-terrestrial topology geometry. To select the best combination of satellites from visible ones and suppress inter-beam interference, this paper explores the utilization of flexible beam scheduling and beamforming of multi-beam LEO satellites that can adjust beam directions toward the same earth-fixed cell to send positioning signals simultaneously. By leveraging Cramér-Rao lower bound (CRLB) to characterize user Time Difference of Arrival (TDOA) positioning accuracy, the concerned problem is formulated, aiming at optimizing user positioning accuracy under beam scheduling and beam transmission power constraints. To deal with the mixed-integer-nonconvex problem, we decompose it into an inner beamforming design problem and an outer beam scheduling problem. For the former, we first prove the monotonic relationship between user positioning accuracy and its perceived signal-to-interference-plus-noise ratio (SINR) to reformulate the problem, and then semidefinite relaxation (SDR) is adopted for beamforming design. For the outer problem, a heuristic low-complexity beam scheduling scheme is proposed, whose core idea is to schedule users with lower channel correlation to mitigate inter-beam interference while seeking a proper satellite-terrestrial topology geometry. Simulation results verify the superior positioning performance of our proposed positioning-oriented beamforming and beam scheduling scheme, and it is shown that average user positioning accuracy is improved by $17.1\%$ and $55.9\%$ when the beam transmission power is 20 dBw, compared to conventional beamforming and beam scheduling schemes, respectively. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00988 [pdf, other]

doi 10.1109/TMC.2024.3380891

Distributed Satellite-Terrestrial Cooperative Routing Strategy Based on Minimum Hop-Count Analysis in Mega LEO Satellite Constellation

Authors: Xin'ao Feng, Yaohua Sun, Mugen Peng

Abstract: Mega low earth orbit (LEO) satellite constellation is promising in achieving global coverage with high capacity. However, forwarding packets in mega constellation faces long end-to-end delay caused by multi-hop routing and high-complexity routing table construction, which will detrimentally impair the network transmission efficiency. To overcome this issue, a distributed low-complexity satellite-t… ▽ More Mega low earth orbit (LEO) satellite constellation is promising in achieving global coverage with high capacity. However, forwarding packets in mega constellation faces long end-to-end delay caused by multi-hop routing and high-complexity routing table construction, which will detrimentally impair the network transmission efficiency. To overcome this issue, a distributed low-complexity satellite-terrestrial cooperative routing approach is proposed in this paper, and its core idea is that each node forwards packets to next-hop node under the constraints of minimum end-to-end hop-count and queuing delay. Particularly, to achieve an accurate and low-complexity minimum end-to-end hop-count estimation in satellite-terrestrial cooperative routing scenario, we first introduce a satellite real-time position based graph (RTPG) to simplify the description of three-dimensional constellation, and further abstract RTPG into a key node based graph (KNBG). Considering the frequent regeneration of KNBG due to satellite movement, a low complexity generation method of KNBG is studied as well. Finally, utilizing KNBG as input, we design the minimum end-to-end hop-count estimation method (KNBG-MHCE). Meanwhile, the computational complexity, routing path survival probability and practical implementation of our proposal are all deeply discussed. Extensive simulations are also conducted in systems with Ka and laser band inter-satellite links to verify the superiority of our proposal. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 16pages, 15 figures, published to IEEE Transactions on Mobile Computing

Journal ref: IEEE Transactions on Mobile Computing, no. 01, pp. 1-16, 2024, early access

arXiv:2404.00051 [pdf, other]

Deja vu: Contrastive Historical Modeling with Prefix-tuning for Temporal Knowledge Graph Reasoning

Authors: Miao Peng, Ben Liu, Wenjie Xu, Zihao Jiang, Jiahui Zhu, Min Peng

Abstract: Temporal Knowledge Graph Reasoning (TKGR) is the task of inferring missing facts for incomplete TKGs in complex scenarios (e.g., transductive and inductive settings), which has been gaining increasing attention. Recently, to mitigate dependence on structured connections in TKGs, text-based methods have been developed to utilize rich linguistic information from entity descriptions. However, sufferi… ▽ More Temporal Knowledge Graph Reasoning (TKGR) is the task of inferring missing facts for incomplete TKGs in complex scenarios (e.g., transductive and inductive settings), which has been gaining increasing attention. Recently, to mitigate dependence on structured connections in TKGs, text-based methods have been developed to utilize rich linguistic information from entity descriptions. However, suffering from the enormous parameters and inflexibility of pre-trained language models, existing text-based methods struggle to balance the textual knowledge and temporal information with computationally expensive purpose-built training strategies. To tap the potential of text-based models for TKGR in various complex scenarios, we propose ChapTER, a Contrastive historical modeling framework with prefix-tuning for TEmporal Reasoning. ChapTER feeds history-contextualized text into the pseudo-Siamese encoders to strike a textual-temporal balance via contrastive estimation between queries and candidates. By introducing virtual time prefix tokens, it applies a prefix-based tuning method to facilitate the frozen PLM capable for TKGR tasks under different settings. We evaluate ChapTER on four transductive and three few-shot inductive TKGR benchmarks, and experimental results demonstrate that ChapTER achieves superior performance compared to competitive baselines with only 0.17% tuned parameters. We conduct thorough analysis to verify the effectiveness, flexibility and efficiency of ChapTER. △ Less

Submitted 25 March, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2403.18621 [pdf, other]

doi 10.1109/TVT.2024.3420880

Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects

Authors: Zezhong Sun, Shi Yan, Ning Jiang, Jiaen Zhou, Mugen Peng

Abstract: Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle,… ▽ More Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle, this paper constructs a comprehensive framework considering building blockage and employs a distance-correlated blockage model to analyze interference from line of sight (LoS), non-line of sight (NLoS), and target reflection cascading (TRC) links. Using stochastic geometric theory, expressions for signal-to-interference-plus-noise ratio (SINR) and coverage probability for communication and sensing in the presence of blockage are derived, allowing for a comprehensive comparison under the same parameters. The research findings indicate that blockage can positively impact coverage, especially in enhancing communication performance. The analysis also suggests that there exists an optimal base station (BS) density when blockage is of the same order of magnitude as the BS density, maximizing communication or sensing coverage probability. △ Less

Submitted 2 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: This paper has been accepted by IEEE Transactions on Vehicular Technology

arXiv:2403.18344 [pdf, other]

LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models

Authors: Mingxing Peng, Xusen Guo, Xianda Chen, Meixin Zhu, Kehua Chen, Hao, Yang, Xuesong Wang, Yinhai Wang

Abstract: To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address thes… ▽ More To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address these challenges by proposing LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models (LLMs). Essentially, we reformulate the lane change prediction task as a language modeling problem, processing heterogeneous driving scenario information in natural language as prompts for input into the LLM and employing a supervised fine-tuning technique to tailor the LLM specifically for our lane change prediction task. This allows us to utilize the LLM's powerful common sense reasoning abilities to understand complex interactive information, thereby improving the accuracy of long-term predictions. Furthermore, we incorporate explanatory requirements into the prompts in the inference stage. Therefore, our LC-LLM model not only can predict lane change intentions and trajectories but also provides explanations for its predictions, enhancing the interpretability. Extensive experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of our LC-LLM in lane change prediction task. To the best of our knowledge, this is the first attempt to utilize LLMs for predicting lane change behavior. Our study shows that LLMs can encode comprehensive interaction information for driving behavior understanding. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.06249 [pdf, other]

No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

Authors: Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie

Abstract: While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU un… ▽ More While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU uniquely integrates a spectrum of Chinese tasks, alongside translated and original English datasets, enriching the breadth and depth of bilingual financial modeling. It provides unrestricted access to diverse model variants, a substantial compilation of diverse cross-lingual and multi-modal instruction data, and an evaluation benchmark with expert annotations, comprising 10 NLP tasks, 20 bilingual specific tasks, totaling 95k datasets. Our thorough evaluation emphasizes the advantages of incorporating these bilingual datasets, especially in translation tasks and utilizing original English data, enhancing both linguistic flexibility and analytical acuity in financial contexts. Notably, ICE-INTENT distinguishes itself by showcasing significant enhancements over conventional LLMs and existing financial LLMs in bilingual milieus, underscoring the profound impact of robust bilingual data on the accuracy and efficacy of financial NLP. △ Less

Submitted 16 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 24 pages, 5 figures, 12 tables, including Appendix

arXiv:2403.05574 [pdf, other]

HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy

Authors: Mengxi Xiao, Qianqian Xie, Ziyan Kuang, Zhicheng Liu, Kailai Yang, Min Peng, Weiguang Han, Jimin Huang

Abstract: Large Language Models (LLMs) can play a vital role in psychotherapy by adeptly handling the crucial task of cognitive reframing and overcoming challenges such as shame, distrust, therapist skill variability, and resource scarcity. Previous LLMs in cognitive reframing mainly converted negative emotions to positive ones, but these approaches have limited efficacy, often not promoting clients' self-d… ▽ More Large Language Models (LLMs) can play a vital role in psychotherapy by adeptly handling the crucial task of cognitive reframing and overcoming challenges such as shame, distrust, therapist skill variability, and resource scarcity. Previous LLMs in cognitive reframing mainly converted negative emotions to positive ones, but these approaches have limited efficacy, often not promoting clients' self-discovery of alternative perspectives. In this paper, we unveil the Hel** and Empowering through Adaptive Language in Mental Enhancement (HealMe) model. This novel cognitive reframing therapy method effectively addresses deep-rooted negative thoughts and fosters rational, balanced perspectives. Diverging from traditional LLM methods, HealMe employs empathetic dialogue based on psychotherapeutic frameworks. It systematically guides clients through distinguishing circumstances from feelings, brainstorming alternative viewpoints, and develo** empathetic, actionable suggestions. Moreover, we adopt the first comprehensive and expertly crafted psychological evaluation metrics, specifically designed to rigorously assess the performance of cognitive reframing, in both AI-simulated dialogues and real-world therapeutic conversations. Experimental results show that our model outperforms others in terms of empathy, guidance, and logical coherence, demonstrating its effectiveness and potential positive impact on psychotherapy. △ Less

Submitted 22 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures

ACM Class: J.4

arXiv:2402.12659 [pdf, other]

FinBen: A Holistic Financial Benchmark for Large Language Models

Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yi**g Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu , et al. (9 additional authors not shown)

Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical… ▽ More LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU. △ Less

Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 26 pages, 11 figures

arXiv:2402.07405 [pdf, other]

Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English

Authors: Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang, Qianqian Xie

Abstract: Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for… ▽ More Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for financial LLMs in Spanish joint with English. We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual financial applications. We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks. The FLARE-ES benchmark results reveal a significant multilingual performance gap and bias in existing LLMs. FinMA-ES models surpass SOTA LLMs such as GPT-4 in Spanish financial tasks, due to strategic instruction tuning and leveraging data from diverse linguistic resources, highlighting the positive impact of cross-linguistic transfer. All our datasets, models, and benchmarks have been released. △ Less

Submitted 11 February, 2024; originally announced February 2024.

Comments: 10 pages, 2 figures

arXiv:2402.02094 [pdf, other]

doi 10.1016/j.isprsjprs.2023.02.012

Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification

Authors: Wenjia Xu, Jiuniu Wang, Zhiwei Wei, Mugen Peng, Yirong Wu

Abstract: Deep neural networks have achieved promising progress in remote sensing (RS) image classification, for which the training process requires abundant samples for each class. However, it is time-consuming and unrealistic to annotate labels for each RS category, given the fact that the RS target database is increasing dynamically. Zero-shot learning (ZSL) allows for identifying novel classes that are… ▽ More Deep neural networks have achieved promising progress in remote sensing (RS) image classification, for which the training process requires abundant samples for each class. However, it is time-consuming and unrealistic to annotate labels for each RS category, given the fact that the RS target database is increasing dynamically. Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training, which provides a promising solution for the aforementioned problem. However, previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes. Besides, pioneer ZSL models use convolutional neural networks pre-trained on ImageNet, which focus on the main objects appearing in each image, neglecting the background context that also matters in RS scene classification. To address the above problems, we propose to collect visually detectable attributes automatically. We predict attributes for each class by depicting the semantic-visual similarity between attributes and images. In this way, the attribute annotation process is accomplished by machine instead of human as in other methods. Moreover, we propose a Deep Semantic-Visual Alignment (DSVA) that take advantage of the self-attention mechanism in the transformer to associate local image regions together, integrating the background context information for prediction. The DSVA model further utilizes the attribute attention maps to focus on the informative image regions that are essential for knowledge transfer in ZSL, and maps the visual images into attribute space to perform ZSL classification. With extensive experiments, we show that our model outperforms other state-of-the-art models by a large margin on a challenging large-scale RS scene classification benchmark. △ Less

Submitted 3 February, 2024; originally announced February 2024.

Comments: Published in ISPRS P&RS. The code is available at https://github.com/wenjiaXu/RS_Scene_ZSL

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 198, 2023, Pages 140-152

arXiv:2401.11541 [pdf, other]

Multi-View Neural 3D Reconstruction of Micro-/Nanostructures with Atomic Force Microscopy

Authors: Shuo Chen, Mao Peng, Yi** Li, Bing-Feng Ju, Hujun Bao, Yuan-Liu Chen, Guofeng Zhang

Abstract: Atomic Force Microscopy (AFM) is a widely employed tool for micro-/nanoscale topographic imaging. However, conventional AFM scanning struggles to reconstruct complex 3D micro-/nanostructures precisely due to limitations such as incomplete sample topography capturing and tip-sample convolution artifacts. Here, we propose a multi-view neural-network-based framework with AFM (MVN-AFM), which accurate… ▽ More Atomic Force Microscopy (AFM) is a widely employed tool for micro-/nanoscale topographic imaging. However, conventional AFM scanning struggles to reconstruct complex 3D micro-/nanostructures precisely due to limitations such as incomplete sample topography capturing and tip-sample convolution artifacts. Here, we propose a multi-view neural-network-based framework with AFM (MVN-AFM), which accurately reconstructs surface models of intricate micro-/nanostructures. Unlike previous works, MVN-AFM does not depend on any specially shaped probes or costly modifications to the AFM system. To achieve this, MVN-AFM uniquely employs an iterative method to align multi-view data and eliminate AFM artifacts simultaneously. Furthermore, we pioneer the application of neural implicit surface reconstruction in nanotechnology and achieve markedly improved results. Extensive experiments show that MVN-AFM effectively eliminates artifacts present in raw AFM images and reconstructs various micro-/nanostructures including complex geometrical microstructures printed via Two-photon Lithography and nanoparticles such as PMMA nanospheres and ZIF-67 nanocrystals. This work presents a cost-effective tool for micro-/nanoscale 3D analysis. △ Less

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.09711 [pdf, ps, other]

doi 10.1109/TVT.2024.3353339

Joint Beam Direction Control and Radio Resource Allocation in Dynamic Multi-beam LEO Satellite Networks

Authors: Shuo Yuan, Yaohua Sun, Mugen Peng, Renzhi Yuan

Abstract: Multi-beam low earth orbit (LEO) satellites are emerging as key components in beyond 5G and 6G to provide global coverage and high data rate. To fully unleash the potential of LEO satellite communication, resource management plays a key role. However, the uneven distribution of users, the coupling of multi-dimensional resources, complex inter-beam interference, and time-varying network topologies… ▽ More Multi-beam low earth orbit (LEO) satellites are emerging as key components in beyond 5G and 6G to provide global coverage and high data rate. To fully unleash the potential of LEO satellite communication, resource management plays a key role. However, the uneven distribution of users, the coupling of multi-dimensional resources, complex inter-beam interference, and time-varying network topologies all impose significant challenges on effective communication resource management. In this paper, we study the joint optimization of beam direction and the allocation of spectrum, time, and power resource in a dynamic multi-beam LEO satellite network. The objective is to improve long-term user sum data rate while taking user fairness into account. Since the concerned resource management problem is mixed-integer non-convex programming, the problem is decomposed into three subproblems, namely beam direction control and time slot allocation, user subchannel assignment, and beam power allocation. Then, these subproblems are solved iteratively by leveraging matching with externalities and successive convex approximation, and the proposed algorithms are analyzed in terms of stability, convergence, and complexity. Extensive simulations are conducted, and the results demonstrate that our proposal can improve the number of served users by up to two times and the sum user data rate by up to 68%, compared to baseline schemes. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted by IEEE Transactions on Vehicular Technology

arXiv:2311.07114 [pdf]

Novel models for fatigue life prediction under wideband random loads based on machine learning

Authors: Hong Sun, Yuanying Qiu, **g Li, ** Bai, Ming Peng

Abstract: Machine learning as a data-driven solution has been widely applied in the field of fatigue lifetime prediction. In this paper, three models for wideband fatigue life prediction are built based on three machine learning models, i.e. support vector machine (SVM), Gaussian process regression (GPR) and artificial neural network (ANN). The generalization ability of the models is enhanced by employing n… ▽ More Machine learning as a data-driven solution has been widely applied in the field of fatigue lifetime prediction. In this paper, three models for wideband fatigue life prediction are built based on three machine learning models, i.e. support vector machine (SVM), Gaussian process regression (GPR) and artificial neural network (ANN). The generalization ability of the models is enhanced by employing numerous power spectra samples with different bandwidth parameters and a variety of material properties related to fatigue life. Sufficient Monte Carlo numerical simulations demonstrate that the newly developed machine learning models are superior to the traditional frequency-domain models in terms of life prediction accuracy and the ANN model has the best overall performance among the three developed machine learning models. △ Less

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.20448 [pdf, other]

A Survey on Federated Unlearning: Challenges, Methods, and Future Directions

Authors: Ziyao Liu, Yu Jiang, Jiyuan Shen, Minyi Peng, Kwok-Yan Lam, Xingliang Yuan, Xiaoning Liu

Abstract: In recent years, the notion of ``the right to be forgotten" (RTBF) has become a crucial aspect of data privacy, requiring the provision of mechanisms that support the removal of personal data of individuals upon their requests. Consequently, given the extensive adoption of data-intensive machine learning (ML) algorithms and increasing concerns for personal data privacy protection, the concept of m… ▽ More In recent years, the notion of ``the right to be forgotten" (RTBF) has become a crucial aspect of data privacy, requiring the provision of mechanisms that support the removal of personal data of individuals upon their requests. Consequently, given the extensive adoption of data-intensive machine learning (ML) algorithms and increasing concerns for personal data privacy protection, the concept of machine unlearning (MU) has gained considerable attention. MU empowers an ML model to selectively eliminate identifiable information. Evolving from the foundational principles of MU, federated unlearning (FU) has emerged to confront the challenge of data erasure within federated learning (FL) settings. This empowers the FL model to unlearn an FL client or identifiable information pertaining to the client. Nevertheless, unlike traditional MU, the distinctive attributes of federated learning introduce specific challenges for FU techniques. These challenges necessitate a tailored design when develo** FU algorithms. While various concepts and numerous federated unlearning schemes exist in this field, the unified workflow and tailored design of FU are not yet well understood. Therefore, this comprehensive survey delves into the techniques, methodologies, and recent advancements in federated unlearning. It provides an overview of fundamental concepts and principles, evaluates existing federated unlearning algorithms, and reviews optimizations tailored to federated learning. Additionally, it discusses practical applications and assesses their limitations. Finally, it outlines promising directions for future research. △ Less

Submitted 10 June, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.13940 [pdf, other]

doi 10.1109/TWC.2023.3324729

Joint Network Function Placement and Routing Optimization in Dynamic Software-defined Satellite-Terrestrial Integrated Networks

Authors: Shuo Yuan, Yaohua Sun, Mugen Peng

Abstract: Software-defined satellite-terrestrial integrated networks (SDSTNs) are seen as a promising paradigm for achieving high resource flexibility and global communication coverage. However, low latency service provisioning is still challenging due to the fast variation of network topology and limited onboard resource at low earth orbit satellites. To address this issue, we study service provisioning in… ▽ More Software-defined satellite-terrestrial integrated networks (SDSTNs) are seen as a promising paradigm for achieving high resource flexibility and global communication coverage. However, low latency service provisioning is still challenging due to the fast variation of network topology and limited onboard resource at low earth orbit satellites. To address this issue, we study service provisioning in SDSTNs via joint optimization of virtual network function (VNF) placement and routing planning with network dynamics characterized by a time-evolving graph. Aiming at minimizing average service latency, the corresponding problem is formulated as an integer nonlinear programming under resource, VNF deployment, and time-slotted flow constraints. Since exhaustive search is intractable, we transform the primary problem into an integer linear programming by involving auxiliary variables and then propose a Benders decomposition based branch-and-cut (BDBC) algorithm. Towards practical use, a time expansion-based decoupled greedy (TEDG) algorithm is further designed with rigorous complexity analysis. Extensive experiments demonstrate the optimality of BDBC algorithm and the low complexity of TEDG algorithm. Meanwhile, it is indicated that they can improve the number of completed services within a configuration period by up to 58% and reduce the average service latency by up to 17% compared to baseline schemes. △ Less

Submitted 21 October, 2023; originally announced October 2023.

Comments: Accepted by IEEE Transactions on Wireless Communications

arXiv:2310.01452 [pdf, other]

Fooling the Textual Fooler via Randomizing Latent Representations

Authors: Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan

Abstract: Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the mo… ▽ More Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the model architecture or model parameters and thus can be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and to replace these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Different from existing defenses, AdvFooler does not necessitate additional computational overhead during training nor relies on assumptions about the potential adversarial perturbation set while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the significance of robustness resulting from confusing the adversary via randomizing the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks on two benchmark datasets. △ Less

Submitted 9 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of ACL 2024

arXiv:2308.12644 [pdf, other]

Evolutionary Dynamic Optimization Laboratory: A MATLAB Optimization Platform for Education and Experimentation in Dynamic Environments

Authors: Mai Peng, Zeneng She, Delaram Yazdani, Danial Yazdani, Wenjian Luo, Changhe Li, Juergen Branke, Trung Thanh Nguyen, Amir H. Gandomi, Yaochu **, Xin Yao

Abstract: Many real-world optimization problems possess dynamic characteristics. Evolutionary dynamic optimization algorithms (EDOAs) aim to tackle the challenges associated with dynamic optimization problems. Looking at the existing works, the results reported for a given EDOA can sometimes be considerably different. This issue occurs because the source codes of many EDOAs, which are usually very complex a… ▽ More Many real-world optimization problems possess dynamic characteristics. Evolutionary dynamic optimization algorithms (EDOAs) aim to tackle the challenges associated with dynamic optimization problems. Looking at the existing works, the results reported for a given EDOA can sometimes be considerably different. This issue occurs because the source codes of many EDOAs, which are usually very complex algorithms, have not been made publicly available. Indeed, the complexity of components and mechanisms used in many EDOAs makes their re-implementation error-prone. In this paper, to assist researchers in performing experiments and comparing their algorithms against several EDOAs, we develop an open-source MATLAB platform for EDOAs, called Evolutionary Dynamic Optimization LABoratory (EDOLAB). This platform also contains an education module that can be used for educational purposes. In the education module, the user can observe a) a 2-dimensional problem space and how its morphology changes after each environmental change, b) the behaviors of individuals over time, and c) how the EDOA reacts to environmental changes and tries to track the moving optimum. In addition to being useful for research and education purposes, EDOLAB can also be used by practitioners to solve their real-world problems. The current version of EDOLAB includes 25 EDOAs and three fully-parametric benchmark generators. The MATLAB source code for EDOLAB is publicly available and can be accessed from [https://github.com/EDOLAB-platform/EDOLAB-MATLAB]. △ Less

Submitted 24 August, 2023; originally announced August 2023.

Comments: This work was submitted to ACM Transactions on Mathematical Software on December 7, 2022

arXiv:2308.09954 [pdf, other]

Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs

Authors: Suhang Wu, Minlong Peng, Yue Chen, **song Su, Mingming Sun

Abstract: Large language models (LLMs) possess a wealth of knowledge encoded in their parameters. However, this knowledge may become outdated or unsuitable over time. As a result, there has been a growing interest in knowledge editing for LLMs and evaluating its effectiveness. Existing studies primarily focus on knowledge editing using factual triplets, which not only incur high costs for collection but als… ▽ More Large language models (LLMs) possess a wealth of knowledge encoded in their parameters. However, this knowledge may become outdated or unsuitable over time. As a result, there has been a growing interest in knowledge editing for LLMs and evaluating its effectiveness. Existing studies primarily focus on knowledge editing using factual triplets, which not only incur high costs for collection but also struggle to express complex facts. Furthermore, these studies are often limited in their evaluation perspectives. In this paper, we propose Eva-KELLM, a new benchmark for evaluating knowledge editing of LLMs. This benchmark includes an evaluation framework and a corresponding dataset. Under our framework, we first ask the LLM to perform knowledge editing using raw documents, which provides a more convenient and universal approach compared to using factual triplets. We then evaluate the updated LLM from multiple perspectives. In addition to assessing the effectiveness of knowledge editing and the retention of unrelated knowledge from conventional studies, we further test the LLM's ability in two aspects: 1) Reasoning with the altered knowledge, aiming for the LLM to genuinely learn the altered knowledge instead of simply memorizing it. 2) Cross-lingual knowledge transfer, where the LLM updated with raw documents in one language should be capable of handling queries from another language. To facilitate further research, we construct and release the corresponding dataset. Using this benchmark, we investigate the effectiveness of several commonly-used knowledge editing methods. Experimental results indicate that the current methods for knowledge editing using raw documents are not effective in yielding satisfactory results, particularly when it comes to reasoning with altered knowledge and cross-lingual knowledge transfer. △ Less

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2308.01857 [pdf, other]

iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li1, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin , et al. (31 additional authors not shown)

Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, aiming for building a basic infrastructure for EDA technology evolution and closing the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti… ▽ More Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, aiming for building a basic infrastructure for EDA technology evolution and closing the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Optimization etc.), and part of the analysis tools (Static Timing Analysis and Power Analysis). To demonstrate the effectiveness of iEDA, we implement and tape out three chips of different scales (from 700k to 1.5M gates) on different process nodes (110nm and 28nm) with iEDA. iEDA is publicly available from the project home page http://ieda.oscc.cc. △ Less

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2306.05443 [pdf, other]

PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance

Authors: Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, Jimin Huang

Abstract: Although large language models (LLMs) has shown great performance on natural language processing (NLP) in the financial domain, there are no publicly available financial tailtored LLMs, instruction tuning datasets, and evaluation benchmarks, which is critical for continually pushing forward the open-source development of financial artificial intelligence (AI). This paper introduces PIXIU, a compre… ▽ More Although large language models (LLMs) has shown great performance on natural language processing (NLP) in the financial domain, there are no publicly available financial tailtored LLMs, instruction tuning datasets, and evaluation benchmarks, which is critical for continually pushing forward the open-source development of financial artificial intelligence (AI). This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 136K data samples to support the fine-tuning, and an evaluation benchmark with 5 tasks and 9 datasets. We first construct the large-scale multi-task instruction data considering a variety of financial tasks, financial document types, and financial data modalities. We then propose a financial LLM called FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. To support the evaluation of financial LLMs, we propose a standardized benchmark that covers a set of critical financial tasks, including five financial NLP tasks and one financial prediction task. With this benchmark, we conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks. The model, datasets, benchmark, and experimental results are open-sourced to facilitate future research in financial AI. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: 12 pages, 1 figures

arXiv:2306.04968 [pdf, other]

Actively Supervised Clustering for Open Relation Extraction

Authors: Jun Zhao, Yongxin Zhang, Qi Zhang, Tao Gui, Zhongyu Wei, Minlong Peng, Mingming Sun

Abstract: Current clustering-based Open Relation Extraction (OpenRE) methods usually adopt a two-stage pipeline. The first stage simultaneously learns relation representations and assignments. The second stage manually labels several instances and thus names the relation for each cluster. However, unsupervised objectives struggle to optimize the model to derive accurate clustering assignments, and the numbe… ▽ More Current clustering-based Open Relation Extraction (OpenRE) methods usually adopt a two-stage pipeline. The first stage simultaneously learns relation representations and assignments. The second stage manually labels several instances and thus names the relation for each cluster. However, unsupervised objectives struggle to optimize the model to derive accurate clustering assignments, and the number of clusters has to be supplied in advance. In this paper, we present a novel setting, named actively supervised clustering for OpenRE. Our insight lies in that clustering learning and relation labeling can be alternately performed, providing the necessary guidance for clustering without a significant increase in human effort. The key to the setting is selecting which instances to label. Instead of using classical active labeling strategies designed for fixed known classes, we propose a new strategy, which is applicable to dynamically discover clusters of unknown relations. Experimental results show that our method is able to discover almost all relational clusters in the data and improve the SOTA methods by 10.3\% and 5.2\%, on two datasets respectively. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by ACL2023

arXiv:2306.04954 [pdf, other]

RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction

Authors: Jun Zhao, Wenyu Zhan, Xin Zhao, Qi Zhang, Tao Gui, Zhongyu Wei, Junzhe Wang, Minlong Peng, Mingming Sun

Abstract: Semantic matching is a mainstream paradigm of zero-shot relation extraction, which matches a given input with a corresponding label description. The entities in the input should exactly match their hypernyms in the description, while the irrelevant contexts should be ignored when matching. However, general matching methods lack explicit modeling of the above matching pattern. In this work, we prop… ▽ More Semantic matching is a mainstream paradigm of zero-shot relation extraction, which matches a given input with a corresponding label description. The entities in the input should exactly match their hypernyms in the description, while the irrelevant contexts should be ignored when matching. However, general matching methods lack explicit modeling of the above matching pattern. In this work, we propose a fine-grained semantic matching method tailored for zero-shot relation extraction. Following the above matching pattern, we decompose the sentence-level similarity score into entity and context matching scores. Due to the lack of explicit annotations of the redundant components, we design a feature distillation module to adaptively identify the relation-irrelevant features and reduce their negative impact on context matching. Experimental results show that our method achieves higher matching $F_1$ score and has an inference speed 10 times faster, when compared with the state-of-the-art methods. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by ACL2023

arXiv:2306.04915 [pdf, ps, other]

Sensing-based Beamforming Design for Joint Performance Enhancement of RIS-Aided ISAC Systems

Authors: Xiaowei Qian, Xiaoling Hu, Chenxi Liu, Mugen Peng, Caijun Zhong

Abstract: Reconfigurable intelligent surface (RIS) has shown its great potential in facilitating device-based integrated sensing and communication (ISAC), where sensing and communication tasks are mostly conducted on different time-frequency resources. While the more challenging scenarios of simultaneous sensing and communication (SSC) have so far drawn little attention. In this paper, we propose a novel RI… ▽ More Reconfigurable intelligent surface (RIS) has shown its great potential in facilitating device-based integrated sensing and communication (ISAC), where sensing and communication tasks are mostly conducted on different time-frequency resources. While the more challenging scenarios of simultaneous sensing and communication (SSC) have so far drawn little attention. In this paper, we propose a novel RIS-aided ISAC framework where the inherent location information in the received communication signals from a blind-zone user equipment is exploited to enable SSC. We first design a two-phase ISAC transmission protocol. In the first phase, communication and coarse-grained location sensing are performed concurrently by exploiting the very limited channel state information, while in the second phase, by using the coarse-grained sensing information obtained from the first phase, simple-yet-efficient sensing-based beamforming designs are proposed to realize both higher-rate communication and fine-grained location sensing. We demonstrate that our proposed framework can achieve almost the same performance as the communication-only frameworks, while providing up to millimeter-level positioning accuracy. In addition, we show how the communication and sensing performance can be simultaneously boosted through our proposed sensing-based beamforming designs. The results presented in this work provide valuable insights into the design and implementation of other ISAC systems considering SSC. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.07912 [pdf, other]

Pre-trained Language Model with Prompts for Temporal Knowledge Graph Completion

Authors: Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, Min Peng

Abstract: Temporal Knowledge graph completion (TKGC) is a crucial task that involves reasoning at known timestamps to complete the missing part of facts and has attracted more and more attention in recent years. Most existing methods focus on learning representations based on graph neural networks while inaccurately extracting information from timestamps and insufficiently utilizing the implied information… ▽ More Temporal Knowledge graph completion (TKGC) is a crucial task that involves reasoning at known timestamps to complete the missing part of facts and has attracted more and more attention in recent years. Most existing methods focus on learning representations based on graph neural networks while inaccurately extracting information from timestamps and insufficiently utilizing the implied information in relations. To address these problems, we propose a novel TKGC model, namely Pre-trained Language Model with Prompts for TKGC (PPT). We convert a series of sampled quadruples into pre-trained language model inputs and convert intervals between timestamps into different prompts to make coherent sentences with implicit semantic information. We train our model with a masking strategy to convert TKGC task into a masked token prediction task, which can leverage the semantic information in pre-trained language models. Experiments on three benchmark datasets and extensive analysis demonstrate that our model has great competitiveness compared to other models with four metrics. Our model can effectively incorporate information from temporal knowledge graphs into the language models. △ Less

Submitted 3 March, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: Accepted to Findings of ACL 2023

ACM Class: I.2.4; I.2.7

arXiv:2304.05351 [pdf, ps, other]

The Wall Street Neophyte: A Zero-Shot Analysis of ChatGPT Over MultiModal Stock Movement Prediction Challenges

Authors: Qianqian Xie, Weiguang Han, Yanzhao Lai, Min Peng, Jimin Huang

Abstract: Recently, large language models (LLMs) like ChatGPT have demonstrated remarkable performance across a variety of natural language processing tasks. However, their effectiveness in the financial domain, specifically in predicting stock market movements, remains to be explored. In this paper, we conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal stock movement prediction… ▽ More Recently, large language models (LLMs) like ChatGPT have demonstrated remarkable performance across a variety of natural language processing tasks. However, their effectiveness in the financial domain, specifically in predicting stock market movements, remains to be explored. In this paper, we conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal stock movement prediction, on three tweets and historical stock price datasets. Our findings indicate that ChatGPT is a "Wall Street Neophyte" with limited success in predicting stock movements, as it underperforms not only state-of-the-art methods but also traditional methods like linear regression using price features. Despite the potential of Chain-of-Thought prompting strategies and the inclusion of tweets, ChatGPT's performance remains subpar. Furthermore, we observe limitations in its explainability and stability, suggesting the need for more specialized training or fine-tuning. This research provides insights into ChatGPT's capabilities and serves as a foundation for future work aimed at improving financial market analysis and prediction by leveraging social media sentiment and historical stock data. △ Less

Submitted 28 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: 13 pages

arXiv:2304.00364 [pdf, other]

Mastering Pair Trading with Risk-Aware Recurrent Reinforcement Learning

Authors: Weiguang Han, Jimin Huang, Qianqian Xie, Boyi Zhang, Yanzhao Lai, Min Peng

Abstract: Although pair trading is the simplest hedging strategy for an investor to eliminate market risk, it is still a great challenge for reinforcement learning (RL) methods to perform pair trading as human expertise. It requires RL methods to make thousands of correct actions that nevertheless have no obvious relations to the overall trading profit, and to reason over infinite states of the time-varying… ▽ More Although pair trading is the simplest hedging strategy for an investor to eliminate market risk, it is still a great challenge for reinforcement learning (RL) methods to perform pair trading as human expertise. It requires RL methods to make thousands of correct actions that nevertheless have no obvious relations to the overall trading profit, and to reason over infinite states of the time-varying market most of which have never appeared in history. However, existing RL methods ignore the temporal connections between asset price movements and the risk of the performed trading. These lead to frequent tradings with high transaction costs and potential losses, which barely reach the human expertise level of trading. Therefore, we introduce CREDIT, a risk-aware agent capable of learning to exploit long-term trading opportunities in pair trading similar to a human expert. CREDIT is the first to apply bidirectional GRU along with the temporal attention mechanism to fully consider the temporal correlations embedded in the states, which allows CREDIT to capture long-term patterns of the price movements of two assets to earn higher profit. We also design the risk-aware reward inspired by the economic theory, that models both the profit and risk of the tradings during the trading period. It helps our agent to master pair trading with a robust trading preference that avoids risky trading with possible high returns and losses. Experiments show that it outperforms existing reinforcement learning methods in pair trading and achieves a significant profit over five years of U.S. stock data. △ Less

Submitted 1 April, 2023; originally announced April 2023.

Comments: 8 pages, 5 figures

arXiv:2303.04340 [pdf, other]

Privacy-preserving and Uncertainty-aware Federated Trajectory Prediction for Connected Autonomous Vehicles

Authors: Muzi Peng, Jiangwei Wang, Dong** Song, Fei Miao, Lili Su

Abstract: Deep learning is the method of choice for trajectory prediction for autonomous vehicles. Unfortunately, its data-hungry nature implicitly requires the availability of sufficiently rich and high-quality centralized datasets, which easily leads to privacy leakage. Besides, uncertainty-awareness becomes increasingly important for safety-crucial cyber physical systems whose prediction module heavily r… ▽ More Deep learning is the method of choice for trajectory prediction for autonomous vehicles. Unfortunately, its data-hungry nature implicitly requires the availability of sufficiently rich and high-quality centralized datasets, which easily leads to privacy leakage. Besides, uncertainty-awareness becomes increasingly important for safety-crucial cyber physical systems whose prediction module heavily relies on machine learning tools. In this paper, we relax the data collection requirement and enhance uncertainty-awareness by using Federated Learning on Connected Autonomous Vehicles with an uncertainty-aware global objective. We name our algorithm as FLTP. We further introduce ALFLTP which boosts FLTP via using active learning techniques in adaptatively selecting participating clients. We consider both negative log-likelihood (NLL) and aleatoric uncertainty (AU) as client selection metrics. Experiments on Argoverse dataset show that FLTP significantly outperforms the model trained on local data. In addition, ALFLTP-AU converges faster in training regression loss and performs better in terms of NLL, minADE and MR than FLTP in most rounds, and has more stable round-wise performance than ALFLTP-NLL. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.00293 [pdf]

How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

Authors: Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuan**g Huang

Abstract: The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In t… ▽ More The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance drop** by up to 35.74\% and 43.59\% in natural language inference and sentiment analysis tasks, respectively. We also show that GPT-3.5 faces some specific robustness challenges, including robustness instability, prompt sensitivity, and number sensitivity. These insights are valuable for understanding its limitations and guiding future research in addressing these challenges to enhance GPT-3.5's overall performance and generalization abilities. △ Less

Submitted 1 March, 2023; originally announced March 2023.

MSC Class: 68-06 ACM Class: I.2

arXiv:2302.06730 [pdf, other]

Multi-Carrier NOMA-Empowered Wireless Federated Learning with Optimal Power and Bandwidth Allocation

Authors: Weicai Li, Tiejun Lv, Yashuai Cao, Wei Ni, Mugen Peng

Abstract: Wireless federated learning (WFL) undergoes a communication bottleneck in uplink, limiting the number of users that can upload their local models in each global aggregation round. This paper presents a new multi-carrier non-orthogonal multiple-access (MC-NOMA)-empowered WFL system under an adaptive learning setting of Flexible Aggregation. Since a WFL round accommodates both local model training a… ▽ More Wireless federated learning (WFL) undergoes a communication bottleneck in uplink, limiting the number of users that can upload their local models in each global aggregation round. This paper presents a new multi-carrier non-orthogonal multiple-access (MC-NOMA)-empowered WFL system under an adaptive learning setting of Flexible Aggregation. Since a WFL round accommodates both local model training and uploading for each user, the use of Flexible Aggregation allows the users to train different numbers of iterations per round, adapting to their channel conditions and computing resources. The key idea is to use MC-NOMA to concurrently upload the local models of the users, thereby extending the local model training times of the users and increasing participating users. A new metric, namely, Weighted Global Proportion of Trained Mini-batches (WGPTM), is analytically established to measure the convergence of the new system. Another important aspect is that we maximize the WGPTM to harness the convergence of the new system by jointly optimizing the transmit powers and subchannel bandwidths. This nonconvex problem is converted equivalently to a tractable convex problem and solved efficiently using variable substitution and Cauchy's inequality. As corroborated experimentally using a convolutional neural network and an 18-layer residential network, the proposed MC-NOMA WFL can efficiently reduce communication delay, increase local model training times, and accelerate the convergence by over 40%, compared to its existing alternative. △ Less

Submitted 13 February, 2023; originally announced February 2023.

Comments: 33 pages, 16 figures

arXiv:2302.02136 [pdf, other]

Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer

Authors: Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou

Abstract: This paper presents a new method for end-to-end Video Question Answering (VideoQA), aside from the current popularity of using large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which simply incorporates a learnable word embedding layer, a few convolutional and transformer layers. We use the anisotropic pyramid to fulfill vid… ▽ More This paper presents a new method for end-to-end Video Question Answering (VideoQA), aside from the current popularity of using large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which simply incorporates a learnable word embedding layer, a few convolutional and transformer layers. We use the anisotropic pyramid to fulfill video-language interactions across different spatio-temporal scales. In addition to the canonical pyramid, which includes both bottom-up and top-down pathways with lateral connections, novel strategies are proposed to decompose the visual feature stream into spatial and temporal sub-streams at different scales and implement their interactions with the linguistic semantics while preserving the integrity of local and global semantics. We demonstrate better or on-par performances with high computational efficiency against state-of-the-art methods on five VideoQA benchmarks. Our ablation study shows the scalability of our model that achieves competitive results for text-to-video retrieval by leveraging feature extractors with reusable pre-trained weights, and also the effectiveness of the pyramid. △ Less

Submitted 5 March, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

Comments: Accepted by AAAI 2023

arXiv:2301.11586 [pdf, other]

Khaos: The Impact of Inter-procedural Code Obfuscation on Binary Diffing Techniques

Authors: Peihua Zhang, Chenggang Wu, Mingfan Peng, Kai Zeng, Ding Yu, Yuanming Lai, Yan Kang, Wei Wang, Zhe Wang

Abstract: Software obfuscation techniques can prevent binary diffing techniques from locating vulnerable code by obfuscating the third-party code, to achieve the purpose of protecting embedded device software. With the rapid development of binary diffing techniques, they can achieve more and more accurate function matching and identification by extracting the features within the function. This makes existin… ▽ More Software obfuscation techniques can prevent binary diffing techniques from locating vulnerable code by obfuscating the third-party code, to achieve the purpose of protecting embedded device software. With the rapid development of binary diffing techniques, they can achieve more and more accurate function matching and identification by extracting the features within the function. This makes existing software obfuscation techniques, which mainly focus on the intra-procedural code obfuscation, no longer effective. In this paper, we propose a new inter-procedural code obfuscation mechanism Khaos, which moves the code across functions to obfuscate the function by using compilation optimizations. Two obfuscation primitives are proposed to separate and aggregate the function, which are called fission and fusion respectively. A prototype of Khaos is implemented based on the LLVM compiler and evaluated on a large number of real-world programs including SPEC CPU 2006 & 2017, CoreUtils, JavaScript engines, etc. Experimental results show that Khaos outperforms existing code obfuscations and can significantly reduce the accuracy rates of five state-of-the-art binary diffing techniques (less than 19%) with lower runtime overhead (less than 7%). △ Less

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.10724 [pdf, other]

doi 10.1145/3580305.3599951

Select and Trade: Towards Unified Pair Trading with Hierarchical Reinforcement Learning

Authors: Weiguang Han, Boyi Zhang, Qianqian Xie, Min Peng, Yanzhao Lai, Jimin Huang

Abstract: Pair trading is one of the most effective statistical arbitrage strategies which seeks a neutral profit by hedging a pair of selected assets. Existing methods generally decompose the task into two separate steps: pair selection and trading. However, the decoupling of two closely related subtasks can block information propagation and lead to limited overall performance. For pair selection, ignoring… ▽ More Pair trading is one of the most effective statistical arbitrage strategies which seeks a neutral profit by hedging a pair of selected assets. Existing methods generally decompose the task into two separate steps: pair selection and trading. However, the decoupling of two closely related subtasks can block information propagation and lead to limited overall performance. For pair selection, ignoring the trading performance results in the wrong assets being selected with irrelevant price movements, while the agent trained for trading can overfit to the selected assets without any historical information of other assets. To address it, in this paper, we propose a paradigm for automatic pair trading as a unified task rather than a two-step pipeline. We design a hierarchical reinforcement learning framework to jointly learn and optimize two subtasks. A high-level policy would select two assets from all possible combinations and a low-level policy would then perform a series of trading actions. Experimental results on real-world stock data demonstrate the effectiveness of our method on pair trading compared with both existing pair selection and trading methods. △ Less

Submitted 5 February, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: 10 pages, 6 figures

arXiv:2210.08471 [pdf, other]

Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion

Authors: Jian Song, Di Liang, Rumei Li, Yuntao Li, Sirui Wang, Minlong Peng, Wei Wu, Yongxin Yu

Abstract: Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the \textbf{D… ▽ More Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the \textbf{D}ependency-Enhanced \textbf{A}daptive \textbf{F}usion \textbf{A}ttention (\textbf{DAFA}), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, \textbf{\emph{(i)}} DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching task. △ Less

Submitted 24 August, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: Accepted by Findings of EMNLP 2022

arXiv:2210.04870 [pdf, other]

SMiLE: Schema-augmented Multi-level Contrastive Learning for Knowledge Graph Link Prediction

Authors: Miao Peng, Ben Liu, Qianqian Xie, Wenjie Xu, Hua Wang, Min Peng

Abstract: Link prediction is the task of inferring missing links between entities in knowledge graphs. Embedding-based methods have shown effectiveness in addressing this problem by modeling relational patterns in triples. However, the link prediction task often requires contextual information in entity neighborhoods, while most existing embedding-based methods fail to capture it. Additionally, little atten… ▽ More Link prediction is the task of inferring missing links between entities in knowledge graphs. Embedding-based methods have shown effectiveness in addressing this problem by modeling relational patterns in triples. However, the link prediction task often requires contextual information in entity neighborhoods, while most existing embedding-based methods fail to capture it. Additionally, little attention is paid to the diversity of entity representations in different contexts, which often leads to false prediction results. In this situation, we consider that the schema of knowledge graph contains the specific contextual information, and it is beneficial for preserving the consistency of entities across contexts. In this paper, we propose a novel Schema-augmented Multi-level contrastive LEarning framework (SMiLE) to conduct knowledge graph link prediction. Specifically, we first exploit network schema as the prior constraint to sample negatives and pre-train our model by employing a multi-level contrastive learning method to yield both prior schema and contextual information. Then we fine-tune our model under the supervision of individual triples to learn subtler representations for link prediction. Extensive experimental results on four knowledge graph datasets with thorough analysis of each component demonstrate the effectiveness of our proposed framework against state-of-the-art baselines. The implementation of SMiLE is available at https://github.com/GKNL/SMiLE. △ Less

Submitted 3 March, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Findings of EMNLP 2022

arXiv:2208.13361 [pdf, other]

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

Authors: Faysal Hossain Shezan, Yingjie Lao, Minlong Peng, Xin Wang, Mingming Sun, ** Li

Abstract: The recent privacy leakage incidences and the more strict policy regulations demand a much higher standard of compliance for companies and mobile apps. However, such obligations also impose significant challenges on app developers for complying with these regulations that contain various perspectives, activities, and roles, especially for small companies and developers who are less experienced in… ▽ More The recent privacy leakage incidences and the more strict policy regulations demand a much higher standard of compliance for companies and mobile apps. However, such obligations also impose significant challenges on app developers for complying with these regulations that contain various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or with limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which can generate policies from natural language descriptions from the developer while also ensuring the app's functionalities are compliant with General Data Protection Regulation (GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA (Open Information Annotation), developed by Baidu Cognitive Computing Lab. At the core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study to grasp the challenges in extracting privacy-centric information and generating privacy policies, while exploiting optimizations for this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, process, and share types, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR compliant policies, with only the need of entering the natural language for describing the app features. Note that other non-GDPR-related features might be integrated with the generated features to build a complex app. △ Less

Submitted 29 August, 2022; originally announced August 2022.

Comments: 37 pages

arXiv:2208.05479 [pdf, other]

IRS-Based Integrated Location Sensing and Communication for mmWave SIMO Systems

Authors: Xiaoling Hu, Chenxi Liu, Mugen Peng, Caijun Zhong

Abstract: In this paper, we establish an integrated sensing and communication (ISAC) system based on a distributed semi-passive intelligent reflecting surface (IRS), which allows location sensing and data transmission to be carried out simultaneously, sharing the same frequency and time resources. The detailed working process of the proposed IRS-based ISAC system is designed, including the transmission prot… ▽ More In this paper, we establish an integrated sensing and communication (ISAC) system based on a distributed semi-passive intelligent reflecting surface (IRS), which allows location sensing and data transmission to be carried out simultaneously, sharing the same frequency and time resources. The detailed working process of the proposed IRS-based ISAC system is designed, including the transmission protocol, location sensing and beamforming optimization. Specifically, each coherence block consists of two periods, the ISAC period with two time blocks and the pure communication (PC) period. During each time block of the ISAC period, data transmission and user positioning are carried out simultaneously. The estimated user location in the first time block will be used for beamforming design in the second time block. During the PC period, only data transmission is conducted, by invoking the user location estimated in the second time block of the ISAC period for beamforming design. {\color{black}Simulation results show that a millimeter-level positioning accuracy can be achieved by the proposed location sensing scheme, demonstrating the advantage of the proposed IRS-based ISAC framework. Besides, the proposed two beamforming schemes based on the estimated location information achieve similar performance to the benchmark schemes assuming perfect channel state information (CSI), which verifies the effectiveness of beamforming design using sensed location information. △ Less

Submitted 10 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2208.05300

arXiv:2208.05324 [pdf, ps, other]

IRS-Aided Non-Orthogonal ISAC Systems: Performance Analysis and Beamforming Design

Authors: Zhouyuan Yu, Xiaoling Hu, Chenxi Liu, Mugen Peng, Caijun Zhong

Abstract: Intelligent reflecting surface (IRS) has shown its effectiveness in facilitating orthogonal time-division integrated sensing and communications (TD-ISAC), in which the sensing task and the communication task occupy orthogonal time-frequency resources, while the role of IRS in the more interesting scenarios of non-orthogonal ISAC (NO-ISAC) systems has so far remained unclear. In this paper, we cons… ▽ More Intelligent reflecting surface (IRS) has shown its effectiveness in facilitating orthogonal time-division integrated sensing and communications (TD-ISAC), in which the sensing task and the communication task occupy orthogonal time-frequency resources, while the role of IRS in the more interesting scenarios of non-orthogonal ISAC (NO-ISAC) systems has so far remained unclear. In this paper, we consider an IRS-aided NO-ISAC system, where a distributed IRS is deployed to assist concurrent communication and location sensing for a blind-zone user, occupying non-orthogonal/overlapped time-frequency resources. We first propose a modified Cramer-Rao lower bound (CRLB) to characterize the performances of both communication and location sensing in a unified manner. We further derive the closed-form expressions of the modified CRLB in our considered NO-ISAC system, enabling us to identify the fundamental trade-off between the communication and location sensing performances. In addition, by exploiting the modified CRLB, we propose a joint active and passive beamforming design algorithm that achieves a good communication and location sensing trade-off. Through numerical results, we demonstrate the superiority of the IRS-aided NO-ISAC systems over the IRS-aided TD-ISAC systems, in terms of both communication and localization performances. Besides, it is shown that the IRS-aided NO-ISAC system with random communication signals can achieve comparable localization performance to the IRS-aided localization system with dedicated positioning reference signals. Moreover, we investigate the trade-off between communication performance and localization performance and show how the performance of the NO-ISAC system can be significantly boosted by increasing the number of the IRS elements. △ Less

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.05300 [pdf, ps, other]

doi 10.1109/TSP.2022.3217353

Location Sensing and Beamforming Design for IRS-Enabled Multi-User ISAC Systems

Authors: Zhouyuan Yu, Xiaoling Hu, Chenxi Liu, Mugen Peng, Caijun Zhong

Abstract: This paper explores the potential of the intelligent reflecting surface (IRS) in realizing multi-user concurrent communication and localization, using the same time-frequency resources. Specifically, we propose an IRS-enabled multi-user integrated sensing and communication (ISAC) framework, where a distributed semi-passive IRS assists the uplink data transmission from multiple users to the base st… ▽ More This paper explores the potential of the intelligent reflecting surface (IRS) in realizing multi-user concurrent communication and localization, using the same time-frequency resources. Specifically, we propose an IRS-enabled multi-user integrated sensing and communication (ISAC) framework, where a distributed semi-passive IRS assists the uplink data transmission from multiple users to the base station (BS) and conducts multi-user localization, simultaneously. We first design an ISAC transmission protocol, where the whole transmission period consists of two periods, i.e., the ISAC period for simultaneous uplink communication and multi-user localization, and the pure communication (PC) period for only uplink data transmission. For the ISAC period, we propose a multi-user location sensing algorithm, which utilizes the uplink communication signals unknown to the IRS, thus removing the requirement of dedicated positioning reference signals in conventional location sensing methods. Based on the sensed users' locations, we propose two novel beamforming algorithms for the ISAC period and PC period, respectively, which can work with discrete phase shifts and require no channel state information (CSI) acquisition. Numerical results show that the proposed multi-user location sensing algorithm can achieve up to millimeter-level positioning accuracy, indicating the advantage of the IRS-enabled ISAC framework. Moreover, the proposed beamforming algorithms with sensed location information and discrete phase shifts can achieve comparable performance to the benchmark considering perfect CSI acquisition and continuous phase shifts, demonstrating how the location information can ensure the communication performance. △ Less

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2205.15397 [pdf, other]

Minimax Optimal Online Imitation Learning via Replay Estimation

Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha… ▽ More Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2 / N$ for behavioral cloning and $H / \sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of general function approximation, we prove a meta theorem reducing the performance gap of our approach to the parameter estimation error for offline classification (i.e. learning the expert policy). In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O} \left( \min({H^{3/2}} / {N}, {H} / {\sqrt{N}} \right)$ dependency, under significantly weaker assumptions compared to prior work. We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes. △ Less

Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.04061 [pdf, other]

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Authors: Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou

Abstract: Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language processing. While most existing approaches ignore the visual appearance-motion information at different temporal scales, it is unknown how to incorporate the multilevel processing capacity of a deep learning model with such multiscale information. Targeting these issues,… ▽ More Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language processing. While most existing approaches ignore the visual appearance-motion information at different temporal scales, it is unknown how to incorporate the multilevel processing capacity of a deep learning model with such multiscale information. Targeting these issues, this paper proposes a novel Multilevel Hierarchical Network (MHN) with multiscale sampling for VideoQA. MHN comprises two modules, namely Recurrent Multimodal Interaction (RMI) and Parallel Visual Reasoning (PVR). With a multiscale sampling, RMI iterates the interaction of appearance-motion information at each scale and the question embeddings to build the multilevel question-guided visual representations. Thereon, with a shared transformer encoder, PVR infers the visual cues at each level in parallel to fit with answering different question types that may rely on the visual information at relevant levels. Through extensive experiments on three VideoQA datasets, we demonstrate improved performances than previous state-of-the-arts and justify the effectiveness of each part of our method. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: Accepted by IJCAI 2022. arXiv admin note: text overlap with arXiv:2109.04735

arXiv:2204.05995 [pdf, ps, other]

Communication and Computation Assisted Sensing Information Freshness Performance Analysis in Vehicular Networks

Authors: Ning Jiang, Shi Yan, Zhuohan Liu, Chun**g Hu, Mugen Peng

Abstract: The timely sharing of raw sensing information in the vehicular networks (VNETs) is essential to safety. In order to improve the freshness of sensing information, joint scheduling of multi-dimensional resources such as communication and computation is required. However, the complex relevance among multi-dimensional resources is still unclear, and it is difficult to achieve efficient resource utiliz… ▽ More The timely sharing of raw sensing information in the vehicular networks (VNETs) is essential to safety. In order to improve the freshness of sensing information, joint scheduling of multi-dimensional resources such as communication and computation is required. However, the complex relevance among multi-dimensional resources is still unclear, and it is difficult to achieve efficient resource utilization. In this paper, we present a theoretical analysis for a novel metric Age of Information (AoI) on a communication and computation assisted spatial-temporal model. An uplink VNETs scenario where Road Side Units (RSUs) are deployed with computational resource is considered. The transmission and computation process is unified into a two-stage tandem queue and the expression of the average AoI is derived. The network interference is analyzed by modeling the VNETs as Cox Poisson Point Process based on stochastic geometry and the closed-form solution of the coverage probability and the expected data rate performance under the constraints of transmission resources is obtained. The simulation results reveal the basic relationship between communication and computation capacity and show that communication and computation should reach a tradeoff to improve resource utilization while ensuring real-time information requirement. △ Less

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2202.11203 [pdf, other]

Label-Smoothed Backdoor Attack

Authors: Minlong Peng, Zidi Xiong, Mingming Sun, ** Li

Abstract: By injecting a small number of poisoned samples into the training set, backdoor attacks aim to make the victim model produce designed outputs on any input injected with pre-designed backdoors. In order to achieve a high attack success rate using as few poisoned training samples as possible, most existing attack methods change the labels of the poisoned samples to the target class. This practice of… ▽ More By injecting a small number of poisoned samples into the training set, backdoor attacks aim to make the victim model produce designed outputs on any input injected with pre-designed backdoors. In order to achieve a high attack success rate using as few poisoned training samples as possible, most existing attack methods change the labels of the poisoned samples to the target class. This practice often results in severe over-fitting of the victim model over the backdoors, making the attack quite effective in output control but easier to be identified by human inspection or automatic defense algorithms. In this work, we proposed a label-smoothing strategy to overcome the over-fitting problem of these attack methods, obtaining a \textit{Label-Smoothed Backdoor Attack} (LSBA). In the LSBA, the label of the poisoned sample $\bm{x}$ will be changed to the target class with a probability of $p_n(\bm{x})$ instead of 100\%, and the value of $p_n(\bm{x})$ is specifically designed to make the prediction probability the target class be only slightly greater than those of the other classes. Empirical studies on several existing backdoor attacks show that our strategy can considerably improve the stealthiness of these attacks and, at the same time, achieve a high attack success rate. In addition, our strategy makes it able to manually control the prediction probability of the design output through manipulating the applied and activated number of LSBAs\footnote{Source code will be published at \url{https://github.com/v-mipeng/LabelSmoothedAttack.git}}. △ Less

Submitted 18 February, 2022; originally announced February 2022.

Comments: Backdoor Attack

arXiv:2112.06224 [pdf, ps, other]

doi 10.1109/WCSP52459.2021.9613157

Joint Sensing, Communication, and Computation Resource Allocation for Cooperative Perception in Fog-Based Vehicular Networks

Authors: Xinran Zhang, Zhimin He, Yaohua Sun, Shuo Yuan, Mugen Peng

Abstract: To enlarge the perception range and reliability of individual autonomous vehicles, cooperative perception has been received much attention. However, considering the high volume of shared messages, limited bandwidth and computation resources in vehicular networks become bottlenecks. In this paper, we investigate how to balance the volume of shared messages and constrained resources in fog-based veh… ▽ More To enlarge the perception range and reliability of individual autonomous vehicles, cooperative perception has been received much attention. However, considering the high volume of shared messages, limited bandwidth and computation resources in vehicular networks become bottlenecks. In this paper, we investigate how to balance the volume of shared messages and constrained resources in fog-based vehicular networks. To this end, we first characterize sum satisfaction of cooperative perception taking account of its spatial-temporal value and latency performance. Next, the sensing block message, communication resource block, and computation resource are jointly allocated to maximize the sum satisfaction of cooperative perception, while satisfying the maximum latency and sojourn time constraints of vehicles. Owing to its non-convexity, we decouple the original problem into two separate sub-problems and devise corresponding solutions. Simulation results demonstrate that our proposed scheme can effectively boost the sum satisfaction of cooperative perception compared with existing baselines. △ Less

Submitted 12 December, 2021; originally announced December 2021.

Comments: Accepted by WCSP 2021

arXiv:2109.04735 [pdf, other]

Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering

Authors: Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou

Abstract: Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language understanding. While existing approaches seldom leverage the appearance-motion information in the video at multiple temporal scales, the interaction between the question and the visual information for textual semantics extraction is frequently ignored. Targeting these iss… ▽ More Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language understanding. While existing approaches seldom leverage the appearance-motion information in the video at multiple temporal scales, the interaction between the question and the visual information for textual semantics extraction is frequently ignored. Targeting these issues, this paper proposes a novel Temporal Pyramid Transformer (TPT) model with multimodal interaction for VideoQA. The TPT model comprises two modules, namely Question-specific Transformer (QT) and Visual Inference (VI). Given the temporal pyramid constructed from a video, QT builds the question semantics from the coarse-to-fine multimodal co-occurrence between each word and the visual content. Under the guidance of such question-specific semantics, VI infers the visual clues from the local-to-global multi-level interactions between the question and the video. Within each module, we introduce a multimodal attention mechanism to aid the extraction of question-video interactions, with residual connections adopted for the information passing across different levels. Through extensive experiments on three VideoQA datasets, we demonstrate better performances of the proposed method in comparison with the state-of-the-arts. △ Less

Submitted 10 September, 2021; originally announced September 2021.

Comments: Submitted to AAAI'22

Showing 1–50 of 122 results for author: Peng, M