-
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Authors:
Wenbin An,
Feng Tian,
Sicong Leng,
Jiahao Nie,
Haonan Lin,
QianYing Wang,
Guang Dai,
** Chen,
Shijian Lu
Abstract:
Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features, while failing to capture prompt-relevant local features, consequently undermining the visual grounding capacity of LVLMs and leading to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously. Our approach exhibits an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is reserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances general perception capability for LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA.
Submitted 21 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
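As a rough illustration of the AGLA abstract above, the sketch below masks image patches with low prompt relevance and then adds the logits obtained from the original and the masked view. The relevance scores, `keep_ratio`, `alpha`, the additive combination, and both function names are assumptions made for illustration; the abstract does not specify them.

```python
def mask_irrelevant_patches(patches, relevance, keep_ratio=0.5, fill=0.0):
    """Keep the most prompt-relevant patches and replace the rest with `fill`."""
    if len(patches) != len(relevance):
        raise ValueError("patches and relevance must have the same length")
    n_keep = max(1, round(len(patches) * keep_ratio))
    # Rank patch indices from most to least relevant (hypothetical scores).
    ranked = sorted(range(len(patches)), key=lambda i: relevance[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [p if i in keep else fill for i, p in enumerate(patches)]


def assemble_logits(logits_global, logits_local, alpha=1.0):
    """Assumed combination: add logits from the original and the masked view."""
    return [g + alpha * l for g, l in zip(logits_global, logits_local)]
```

Here `logits_global` would come from the model run on the original image and `logits_local` from the same model run on the masked image.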
-
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Authors:
Zesen Cheng,
Sicong Leng,
Hang Zhang,
Yifei Xin,
Xin Li,
Guanzheng Chen,
Yongxin Zhu,
Wenqi Zhang,
Ziyang Luo,
Deli Zhao,
Lidong Bing
Abstract:
In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data. Additionally, we integrate an Audio Branch into the model through joint training, thereby enriching the multimodal understanding capabilities of the model by seamlessly incorporating audio cues. Comprehensive evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC) tasks demonstrate that VideoLLaMA 2 consistently achieves competitive results among open-source models and even gets close to some proprietary models on several benchmarks. Furthermore, VideoLLaMA 2 exhibits reasonable improvements in audio-only and audio-video question-answering (AQA & OE-AVQA) benchmarks over existing models. These advancements underline VideoLLaMA 2's superior performance in multimodal comprehension, setting a new standard for intelligent video analysis systems. All models are public to facilitate further research.
Submitted 17 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Authors:
Hang Du,
Sicheng Zhang,
Binzhu Xie,
Guoshun Nan,
Jiayang Zhang,
Junrui Xu,
Hangyu Liu,
Sicong Leng,
Jiangming Liu,
Hehe Fan,
Dajiu Huang,
**g Feng,
Linli Chen,
Can Zhang,
Xuhuan Li,
Hao Zhang,
Jianhang Chen,
Qimei Cui,
Xiaofeng Tao
Abstract:
Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we also introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.
Submitted 6 May, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Constrained Layout Generation with Factor Graphs
Authors:
Mohammed Haroon Dupty,
Yanfei Dong,
Sicong Leng,
Guoji Fu,
Yong Liang Goh,
Wei Lu,
Wee Sun Lee
Abstract:
This paper addresses the challenge of object-centric layout generation under spatial constraints, seen in multiple domains including floorplan design process. The design process typically involves specifying a set of spatial constraints that include object attributes like size and inter-object relations such as relative positioning. Existing works, which typically represent objects as single nodes, lack the granularity to accurately model complex interactions between objects. For instance, often only certain parts of an object, like a room's right wall, interact with adjacent objects. To address this gap, we introduce a factor graph based approach with four latent variable nodes for each room, and a factor node for each constraint. The factor nodes represent dependencies among the variables to which they are connected, effectively capturing constraints that are potentially of a higher order. We then develop message-passing on the bipartite graph, forming a factor graph neural network that is trained to produce a floorplan that aligns with the desired requirements. Our approach is simple and generates layouts faithful to the user requirements, demonstrated by a large improvement in IOU scores over existing methods. Additionally, our approach, being inferential and accurate, is well-suited to the practical human-in-the-loop design process where specifications evolve iteratively, offering a practical and powerful tool for AI-guided design.
Submitted 30 March, 2024;
originally announced April 2024.
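The bipartite structure described in the abstract above (four latent variable nodes per room, one factor node per constraint) can be sketched as a plain data structure. The constraint names, the meaning of each latent variable, and the choice of which variables a factor attaches to are hypothetical; the abstract does not define them.

```python
def build_factor_graph(rooms, constraints, vars_per_room=4):
    """Build a bipartite factor graph as (variable_nodes, factor_nodes, edges).

    rooms: list of room names.
    constraints: dict mapping a factor name to the (room, index) variable
        nodes it depends on (an assumed input format).
    """
    variables = [(room, k) for room in rooms for k in range(vars_per_room)]
    known = set(variables)
    edges = []
    for factor, attached in constraints.items():
        for var in attached:
            if var not in known:
                raise ValueError(f"unknown variable node: {var}")
            # Each edge links one factor node to one variable node.
            edges.append((factor, var))
    return variables, list(constraints), edges
```

A factor attached to more than two variable nodes represents a higher-order constraint, and message passing would run along `edges`.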
-
RIS-empowered Topology Control for Distributed Learning in Urban Air Mobility
Authors:
Kai Xiong,
Rui Wang,
Supeng Leng,
Wenyang Che,
Chongwen Huang,
Chau Yuen
Abstract:
Urban Air Mobility (UAM) expands vehicles from the ground to the near-ground space, envisioned as a revolution for transportation systems. Comprehensive scene perception is the foundation for autonomous aerial driving. However, UAM encounters the intelligent perception challenge: high perception learning requirements conflict with the limited sensors and computing chips of flying cars. To overcome the challenge, federated learning (FL) and other collaborative learning have been proposed to enable resource-limited devices to conduct onboard deep learning (DL) collaboratively. But traditional collaborative learning like FL relies on a central integrator for DL model aggregation, which is difficult to deploy in dynamic environments. The fully decentralized learning schemes may be the intuitive solution while the convergence of distributed learning cannot be guaranteed. Accordingly, this paper explores reconfigurable intelligent surfaces (RIS) empowered distributed learning, taking account of topological attributes to facilitate the learning performance with convergence guarantee. We propose several FL topological criteria for optimizing the transmission delay and convergence rate by exploiting the Laplacian matrix eigenvalues of the communication network. Subsequently, we innovatively leverage the RIS link modification ability to remold the current network according to the proposed topological criteria. This paper rethinks the functions of RIS from the perspective of the network layer. Furthermore, a deep deterministic policy gradient-based RIS phase shift control algorithm is developed to construct or deconstruct the network links simultaneously to reshape the communication network. Simulation experiments are conducted over MobileNet-based multi-view learning to verify the efficiency of the distributed FL framework.
Submitted 8 March, 2024;
originally announced March 2024.
-
Causal Relationship Network of Risk Factors Impacting Workday Loss in Underground Coal Mines
Authors:
Shangsi Ren,
Cameron A. Beeche,
Zhiyi Shi,
Maria Acevedo Garcia,
Katherine Zychowski,
Shuguang Leng,
Pedram Roghanchi,
Jiantao Pu
Abstract:
This study aims to establish the causal relationship network between various factors leading to workday loss in underground coal mines using a novel causal artificial intelligence (AI) method. The analysis utilizes data obtained from the National Institute for Occupational Safety and Health (NIOSH). A total of 101,010 injury records from 3,982 unique underground coal mines spanning the years from 1990 to 2020 were extracted from the NIOSH database. Causal relationships were analyzed and visualized using a novel causal AI method called Grouped Greedy Equivalence Search (GGES). The impact of each variable on workday loss was assessed through intervention do-calculus adjustment (IDA) scores. Model training and validation were performed using the 10-fold cross-validation technique. Performance metrics, including adjacency precision (AP), adjacency recall (AR), arrowhead precision (AHP), and arrowhead recall (AHR), were utilized to evaluate the models. Findings revealed that after 2006, key direct causes of workday loss among mining employees included total mining experience, mean office employees, mean underground employees, county, and total mining experience (years). Total mining experience emerged as the most influential factor, whereas mean employees per mine exhibited the least influence. The analyses emphasized the significant role of total mining experience in determining workday loss. The models achieved optimal performance, with AP, AR, AHP, and AHR values measuring 0.694, 0.653, 0.386, and 0.345, respectively. This study demonstrates the feasibility of utilizing the new GGES method to clarify the causal factors behind the workday loss by analyzing employment demographics and injury records and establish their causal relationship network.
Submitted 24 January, 2024;
originally announced February 2024.
-
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Authors:
Sicong Leng,
Hang Zhang,
Guanzheng Chen,
Xin Li,
Shijian Lu,
Chunyan Miao,
Lidong Bing
Abstract:
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from the issue of object hallucinations, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces the over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded to visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without either additional training or the usage of external tools, significantly mitigates the object hallucination issue across different LVLM families. Beyond mitigating object hallucinations, VCD also excels in general LVLM benchmarks, highlighting its wide-ranging applicability.
Submitted 28 November, 2023;
originally announced November 2023.
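To make the idea of contrasting output distributions from original and distorted visual inputs concrete, here is a minimal sketch over next-token logits. The `(1 + alpha)` / `alpha` weighting is one common form of contrastive decoding and is assumed here; the abstract above does not give the exact formula, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(logits_original, logits_distorted, alpha=1.0):
    """Boost tokens supported by the original image and penalize tokens
    that stay likely when the image is distorted (assumed weighting)."""
    contrasted = [
        (1 + alpha) * o - alpha * d
        for o, d in zip(logits_original, logits_distorted)
    ]
    return softmax(contrasted)
```

A token that scores equally high with and without a usable image is likely driven by language priors, so its probability drops relative to tokens that depend on the visual input. Setting `alpha=0` recovers ordinary decoding from the original logits.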
-
Tell2Design: A Dataset for Language-Guided Floor Plan Generation
Authors:
Sicong Leng,
Yang Zhou,
Mohammed Haroon Dupty,
Wee Sun Lee,
Sam Conrad Joyce,
Wei Lu
Abstract:
We consider the task of generating designs directly from natural language descriptions, and consider floor plan generation as the initial research area. Language conditional generative models have recently been very successful in generating high-quality artistic images. However, designs must satisfy different constraints that are not present in generating artistic images, particularly spatial and relational constraints. We make multiple contributions to initiate research on this task. First, we introduce a novel dataset, Tell2Design (T2D), which contains more than 80k floor plan designs associated with natural language instructions. Second, we propose a Sequence-to-Sequence model that can serve as a strong baseline for future research. Third, we benchmark this task with several text-conditional image generation models. We conclude by conducting human evaluations on the generated samples and providing an analysis of human performance. We hope our contributions will propel the research on language-guided design generation forward.
Submitted 27 November, 2023;
originally announced November 2023.
-
UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification
Authors:
Minhao Zou,
Zhongxue Gan,
Yutong Wang,
Junheng Zhang,
Dongyan Sui,
Chun Guan,
Siyang Leng
Abstract:
Graph and hypergraph representation learning has attracted increasing attention from various research fields. Despite the decent performance and fruitful applications of Graph Neural Networks (GNNs), Hypergraph Neural Networks (HGNNs), and their well-designed variants, on some commonly used benchmark graphs and hypergraphs, they are outperformed by even a simple Multi-Layer Perceptron. This observation motivates a reexamination of the design paradigm of the current GNNs and HGNNs and poses challenges of extracting graph features effectively. In this work, a universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder. The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features via a normalized projection matrix. The resulting edge/hyperedge features, together with the original node features, are fed into a neural network. The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix, of the network's output, which can be further used for tasks such as node classification. The proposed architecture, in contrast to the traditional spectral-based and/or message passing approaches, simultaneously and comprehensively exploits the node features and graph/hypergraph topologies in an efficient and unified manner, covering both heterophilic and homophilic graphs. The designed projection matrix, encoding the graph features, is intuitive and interpretable. Extensive experiments are conducted and demonstrate the superior performance of the proposed framework on twelve representative hypergraph datasets and six real-world graph datasets, compared to the state-of-the-art methods. Our implementation is available online at https://github.com/MinhZou/UniG-Encoder.
Submitted 3 August, 2023;
originally announced August 2023.
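The forward and reversed transformations described in the UniG-Encoder abstract above can be sketched in a few lines. This is a minimal illustration under assumptions: the projection matrix stacks an identity block (original node features) on top of one averaging row per hyperedge, and the `network` argument stands in for the trained neural network. The exact normalization and architecture are not given in the abstract.

```python
def projection_matrix(num_nodes, hyperedges):
    """Stack identity rows (nodes) and mean-pooling rows (hyperedges)."""
    rows = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
            for i in range(num_nodes)]
    for edge in hyperedges:
        members = set(edge)
        if not members:
            raise ValueError("hyperedges must contain at least one node")
        # Assumed normalization: average the features of the member nodes.
        rows.append([1.0 / len(members) if j in members else 0.0
                     for j in range(num_nodes)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def unig_encode(features, hyperedges, network=lambda h: h):
    """Project to node + edge features, apply `network`, project back."""
    p = projection_matrix(len(features), hyperedges)
    hidden = network(matmul(p, features))
    return matmul(transpose(p), hidden)
```

An ordinary graph is the special case where every hyperedge contains exactly two nodes.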
-
A Digital Twin Empowered Lightweight Model Sharing Scheme for Multi-Robot Systems
Authors:
Kai Xiong,
Zhihong Wang,
Supeng Leng,
Jianhua He
Abstract:
Multi-robot system for manufacturing is an Industry Internet of Things (IIoT) paradigm with significant operational cost savings and productivity improvement, where Unmanned Aerial Vehicles (UAVs) are employed to control and implement collaborative productions without human intervention. This mission-critical system relies on 3-Dimension (3-D) scene recognition to improve operation accuracy in the production line and autonomous piloting. However, implementing 3-D point cloud learning, such as Pointnet, is challenging due to limited sensing and computing resources equipped with UAVs. Therefore, we propose a Digital Twin (DT) empowered Knowledge Distillation (KD) method to generate several lightweight learning models and select the optimal model to deploy on UAVs. With a digital replica of the UAVs preserved at the edge server, the DT system controls the model sharing network topology and learning model structure to improve recognition accuracy further. Moreover, we employ network calculus to formulate and solve the model sharing configuration problem toward minimal resource consumption, as well as convergence. Simulation experiments are conducted over a popular point cloud dataset to evaluate the proposed scheme. Experiment results show that the proposed model sharing scheme outperforms the individual model in terms of computing resource consumption and recognition accuracy.
Submitted 3 May, 2023;
originally announced May 2023.
-
Embedding Theory of Reservoir Computing and Reducing Reservoir Network Using Time Delays
Authors:
Xing-Yue Duan,
Xiong Ying,
Si-Yang Leng,
Jürgen Kurths,
Wei Lin,
Huan-Fei Ma
Abstract:
Reservoir computing (RC), a particular form of recurrent neural network, is under explosive development due to its exceptional efficacy and high performance in reconstruction or/and prediction of complex physical systems. However, the mechanism triggering such effective applications of RC is still unclear, awaiting deep and systematic exploration. Here, combining the delayed embedding theory with the generalized embedding theory, we rigorously prove that RC is essentially a high dimensional embedding of the original input nonlinear dynamical system. Thus, using this embedding property, we unify into a universal framework the standard RC and the time-delayed RC where we novelly introduce time delays only into the network's output layer, and we further find a trade-off relation between the time delays and the number of neurons in RC. Based on this finding, we significantly reduce the network size of RC for reconstructing and predicting some representative physical systems, and, more surprisingly, only using a single neuron reservoir with time delays is sometimes sufficient for achieving those tasks.
Submitted 8 May, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
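The trade-off between time delays and reservoir size mentioned in the abstract above can be pictured through the readout input alone. The sketch below builds delayed feature vectors from a sequence of reservoir states; the function name, the uniform delay `tau`, and the concatenation order are assumptions, and training the linear readout is left out.

```python
def delayed_readout_features(states, num_delays, tau=1):
    """Concatenate r(t), r(t - tau), ..., r(t - num_delays * tau).

    states: list of reservoir state vectors, one per time step.
    Returns one feature row per time step that has a full delay history.
    """
    start = num_delays * tau
    features = []
    for t in range(start, len(states)):
        row = []
        for k in range(num_delays + 1):
            row.extend(states[t - k * tau])
        features.append(row)
    return features
```

With a single-neuron reservoir and `num_delays = k`, the readout sees `k + 1` values per step, the same input width as a reservoir of `k + 1` neurons without delays.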
-
Speaker-Oriented Latent Structures for Dialogue-Based Relation Extraction
Authors:
Guoshun Nan,
Guoqing Luo,
Sicong Leng,
Yao Xiao,
Wei Lu
Abstract:
Dialogue-based relation extraction (DiaRE) aims to detect the structural information from unstructured utterances in dialogues. Existing relation extraction models may be unsatisfactory under such a conversational setting, due to the entangled logic and information sparsity issues in utterances involving multiple speakers. To this end, we introduce SOLS, a novel model which can explicitly induce speaker-oriented latent structures for better DiaRE. Specifically, we learn latent structures to capture the relationships among tokens beyond the utterance boundaries, alleviating the entangled logic issue. During the learning process, our speaker-specific regularization method progressively highlights speaker-related key clues and erases the irrelevant ones, alleviating the information sparsity issue. Experiments on three public datasets demonstrate the effectiveness of our proposed approach.
Submitted 30 October, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Intelligent Sensing Scheduling for Mobile Target Tracking Wireless Sensor Networks
Authors:
Longyu Zhou,
Supeng Leng,
Qiang Liu,
Haoye Chai,
Jihua Zhou
Abstract:
Edge computing has emerged as a prospective paradigm to meet ever-increasing computation demands in Mobile Target Tracking Wireless Sensor Networks (MTT-WSN). This paradigm can offload time-sensitive tasks to sink nodes to improve computing efficiency. Nevertheless, it is difficult to execute dynamic and critical tasks in the MTT-WSN network. Besides, the network cannot ensure consecutive tracking due to the limited energy. To address the problems, this paper proposes a new hierarchical target tracking structure based on Edge Intelligence (EI) technology. The structure integrates the computing resource of both mobile nodes and edge servers to provide efficient computation capability for real-time target tracking. Based on the proposed structure, we formulate an energy optimization model with the constraints of system execution latency and trajectory prediction accuracy. Moreover, we propose a long-term dynamic resource allocation algorithm to obtain the optimal resource allocation solution for accurate and consecutive tracking. Simulation results demonstrate that our algorithm outperforms deep Q-learning by over 14.5% in terms of system energy consumption. It can also obtain a significant enhancement in tracking accuracy compared with the non-cooperative scheme.
Submitted 4 August, 2021;
originally announced August 2021.
-
Secure and Efficient Blockchain based Knowledge Sharing for Intelligent Connected Vehicles
Authors:
Haoye Chai,
Supeng Leng,
Fan Wu,
Jianhua He
Abstract:
The emergence of Intelligent Connected Vehicles (ICVs) shows great potential for future intelligent traffic systems, enhancing both traffic safety and road efficiency. However, the ICVs relying on data driven perception and driving models face many challenges, including the lack of comprehensive knowledge to deal with complicated driving context. In this paper, we are motivated to investigate cooperative knowledge sharing for ICVs. We propose a secure and efficient directed acyclic graph (DAG) blockchain based knowledge sharing framework, aiming to cater for the micro-transaction based vehicular networks. The framework can realize both local and cross-regional knowledge sharing. Then, the framework is applied to autonomous driving applications, wherein machine learning based models for autonomous driving control can be shared. A lightweight tip selection algorithm (TSA) is proposed for the DAG based knowledge sharing framework to achieve consensus and identity verification for cross-regional vehicles. To enhance model accuracy as well as minimizing bandwidth consumption, an adaptive asynchronous distributed learning (ADL) based scheme is proposed for model uploading and downloading. Experiment results show that the blockchain based knowledge sharing is secure, and it can resist attacks from malicious users. In addition, the proposed adaptive ADL scheme can enhance driving safety related performance compared to several existing algorithms.
Submitted 2 November, 2021; v1 submitted 3 August, 2021;
originally announced August 2021.
-
Interventional Video Grounding with Dual Contrastive Learning
Authors:
Guoshun Nan,
Rui Qiao,
Yao Xiao,
Jun Liu,
Sicong Leng,
Hao Zhang,
Wei Lu
Abstract:
Video grounding aims to localize a moment from an untrimmed video for a given textual query. Existing approaches focus more on the alignment of visual and language stimuli with various likelihood-based matching or regression strategies, i.e., P(Y|X). Consequently, these models may suffer from spurious correlations between the language and video features due to the selection bias of the dataset. 1) To uncover the causality behind the model and data, we first propose a novel paradigm from the perspective of the causal inference, i.e., interventional video grounding (IVG) that leverages backdoor adjustment to deconfound the selection bias based on structured causal model (SCM) and do-calculus P(Y|do(X)). Then, we present a simple yet effective method to approximate the unobserved confounder as it cannot be directly sampled from the dataset. 2) Meanwhile, we introduce a dual contrastive learning approach (DCL) to better align the text and video by maximizing the mutual information (MI) between query and video clips, and the MI between start/end frames of a target moment and the others within a video to learn more informative visual representations. Experiments on three standard benchmarks show the effectiveness of our approaches. Our code is available on GitHub: https://github.com/nanguoshun/IVG.
Submitted 7 July, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
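For readers unfamiliar with the backdoor adjustment named in the abstract above, the standard formula is P(Y|do(X=x)) = sum over z of P(Y|X=x, Z=z) P(Z=z). The toy sketch below evaluates it for a discrete, observed confounder. That setting is an assumption made for illustration: the paper states that its confounder is unobserved and has to be approximated, which this example does not attempt.

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """Compute P(Y | do(X = x)) by averaging over the confounder Z.

    p_y_given_xz: dict mapping (x, z) to P(Y | X = x, Z = z).
    p_z: dict mapping z to its marginal probability P(Z = z).
    """
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())
```

The key difference from ordinary conditioning is that each stratum is weighted by P(Z=z) rather than P(Z=z | X=x), which removes the influence of Z on the choice of X.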
-
Connecting AI Learning and Blockchain Mining in 6G Systems
Authors:
Yunkai Wei,
Zixian An,
Supeng Leng,
Kun Yang
Abstract:
The sixth generation (6G) systems are generally recognized to be established on ubiquitous Artificial Intelligence (AI) and distributed ledgers such as blockchain. However, the AI training demands tremendous computing resource, which is limited in most 6G devices. Meanwhile, miners in Proof-of-Work (PoW) based blockchains devote massive computing power to block mining, and are widely criticized for the waste of computation. To address this dilemma, we propose an Evolved-Proof-of-Work (E-PoW) consensus that can integrate the matrix computations, which are widespread in AI training, into the process of brute-force searches in the block mining. Consequently, E-PoW can connect AI learning and block mining via the multiply used common computing resource. Experimental results show that E-PoW can salvage up to 80 percent of the computing power from pure block mining for parallel AI training in 6G systems.
Submitted 28 April, 2021;
originally announced April 2021.
-
Deep Learning Based Intelligent Inter-Vehicle Distance Control for 6G Enabled Cooperative Autonomous Driving
Authors:
Xiaosha Chen,
Supeng Leng,
Jianhua He,
Longyu Zhou
Abstract:
Research on the sixth generation cellular networks (6G) is gaining huge momentum to achieve ubiquitous wireless connectivity. Connected autonomous driving (CAV) is a critical vertical envisioned for 6G, holding great potential for improving road safety and road and energy efficiency. However, the stringent service requirements of CAV applications on reliability, latency and high speed communications will present big challenges to 6G networks. New channel access algorithms and intelligent control schemes for connected vehicles are needed for 6G supported CAV. In this paper, we investigate 6G supported cooperative driving, which is an advanced driving mode based on information sharing and driving coordination. First, we quantify the delay upper bounds of 6G vehicle to vehicle (V2V) communications with hybrid communication and channel access technologies. A deep learning neural network is developed and trained for fast computation of the delay bounds in real time operations. Then, an intelligent strategy is designed to control the inter-vehicle distance for cooperative autonomous driving. Furthermore, we propose a Markov Chain based algorithm to predict the parameters of the system states, and also a safe distance mapping method to enable smooth vehicular speed changes. The proposed algorithms are implemented in the AirSim autonomous driving platform. Simulation results show that the proposed algorithms are effective and robust, delivering safe and stable cooperative autonomous driving that greatly improves road safety, capacity and efficiency.
Submitted 26 December, 2020;
originally announced December 2020.
-
Edge Intelligence for Autonomous Driving in 6G Wireless System: Design Challenges and Solutions
Authors:
Bo Yang,
Xuelin Cao,
Kai Xiong,
Chau Yuen,
Yong Liang Guan,
Supeng Leng,
Lijun Qian,
Zhu Han
Abstract:
In a level-5 autonomous driving system, the autonomous driving vehicles (AVs) are expected to sense their surroundings by analyzing a large amount of data captured by a variety of onboard sensors in near-real-time. As a result, enormous computing costs will be imposed on the AVs for processing the tasks with the deployed machine learning (ML) model, while the inference accuracy may not be guaranteed. In this context, the advent of edge intelligence (EI) and sixth-generation (6G) wireless networking is expected to pave the way to more reliable and safer autonomous driving by providing multi-access edge computing (MEC) together with ML to AVs in close proximity. To realize this goal, we propose a two-tier EI-empowered autonomous driving framework. In the autonomous-vehicles tier, the autonomous vehicles are deployed with the shallow layers obtained by splitting the trained deep neural network model. In the edge-intelligence tier, an edge server is implemented with the remaining (deep) layers and an appropriately trained multi-task learning (MTL) model. In particular, obtaining the optimal offloading strategy (including the binary offloading decision and the computational resource allocation) can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is solved via MTL in near-real-time with high accuracy. In addition, an edge-vehicle joint inference scheme is proposed through neural network segmentation to achieve efficient online inference that preserves data privacy and reduces communication delay. Experiments demonstrate the effectiveness of the proposed framework, and open research topics are finally listed.
Submitted 13 December, 2020;
originally announced December 2020.
-
Communication and Computing Resource Optimization for Connected Autonomous Driving
Authors:
Kai Xiong,
Supeng Leng,
Xiaosha Chen,
Chongwen Huang,
Chau Yuen,
Yong Liang Guan
Abstract:
The transportation system is facing a sharp disruption, since Connected Autonomous Vehicles (CAVs) can free people from driving and provide a good driving experience with the aid of Vehicle-to-Vehicle (V2V) communications. Although CAVs bring benefits in terms of driving safety, vehicle string stability, and road traffic throughput, most existing work aims at improving only one of these performance metrics. However, these metrics may be mutually competitive, as they share the same communication and computing resources in a road segment. From the perspective of jointly optimizing driving safety, vehicle string stability, and road traffic throughput, there is a big research gap to be filled on resource management for connected autonomous driving. In this paper, we first explore the joint optimization of driving safety, vehicle string stability, and road traffic throughput by leveraging the consensus Alternating Directions Method of Multipliers (ADMM) algorithm. However, the limited communication bandwidth and on-board processing capacity cause resource competition among CAVs. We next analyze the competition among multiple tasks in contention based medium access to obtain the delay upper bound of V2V-related application offloading. An efficient sleeping multi-armed bandit tree-based algorithm is proposed to address the resource assignment problem. A series of simulation experiments are carried out to validate the performance of the proposed algorithms.
Submitted 29 June, 2020;
originally announced June 2020.
-
Intelligent Task Offloading for Heterogeneous V2X Communications
Authors:
Kai Xiong,
Supeng Leng,
Chongwen Huang,
Chau Yuen,
Liang Guan
Abstract:
With the rapid development of autonomous driving technologies, it becomes difficult to reconcile the conflict between ever-increasing demands for high processing rates in intelligent automotive tasks and resource-constrained on-board processors. Fortunately, vehicular edge computing (VEC) has been proposed to meet the pressing resource demands. Due to the delay-sensitive traits of automotive tasks, only a heterogeneous vehicular network with multiple access technologies may be able to handle these demanding challenges. In this paper, we propose an intelligent task offloading framework in heterogeneous vehicular networks with three Vehicle-to-Everything (V2X) communication technologies, namely Dedicated Short Range Communication (DSRC), cellular-based V2X (C-V2X) communication, and millimeter wave (mmWave) communication. Based on stochastic network calculus, this paper first derives the delay upper bound of different offloading technologies with a certain failure probability. Moreover, we propose a federated Q-learning method that optimally utilizes the available resources to minimize the communication/computing budgets and the offloading failure probabilities. Simulation results indicate that our proposed algorithm can significantly outperform the existing algorithms in terms of offloading failure probability and resource cost.
Submitted 29 June, 2020;
originally announced June 2020.
-
Energy-aware Traffic Engineering in Hybrid SDN/IP Backbone Networks
Authors:
Yunkai Wei,
Xiaoning Zhang,
Lei Xie,
Supeng Leng
Abstract:
Software Defined Networking (SDN) can effectively improve the performance of traffic engineering and has promising application prospects in backbone networks. Therefore, new energy saving schemes must take SDN into account, which is extremely important considering the rapidly increasing energy consumption of Telecom and ISP networks. At the same time, the introduction of SDN in a current network must be incremental in most cases, for both technical and economic reasons. During this period, operators have to manage hybrid networks, where SDN and traditional protocols coexist. In this paper, we study the energy efficient traffic engineering problem in hybrid SDN/IP networks. We first formulate the mathematical optimization model considering the SDN/IP hybrid routing mode. As the problem is NP-hard, we propose a fast heuristic algorithm named HEATE (Hybrid Energy-Aware Traffic Engineering). In our proposed HEATE algorithm, the IP routers perform shortest path routing using distributed OSPF link weight optimization. The SDNs perform multi-path routing with traffic flow splitting by the global SDN controller. The HEATE algorithm finds the optimal setting of the OSPF link weights and the splitting ratios of the SDNs. Thus traffic flow is aggregated onto a subset of links, and the underutilized links can be turned off to save energy. Computer simulation results show that our algorithm significantly improves energy efficiency in hybrid SDN/IP networks.
Submitted 12 May, 2016;
originally announced May 2016.
-
Multi-Objective Resource Allocation in Full-Duplex SWIPT Systems
Authors:
Shiyang Leng,
Derrick Wing Kwan Ng,
Nikola Zlatanov,
Robert Schober
Abstract:
In this paper, we investigate the resource allocation algorithm design for full-duplex simultaneous wireless information and power transfer (FD-SWIPT) systems. The considered system comprises an FD radio base station, multiple single-antenna half-duplex (HD) users, and multiple energy harvesters equipped with multiple antennas. We propose a multi-objective optimization framework to study the trade-off between uplink transmit power minimization, downlink transmit power minimization, and total harvested energy maximization. The considered optimization framework takes into account heterogeneous quality of service requirements for uplink and downlink communication and wireless power transfer. The non-convex multi-objective optimization problem is transformed into an equivalent rank-constrained semidefinite program (SDP) and solved optimally by SDP relaxation. The solution of the proposed framework results in a set of Pareto optimal resource allocation policies. Numerical results unveil an interesting trade-off between the considered conflicting system design objectives and reveal the improved power efficiency facilitated by FD in SWIPT systems compared to traditional HD systems.
Submitted 31 January, 2016; v1 submitted 6 October, 2015;
originally announced October 2015.
-
Multi-Objective Beamforming for Energy-Efficient SWIPT Systems
Authors:
Shiyang Leng,
Derrick Wing Kwan Ng,
Nikola Zlatanov,
Robert Schober
Abstract:
In this paper, we study the resource allocation algorithm design for energy-efficient simultaneous wireless information and power transfer (SWIPT) systems. The considered system comprises a transmitter, an information receiver, and multiple energy harvesting receivers equipped with multiple antennas. We propose a multi-objective optimization framework to study the trade-off between the maximization of the energy efficiency of information transmission and the maximization of wireless power transfer efficiency. The proposed problem formulation takes into account the per antenna circuit power consumption of the transmitter and the imperfect channel state information of the energy harvesting receivers. The adopted non-convex multi-objective optimization problem is transformed into an equivalent rank-constrained semidefinite program (SDP) and optimally solved by SDP relaxation. Numerical results unveil an interesting trade-off between the considered conflicting system design objectives and reveal the benefits of multiple transmit antennas for improving system energy efficiency.
Submitted 19 September, 2015;
originally announced September 2015.
-
Multi-Objective Power Allocation for Energy Efficient Wireless Information and Power Transfer Systems
Authors:
Shiyang Leng
Abstract:
Simultaneous wireless information and power transfer (SWIPT) provides a promising solution for enabling perpetual wireless networks. As energy efficiency (EE) is an important evaluation of system performance, this thesis studies energy-efficient resource allocation algorithm designs in SWIPT systems. We first investigate the trade-off between the EE for information transmission, the EE for power transfer, and the total transmit power in a basic SWIPT system with separated receivers. A multi-objective optimization problem is formulated under the constraint of maximum transmit power. We propose an algorithm which achieves flexible resource allocation for energy efficiencies maximization and transmit power minimization. The trade-off region of the system design objectives is shown in simulation results. Further, we consider secure communication in a SWIPT system with power splitting receivers. Artificial noise is injected to the communication channel to combat the eavesdropping capability of potential eavesdroppers. A power-efficient resource allocation algorithm is developed when multiple legitimate information receivers and multi-antenna potential eavesdroppers co-exist in the system. Simulation results demonstrate a significant performance gain by the proposed optimal algorithm compared to suboptimal baseline schemes.
Submitted 9 April, 2015;
originally announced April 2015.
-
Power Efficient and Secure Multiuser Communication Systems with Wireless Information and Power Transfer
Authors:
Shiyang Leng,
Derrick Wing Kwan Ng,
Robert Schober
Abstract:
In this paper, we study resource allocation algorithm design for power efficient secure communication with simultaneous wireless information and power transfer (WIPT) in multiuser communication systems. In particular, we focus on power splitting receivers which are able to harvest energy and decode information from the received signals. The considered problem is modeled as an optimization problem which takes into account a minimum required signal-to-interference-plus-noise ratio (SINR) at multiple desired receivers, a maximum tolerable data rate at multiple multi-antenna potential eavesdroppers, and a minimum required power delivered to the receivers. The proposed problem formulation facilitates the dual use of artificial noise in providing efficient energy transfer and guaranteeing secure communication. We aim at minimizing the total transmit power by jointly optimizing transmit beamforming vectors, power splitting ratios at the desired receivers, and the covariance of the artificial noise. The resulting non-convex optimization problem is transformed into a semidefinite program (SDP) and solved by SDP relaxation. We show that the adopted SDP relaxation is tight and achieves the global optimum of the original problem. Simulation results illustrate the significant power saving obtained by the proposed optimal algorithm compared to suboptimal baseline schemes.
Submitted 24 February, 2014;
originally announced February 2014.