-
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Authors:
Haoqiu Yan,
Yongxin Zhu,
Kai Zheng,
Bing Liu,
Haoyu Cao,
Deqiang Jiang,
Linli Xu
Abstract:
Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at https://github.com/Haoqiu-Yan/PerceptiveAgent.
Submitted 18 June, 2024;
originally announced June 2024.
-
Channel Estimation for AFDM With Superimposed Pilots
Authors:
Kai Zheng,
Miaowen Wen,
Tianqi Mao,
Lixia Xiao,
Zhaocheng Wang
Abstract:
The recently proposed affine frequency division multiplexing (AFDM), which employs a multi-chirp waveform, has shown reliability and robustness in doubly selective fading channels. In existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes inevitable degradation of the spectral efficiency (SE). To improve the SE, we propose a novel AFDM channel estimation scheme that introduces superimposed pilots in the DAFT domain. An effective pilot placement method that minimizes the channel estimation error is also developed with a rigorous proof. To mitigate the pilot-data interference, we further propose an iterative channel estimator and signal detector. Simulation results demonstrate that both channel estimation and data detection performance can be improved by the proposed scheme as the number of superimposed pilots increases.
Submitted 15 April, 2024;
originally announced April 2024.
-
Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection
Authors:
Cheng Feng,
Kedi Zheng,
Lanqing Shan,
Hani Alers,
Lampros Stergioulas,
Hongye Guo,
Qixin Chen
Abstract:
Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can result in serious asynchrony in peer-to-peer trading, potentially impacting the effectiveness of negotiations and hindering convergence before the market closes. This paper introduces a connection-aware P2P trading algorithm designed for extensive prosumer trading. The algorithm facilitates asynchronous trading while respecting prosumers' autonomy in trading peer selection, an often-overlooked aspect in traditional models. In addition, to optimize the use of limited connection opportunities, a smart trading peer connection selection strategy is developed to guide consumers to communicate strategically and thereby accelerate convergence. A theoretical convergence guarantee is provided for the connection-aware P2P trading algorithm, which further details how smart selection strategies enhance convergence efficiency. Numerical studies are carried out to validate the effectiveness of the connection-aware algorithm and the performance of smart selection strategies in reducing the overall convergence time.
Submitted 18 February, 2024;
originally announced February 2024.
-
Innovation-triggered Learning for Data-driven Predictive Control: Deterministic and Stochastic Formulations
Authors:
Kaikai Zheng,
Dawei Shi,
Sandra Hirche,
Yang Shi
Abstract:
Data-driven control has attracted considerable attention in recent years, especially for plants that are difficult to model from first principles. In particular, a key issue in data-driven approaches is how to make efficient use of data as the abundance of data becomes overwhelming. To address this issue, this work proposes an innovation-triggered learning framework and a corresponding data-driven controller design approach with guaranteed stability. Specifically, we consider a linear time-invariant system with unknown dynamics subject to deterministic and stochastic disturbances, respectively. Two kinds of data selection mechanisms are proposed by evaluating online the innovation contained in the sampled data, wherein the innovation is quantified by its effect of shrinking the set of potential system dynamics that are compatible with the sampled data. Next, after introducing a stability criterion using the set-valued estimation of system dynamics, a robust data-driven predictive controller is designed by minimizing a worst-case cost function. The closed-loop stability of the data-driven predictive controller equipped with the innovation-triggered learning protocol is proved within a high-probability framework. Finally, numerical experiments are performed to verify the validity of the proposed approaches, and the characteristics and the selection principle of the learning hyper-parameter are also discussed.
Submitted 28 January, 2024;
originally announced January 2024.
-
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Authors:
Zehua Chen,
Guande He,
Kaiwen Zheng,
Xu Tan,
Jun Zhu
Abstract:
In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation quality. However, because of the pre-defined data-to-noise diffusion process, their prior distribution is restricted to a noisy representation, which provides little information of the generation target. In this work, we present a novel TTS system, Bridge-TTS, making the first attempt to substitute the noisy Gaussian prior in established diffusion-based TTS methods with a clean and deterministic one, which provides strong structural information of the target. Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrodinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process. Moreover, the tractability and flexibility of our formulation allow us to empirically study the design spaces such as noise schedules, as well as to develop stochastic and deterministic samplers. Experimental results on the LJ-Speech dataset illustrate the effectiveness of our method in terms of both synthesis quality and sampling efficiency, significantly outperforming our diffusion counterpart Grad-TTS in 50-step/1000-step synthesis and strong fast TTS models in few-step scenarios. Project page: https://bridge-tts.github.io/
Submitted 6 December, 2023;
originally announced December 2023.
-
Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control
Authors:
Tong Liu,
Lei Lei,
Kan Zheng,
Xuemin Shen
Abstract:
An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history. Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay. Different from most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by the fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with those of the baseline DRL algorithms.
Submitted 19 November, 2023;
originally announced November 2023.
-
Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part II: Control-Aware Radio Resource Allocation
Authors:
Lei Lei,
Tong Liu,
Kan Zheng,
Xuemin Shen
Abstract:
In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in a Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. Moreover, we augment the state space of RRA with PC action history for a better-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computation-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. To verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle, where the performance of MTCC is compared with those of the baseline DRL algorithms.
Submitted 19 November, 2023;
originally announced November 2023.
-
Goal-Oriented Wireless Communication Resource Allocation for Cyber-Physical Systems
Authors:
Cheng Feng,
Kedi Zheng,
Yi Wang,
Kaibin Huang,
Qixin Chen
Abstract:
The proliferation of novel industrial applications at the wireless edge, such as smart grids and vehicle networks, demands the advancement of cyber-physical systems (CPSs). The performance of CPSs is closely linked to the last-mile wireless communication networks, which often become bottlenecks due to their inherently limited resources. Current CPS operations often treat wireless communication networks as unpredictable and uncontrollable variables, ignoring the potential adaptability of wireless networks, which results in inefficient and overly conservative CPS operations. Meanwhile, current wireless communications often focus more on throughput and other transmission-related metrics than on CPS goals. In this study, we introduce the framework of goal-oriented wireless communication resource allocation, accounting for the semantics and significance of data for CPS operation goals. This guarantees optimal CPS performance from a cybernetic standpoint. We formulate a bandwidth allocation problem aimed at maximizing the information utility gain of transmitted data brought to CPS operation goals. Since the goal-oriented bandwidth allocation problem is a large-scale combinatorial problem, we propose a divide-and-conquer and greedy solution algorithm. The information utility gain is first approximately decomposed into marginal utility information gains and computed in a parallel manner. Subsequently, the bandwidth allocation problem is reformulated as a knapsack problem, which can be further solved greedily with a guaranteed sub-optimality gap. We further demonstrate how our proposed goal-oriented bandwidth allocation algorithm can be applied in four potential CPS applications, including data-driven decision-making, edge learning, federated learning, and distributed optimization.
Submitted 6 November, 2023;
originally announced November 2023.
-
BASS: Block-wise Adaptation for Speech Summarization
Authors:
Roshan Sharma,
Kenneth Zheng,
Siddhant Arora,
Shinji Watanabe,
Rita Singh,
Bhiksha Raj
Abstract:
End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the input frames at a time. In this paper, we develop a method that allows one to train summarization models on very long sequences in an incremental manner. Speech summarization is realized as a streaming process, where hypothesis summaries are updated every block based on new acoustic information. We devise and test strategies to pass semantic context across the blocks. Experiments on the How2 dataset demonstrate that the proposed block-wise training method improves by 3 points absolute on ROUGE-L over a truncated input baseline.
Submitted 16 July, 2023;
originally announced July 2023.
-
Precheck Sequence Based False Base Station Detection During Handover: A Physical Layer Security Scheme
Authors:
Xiangyu Li,
Kaiwen Zheng,
Sidong Guo,
Xiaoli Ma
Abstract:
The False Base Station (FBS) attack has been a severe security problem for cellular networks since the 2G era. During handover, the user equipment (UE) periodically receives state information from surrounding base stations (BSs) and uploads it to the source BS. The source BS compares the uploaded signal power and shifts the UE to another BS that can provide the strongest signal. An FBS can transmit a signal with the proper power and attract the UE to connect to it. In this paper, based on the 3GPP standard, a Precheck Sequence-based Detection (PSD) Scheme is proposed to secure the UE's transition to a legal base station (LBS). This scheme first analyzes the structure of received signals in blocks and symbols. Several additional symbols are added to the current signal sequence for verification. By designing a long table of symbol sequences, every UE that needs handover is allocated a specific sequence from this table. The simulation results show that this PSD Scheme outperforms existing schemes, even when a specific transmit power is designed for the FBS.
Submitted 3 November, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Optimal Scheduling in IoT-Driven Smart Isolated Microgrids Based on Deep Reinforcement Learning
Authors:
Jiaju Qi,
Lei Lei,
Kan Zheng,
Simon X. Yang,
Xuemin Shen
Abstract:
In this paper, we investigate the scheduling of diesel generators (DGs) in an Internet of Things (IoT)-driven isolated microgrid (MG) by deep reinforcement learning (DRL). The renewable energy is fully exploited under the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from historical renewable and load data of previous days, where the policy can generate real-time decisions based on observations of past renewable and load data of previous hours collected by connected sensors. The goal is to reduce operating cost on the premise of ensuring supply-demand balance. Specifically, a novel finite-horizon partially observable Markov decision process (POMDP) model is conceived considering the spinning reserve. To overcome the challenge of the discrete-continuous hybrid action space arising from the binary DG switching decision and the continuous energy dispatch (ED) decision, a DRL algorithm, namely the hybrid action finite-horizon RDPG (HAFH-RDPG), is proposed. HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), based on a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data in an IoT-driven MG to evaluate the capability of the proposed algorithm in handling the uncertainty due to inter-hour and inter-day power fluctuation and to compare its performance with those of the benchmark algorithms.
Submitted 28 April, 2023;
originally announced May 2023.
-
Vision-Assisted mmWave Beam Management for Next-Generation Wireless Systems: Concepts, Solutions and Open Challenges
Authors:
Kan Zheng,
Haojun Yang,
Ziqiang Ying,
Pengshuo Wang,
Lajos Hanzo
Abstract:
Beamforming techniques have been widely used in the millimeter wave (mmWave) bands to mitigate the path loss of mmWave radio links, since narrow, straight beams directionally concentrate the signal energy. However, traditional mmWave beam management algorithms usually require excessive channel state information overhead, leading to extremely high computational and communication costs. This hinders the widespread deployment of mmWave communications. By contrast, the revolutionary vision-assisted beam management system concept employed at base stations (BSs) can select the optimal beam for the target user equipment (UE) based on its location information determined by machine learning (ML) algorithms applied to visual data, without requiring channel information. In this paper, we present a comprehensive framework for a vision-assisted mmWave beam management system, its typical deployment scenarios, and the specifics of the framework. Then, some of the challenges faced by this system and their efficient solutions are discussed from the perspective of ML. Next, a new simulation platform is conceived to provide both visual and wireless data for model validation and performance evaluation. Our simulation results indicate that vision-assisted beam management is indeed attractive for next-generation wireless systems.
Submitted 31 March, 2023;
originally announced March 2023.
-
From ORAN to Cell-Free RAN: Architecture, Performance Analysis, Testbeds and Trials
Authors:
Yang Cao,
Ziyang Zhang,
Xinjiang Xia,
Pengzhe Xin,
Dongjie Liu,
Kang Zheng,
Mengting Lou,
Jing Jin,
Qixing Wang,
Dongming Wang,
Yongming Huang,
Xiaohu You,
Jiangzhou Wang
Abstract:
Open radio access network (ORAN) provides an open architecture to implement the radio access network (RAN) of the fifth generation (5G) and beyond mobile communications. As a key technology for the evolution to the sixth generation (6G) systems, cell-free massive multiple-input multiple-output (CF-mMIMO) can effectively improve the spectrum efficiency, peak rate and reliability of wireless communication systems. Starting from a scalable implementation of CF-mMIMO, we study a cell-free RAN (CF-RAN) under the ORAN architecture. Through theoretical analysis and numerical simulation, we investigate the uplink and downlink spectral efficiencies of CF-mMIMO with the new architecture. We then discuss the implementation issues of CF-RAN under the ORAN architecture, including time-frequency synchronization and over-the-air reciprocity calibration, low layer splitting, deployment of ORAN radio units (O-RUs), and artificial intelligence-based user association. Finally, we present some representative experimental results for the uplink distributed reception and downlink coherent joint transmission of CF-RAN with commercial off-the-shelf O-RUs.
Submitted 6 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Object Segmentation with Audio Context
Authors:
Kaihui Zheng,
Yuqing Ren,
Zixin Shen,
Tianxu Qin
Abstract:
Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore multimodal feature aggregation for the video instance segmentation task, in which we integrate audio features into our video segmentation model to conduct an audio-visual learning scheme. Our method is based on an existing video instance segmentation method that leverages rich contextual information across video frames. Since this is the first attempt to investigate audio-visual instance segmentation, a novel dataset, including 20 vocal classes with synchronized video and audio recordings, is collected. By utilizing a combined decoder to fuse both video and audio features, our model shows a slight improvement over the base model. Additionally, we show the effectiveness of different modules by conducting extensive ablations.
Submitted 3 January, 2023;
originally announced January 2023.
-
A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
Authors:
Song Zhang,
Ken Zheng,
Xiaoxu Zhu,
Baoxiang Li
Abstract:
Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system, and the core of G2P conversion is to solve the problem of polyphone disambiguation, which is to pick the correct pronunciation from several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. First, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters by pronunciation. Then we obtain a Chinese polyphone BERT by extending a pre-trained Chinese BERT with the 741 new Chinese monophonic characters and adding a corresponding embedding layer for the new tokens, which is initialized by the embeddings of the source Chinese polyphonic characters. In this way, we can turn the polyphone disambiguation task into a pre-training task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model: the polyphone BERT model obtains a 2% improvement in average accuracy (from 92.1% to 94.1%) compared with the BERT-based classifier model, which is the prior state of the art in polyphone disambiguation.
Submitted 1 July, 2022;
originally announced July 2022.
-
Electrochemical Parameter Identification for Lithium-ion Battery Sources in Self-Sustained Transportation Energy Systems
Authors:
Yuxuan Gu,
Jianxiao Wang,
Yuanbo Chen,
Kedi Zheng,
Zhongwei Deng,
Qixin Chen
Abstract:
Lithium-ion battery (LIB) sources have played an essential role in self-sustained transportation energy systems and have been widely deployed in the last few years. To realize reliable battery maintenance, identifying the battery's electrochemical parameters is necessary. However, the battery model contains many parameters while the only measurable states are the current and voltage, making the identification an inherently ill-conditioned problem. A parameter identification approach is proposed, including the experiment, model, and algorithm. Electrochemical parameters are first grouped manually based on their physical properties and assigned to two sequenced tests for identification. The two tests, named the quasi-static test and the dynamic test, are compressed in time for practical implementation. Proper optimization models and a sensitivity-oriented stepwise (SSO) optimization algorithm are developed to search for the optimal parameters efficiently. Typically, the Sobol method is applied to conduct the sensitivity analysis. Based on the sensitivity indexes, the SSO algorithm can decouple the mixed impacts of different parameters during the identification. For validation, numerical experiments on a typical NCM811 battery at different life stages are conducted. The proposed approach saves about half the time needed to find the proper parameter values. The identification accuracy of crucial parameters related to battery degradation can exceed 95%. Case study results indicate that the identified parameters can not only improve the accuracy of the battery model but also be used as an indicator of the battery SOH.
Submitted 13 March, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Autonomous Platoon Control with Integrated Deep Reinforcement Learning and Dynamic Programming
Authors:
Tong Liu,
Lei Lei,
Kan Zheng,
Kuan Zhang
Abstract:
Deep Reinforcement Learning (DRL) is regarded as a potential method for car-following control and has been mostly studied to support a single following vehicle. However, it is more challenging to learn a stable and efficient car-following policy when there are multiple following vehicles in a platoon, especially with unpredictable leading vehicle behavior. In this context, we adopt an integrated DRL and Dynamic Programming (DP) approach to learn autonomous platoon control policies, which embeds the Deep Deterministic Policy Gradient (DDPG) algorithm into a finite-horizon value iteration framework. Although the DP framework can improve the stability and performance of DDPG, it has the limitations of lower sampling and training efficiency. In this paper, we propose an algorithm, namely Finite-Horizon-DDPG with Swee** through reduced state space using Stationary approximation (FH-DDPG-SS), which uses three key ideas to overcome the above limitations, i.e., transferring network weights backward in time, stationary policy approximation for earlier time steps, and swee** through reduced state space. In order to verify the effectiveness of FH-DDPG-SS, simulation using real driving data is performed, where the performance of FH-DDPG-SS is compared with those of the benchmark algorithms. Finally, platoon safety and string stability for FH-DDPG-SS are demonstrated.
△ Less
Submitted 17 November, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck
Authors:
Kaizhong Zheng,
Shujian Yu,
Baojuan Li,
Robert Jenssen,
Badong Chen
Abstract:
Developing new diagnostic models based on the underlying biological mechanisms rather than subjective symptoms for psychiatric disorders is an emerging consensus. Recently, machine learning-based classifiers using functional connectivity (FC) for psychiatric disorders and healthy controls have been developed to identify brain markers. However, existing machine learning-based diagnostic models are prone to over-fitting (due to insufficient training samples) and perform poorly in new test environments. Furthermore, it is difficult to obtain explainable and reliable brain biomarkers elucidating the underlying diagnostic decisions. These issues hinder their possible clinical applications. In this work, we propose BrainIB, a new graph neural network (GNN) framework to analyze functional magnetic resonance images (fMRI), by leveraging the famed Information Bottleneck (IB) principle. BrainIB is able to identify the most informative edges in the brain (i.e., subgraph) and generalizes well to unseen data. We evaluate the performance of BrainIB against 8 popular brain network classification methods on two multi-site, large-scale datasets and observe that BrainIB always achieves the highest diagnosis accuracy. It also discovers subgraph biomarkers that are consistent with clinical and neuroimaging findings.
Submitted 31 May, 2023; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review
Authors:
Jiaxin Li,
Danfeng Hong,
Lianru Gao,
Jing Yao,
Ke Zheng,
Bing Zhang,
Jocelyn Chanussot
Abstract:
With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which offers researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these traditional algorithms inevitably hit a performance bottleneck because they lack the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize some valuable resources for the sake of the development of multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.
Submitted 3 May, 2022;
originally announced May 2022.
-
Event-triggered Observability: A Set-membership Perspective
Authors:
Kaikai Zheng,
Dawei Shi,
Tongwen Chen
Abstract:
This work attempts to discuss the observability of linear time-invariant systems with event-triggered measurements. A new notion of observability, namely, $ε$-observability is defined with parameter $ε$, which relates to the worst-case performance of inferring the initial state based on not only the received measurement but also the implicit information in the event-triggering conditions at no-event instants. A criterion is developed to test the proposed $ε$-observability of discrete-time linear systems, based on which an iterative event-triggered set-membership observer is designed to evaluate a set containing all possible values of the state. The proposed set-membership observer is designed as the outer approximation of the ellipsoids predicted based on previous state estimates and the ellipsoids inferred by fusing the received measurement and communication conditions, which is optimal in the sense of trace at each step and is proved to be asymptotically bounded. The efficiency of the proposed event-triggered set-membership state observer is verified by numerical experiments.
Submitted 30 March, 2022;
originally announced March 2022.
-
Deep Reinforcement Learning Aided Platoon Control Relying on V2X Information
Authors:
Lei Lei,
Tong Liu,
Kan Zheng,
Lajos Hanzo
Abstract:
The impact of Vehicle-to-Everything (V2X) communications on platoon control performance is investigated. Platoon control is essentially a sequential stochastic decision problem (SSDP), which can be solved by Deep Reinforcement Learning (DRL) to deal with both the control constraints and uncertainty in the platoon leading vehicle's behavior. In this context, the value of V2X communications for DRL-based platoon controllers is studied with an emphasis on the tradeoff between the gain of including exogenous information in the system state for reducing uncertainty and the performance erosion due to the curse-of-dimensionality. Our objective is to find the specific set of information that should be shared among the vehicles for the construction of the most appropriate state space. SSDP models are conceived for platoon control under different information topologies (IFT) by taking into account 'just sufficient' information. Furthermore, theorems are established for comparing the performance of their optimal policies. In order to determine whether a piece of information should or should not be transmitted for improving the DRL-based control policy, we quantify its value by deriving the conditional KL divergence of the transition models. More meritorious information is given higher priority in transmission, since including it in the state space has a higher probability of offsetting the negative effect of having higher state dimensions. Finally, simulation results are provided to illustrate the theoretical analysis.
Submitted 27 March, 2022;
originally announced March 2022.
-
Min-Max Latency Optimization Based on Sensed Position State Information in Internet of Vehicles
Authors:
Pengzun Gao,
Long Zhao,
Kan Zheng,
Pingzhi Fan
Abstract:
Dual-function radar communication (DFRC) is an essential technology in the Internet of Vehicles (IoV). We consider a road-side unit (RSU) that employs DFRC signals to sense the vehicles' position state information (PSI) and communicates with the vehicles based on the PSI. The objective of this paper is to minimize the maximum communication delay among all vehicles, subject to the estimation accuracy constraint on the vehicles' PSI and the transmit power constraint of the RSU. By leveraging convex optimization theory, two iterative power allocation algorithms are proposed with different complexities and applicable scenarios. Simulation results indicate that the proposed power allocation algorithms converge and can significantly reduce the maximum transmit delay among vehicles compared with other schemes.
Submitted 19 March, 2022;
originally announced March 2022.
-
Lumbar Bone Mineral Density Estimation from Chest X-ray Images: Anatomy-aware Attentive Multi-ROI Modeling
Authors:
Fakai Wang,
Kang Zheng,
Le Lu,
Jing Xiao,
Min Wu,
Chang-Fu Kuo,
Shun Miao
Abstract:
Osteoporosis is a common chronic metabolic bone disease often under-diagnosed and under-treated due to the limited access to bone mineral density (BMD) examinations, e.g. via Dual-energy X-ray Absorptiometry (DXA). This paper proposes a method to predict BMD from Chest X-ray (CXR), one of the most commonly accessible and low-cost medical imaging examinations. Our method first automatically detects Regions of Interest (ROIs) of local CXR bone structures. Then a multi-ROI deep model with transformer encoder is developed to exploit both local and global information in the chest X-ray image for accurate BMD estimation. Our method is evaluated on 13719 CXR patient cases with ground truth BMD measured by the gold standard DXA. The model predicted BMD has a strong correlation with the ground truth (Pearson correlation coefficient 0.894 on lumbar 1). When applied in osteoporosis screening, it achieves a high classification performance (average AUC of 0.968). As the first effort of using CXR scans to predict the BMD, the proposed algorithm holds strong potential for early osteoporosis screening and public health promotion.
Submitted 9 June, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Coherence Learning using Keypoint-based Pooling Network for Accurately Assessing Radiographic Knee Osteoarthritis
Authors:
Kang Zheng,
Yirui Wang,
Chen-I Hsieh,
Le Lu,
Jing Xiao,
Chang-Fu Kuo,
Shun Miao
Abstract:
Knee osteoarthritis (OA) is a common degenerative joint disorder that affects a large population of elderly people worldwide. Accurate radiographic assessment of knee OA severity plays a critical role in chronic patient management. Current clinically adopted knee OA grading systems are observer-subjective and suffer from inter-rater disagreements. In this work, we propose a computer-aided diagnosis approach to provide more accurate and consistent assessments of both composite and fine-grained OA grades simultaneously. A novel semi-supervised learning method is presented to exploit the underlying coherence in the composite and fine-grained OA grades by learning from unlabeled data. By representing the grade coherence using the log-probability of a pre-trained Gaussian Mixture Model, we formulate an incoherence loss to incorporate unlabeled data in training. The proposed method also includes a keypoint-based pooling network, where deep image features are pooled from the disease-targeted keypoints (extracted along the knee joint) to provide more aligned and pathologically informative feature representations for accurate OA grade assessments. The proposed method is comprehensively evaluated on the public Osteoarthritis Initiative (OAI) data, a multi-center ten-year observational study on 4,796 subjects. Experimental results demonstrate that our method leads to significant improvements over previous strong whole image-based deep classification network baselines (like ResNet-50).
Submitted 16 December, 2021;
originally announced December 2021.
-
EmotionBox: a music-element-driven emotional music generation system using Recurrent Neural Network
Authors:
Kaitong Zheng,
Ruijie Meng,
Chengshi Zheng,
Xiaodong Li,
Jinqiu Sang,
Juanjuan Cai,
Jie Wang
Abstract:
With the development of deep neural networks, automatic music composition has made great progress. Although emotional music can evoke different emotions in listeners and is important for artistic expression, only a few studies have focused on generating emotional music. This paper presents EmotionBox, a music-element-driven emotional music generator that is capable of composing music given a specific emotion and that does not require a music dataset labeled with emotions. Instead, pitch histogram and note density are extracted as features that represent mode and tempo, respectively, to control music emotions. The subjective listening tests show that EmotionBox has a more competitive and balanced performance in arousing a specified emotion than the emotion-label-based method.
Submitted 15 December, 2021;
originally announced December 2021.
-
Noise-robust blind reverberation time estimation using noise-aware time-frequency masking
Authors:
Kaitong Zheng,
Chengshi Zheng,
Jinqiu Sang,
Yulong Zhang,
Xiaodong Li
Abstract:
The reverberation time is one of the most important parameters used to characterize the acoustic property of an enclosure. In real-world scenarios, it is much more convenient to estimate the reverberation time blindly from recorded speech compared to the traditional acoustic measurement techniques using professional measurement instruments. However, the recorded speech is often corrupted by noise, which has a detrimental effect on the estimation accuracy of the reverberation time. To address this issue, this paper proposes a two-stage blind reverberation time estimation method based on noise-aware time-frequency masking. This proposed method has a good ability to distinguish the reverberation tails from the noise, thus improving the estimation accuracy of reverberation time in noisy scenarios. The simulated and real-world acoustic experimental results show that the proposed method significantly outperforms other methods in challenging scenarios.
Submitted 9 December, 2021;
originally announced December 2021.
-
A simplified electro-chemical lithium-ion battery model applicable for in situ monitoring and online control
Authors:
Yuxuan Gu,
Jianxiao Wang,
Yuanbo Chen,
Zhongwei Deng,
Hongye Guo,
Kedi Zheng,
Qixin Chen
Abstract:
The penetration of lithium-ion batteries in transport, energy and communication systems is increasing rapidly. A meticulous model applicable for precise in-situ monitoring and convenient online control is sought to bridge the gap between research and applications. This paper proposes a simplified electro-chemical model and its discrete-time state-space realization derived from the pseudo-two-dimensional model. The solution-phase migration and solid-phase diffusion dynamics with varying parameters are captured, and rigorous mathematical expressions of reaction rate distribution and terminal voltage are derived. A simulation framework including initializing, stabilizing and closed-loop correcting schemes with low computation cost is designed. Numerical experiments on different types of batteries in various operating scenarios are conducted for validation.
Submitted 19 March, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Towards Robust Cross-domain Image Understanding with Unsupervised Noise Removal
Authors:
Lei Zhu,
Zhaojing Luo,
Wei Wang,
Meihui Zhang,
Gang Chen,
Kaiping Zheng
Abstract:
Deep learning models usually require a large amount of labeled data to achieve satisfactory performance. In multimedia analysis, domain adaptation studies the problem of cross-domain knowledge transfer from a label-rich source domain to a label-scarce target domain, thus potentially alleviating the annotation requirement for deep learning models. However, we find that contemporary domain adaptation methods for cross-domain image understanding perform poorly when the source domain is noisy. Weakly Supervised Domain Adaptation (WSDA) studies the domain adaptation problem under the scenario where source data can be noisy. Prior methods on WSDA remove noisy source data and align the marginal distribution across domains without considering the fine-grained semantic structure in the embedding space, which leads to the problem of class misalignment, e.g., features of cats in the target domain might be mapped near features of dogs in the source domain. In this paper, we propose a novel method, termed Noise Tolerant Domain Adaptation, for WSDA. Specifically, we adopt the cluster assumption and learn clusters discriminatively with class prototypes in the embedding space. We propose to leverage the location information of the data points in the embedding space and model the location information with a Gaussian mixture model to identify noisy source data. We then design a network which incorporates the Gaussian mixture noise model as a sub-module for unsupervised noise removal and propose a novel cluster-level adversarial adaptation method which aligns unlabeled target data with the less noisy class prototypes for mapping the semantic structure across domains. We conduct extensive experiments to evaluate the effectiveness of our method on both general images and medical images from COVID-19 and e-commerce datasets. The results show that our method significantly outperforms state-of-the-art WSDA methods.
Submitted 9 September, 2021;
originally announced September 2021.
-
Opportunistic Screening of Osteoporosis Using Plain Film Chest X-ray
Authors:
Fakai Wang,
Kang Zheng,
Yirui Wang,
Xiaoyun Zhou,
Le Lu,
Jing Xiao,
Min Wu,
Chang-Fu Kuo,
Shun Miao
Abstract:
Osteoporosis is a common chronic metabolic bone disease that is often under-diagnosed and under-treated due to the limited access to bone mineral density (BMD) examinations such as Dual-energy X-ray Absorptiometry (DXA). In this paper, we propose a method to predict BMD from Chest X-ray (CXR), one of the most common, accessible, and low-cost medical image examinations. Our method first automatically detects Regions of Interest (ROIs) of local and global bone structures from the CXR. Then a multi-ROI model is developed to exploit both local and global information in the chest X-ray image for accurate BMD estimation. Our method is evaluated on 329 CXR cases with ground truth BMD measured by DXA. The model-predicted BMD has a strong correlation with the gold standard DXA BMD (Pearson correlation coefficient 0.840). When applied for osteoporosis screening, it achieves a high classification performance (AUC 0.936). As the first effort in the field to use CXR scans to predict the spine BMD, the proposed algorithm holds strong potential in enabling early osteoporosis screening through routine chest X-rays and contributing to the enhancement of public health.
Submitted 4 April, 2021;
originally announced April 2021.
-
Semi-Supervised Learning for Bone Mineral Density Estimation in Hip X-ray Images
Authors:
Kang Zheng,
Yirui Wang,
Xiaoyun Zhou,
Fakai Wang,
Le Lu,
Chihung Lin,
Lingyun Huang,
Guotong Xie,
Jing Xiao,
Chang-Fu Kuo,
Shun Miao
Abstract:
Bone mineral density (BMD) is a clinically critical indicator of osteoporosis, usually measured by dual-energy X-ray absorptiometry (DEXA). Due to the limited accessibility of DEXA machines and examinations, osteoporosis is often under-diagnosed and under-treated, leading to increased fragility fracture risks. Thus it is highly desirable to obtain BMDs with alternative cost-effective and more accessible medical imaging examinations such as X-ray plain films. In this work, we formulate the BMD estimation from plain hip X-ray images as a regression problem. Specifically, we propose a new semi-supervised self-training algorithm to train the BMD regression model using images coupled with DEXA measured BMDs and unlabeled images with pseudo BMDs. Pseudo BMDs are generated and refined iteratively for unlabeled images during self-training. We also present a novel adaptive triplet loss to improve the model's regression accuracy. On an in-house dataset of 1,090 images (819 unique patients), our BMD estimation method achieves a high Pearson correlation coefficient of 0.8805 to ground-truth BMDs. It offers good feasibility to use the more accessible and cheaper X-ray imaging for opportunistic osteoporosis screening.
Submitted 19 May, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
Discriminative Localized Sparse Representations for Breast Cancer Screening
Authors:
Sokratis Makrogiannis,
Chelsea E. Harris,
Keni Zheng
Abstract:
Breast cancer is the most common cancer among women both in developed and developing countries. Early detection and diagnosis of breast cancer may reduce its mortality and improve the quality of life. Computer-aided detection (CADx) and computer-aided diagnosis (CAD) techniques have shown promise for reducing the burden of human expert reading and improving the accuracy and reproducibility of results. Sparse analysis techniques have produced relevant results for representing and recognizing imaging patterns. In this work we propose a method for Label Consistent Spatially Localized Ensemble Sparse Analysis (LC-SLESA). We apply dictionary learning to our block-based sparse analysis method to classify breast lesions as benign or malignant. The performance of our method in conjunction with LC-KSVD dictionary learning is evaluated using 10-, 20-, and 30-fold cross validation on the MIAS dataset. Our results indicate that the proposed sparse analyses may be a useful component for breast cancer screening applications.
Submitted 19 November, 2020;
originally announced November 2020.
-
SOUP: Spatial-Temporal Demand Forecasting and Competitive Supply
Authors:
Bolong Zheng,
Qi Hu,
Lingfeng Ming,
Jilin Hu,
Lu Chen,
Kai Zheng,
Christian S. Jensen
Abstract:
We consider a setting with an evolving set of requests for transportation from an origin to a destination before a deadline and a set of agents capable of servicing the requests. In this setting, an assignment authority is to assign agents to requests such that the average idle time of the agents is minimized. An example is the scheduling of taxis (agents) to meet incoming requests for trips while ensuring that the taxis are empty as little as possible. In this paper, we study the problem of spatial-temporal demand forecasting and competitive supply (SOUP). We address the problem in two steps. First, we build a granular model that provides spatial-temporal predictions of requests. Specifically, we propose a Spatial-Temporal Graph Convolutional Sequential Learning (ST-GCSL) algorithm that predicts the service requests across locations and time slots. Second, we provide means of routing agents to request origins while avoiding competition among the agents. In particular, we develop a demand-aware route planning (DROP) algorithm that considers both the spatial-temporal predictions and the supply-demand state. We report on extensive experiments with real-world and synthetic data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.
Submitted 18 January, 2021; v1 submitted 24 September, 2020;
originally announced September 2020.
-
Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution
Authors:
Ke Zheng,
Lianru Gao,
Wenzhi Liao,
Danfeng Hong,
Bing Zhang,
Ximin Cui,
Jocelyn Chanussot
Abstract:
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most…
▽ More
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most of these methods assume that the prior information of the Point Spread Function (PSF) and Spectral Response Function (SRF) are known. However, in practice, this information is often limited or unavailable. In this work, an unsupervised deep learning-based fusion method - HyCoNet - that can solve the problems in HSI-MSI fusion without the prior PSF and SRF information is proposed. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates with the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in the two convolution layers during the training process. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in the study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs.
Submitted 28 July, 2020;
originally announced July 2020.
-
Learning Hidden Markov Models for Linear Gaussian Systems with Applications to Event-based State Estimation
Authors:
Kaikai Zheng,
Dawei Shi,
Ling Shi
Abstract:
This work attempts to approximate a linear Gaussian system with a finite-state hidden Markov model (HMM), which is found useful in solving sophisticated event-based state estimation problems. An indirect modeling approach is developed, wherein a state space model (SSM) is first identified for a Gaussian system and the SSM is then used as an emulator for learning an HMM. In the proposed method, the training data for the HMM are obtained from the data generated by the SSM through building a quantization mapping. Parameter learning algorithms are designed to learn the parameters of the HMM by exploiting the periodic structural characteristics of the HMM. The convergence and asymptotic properties of the proposed algorithms are analyzed. The HMM learned using the proposed algorithms is applied to event-triggered state estimation, and numerical results on model learning and state estimation demonstrate the validity of the proposed algorithms.
Submitted 9 July, 2020;
originally announced July 2020.
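The quantization idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a scalar system x[k+1] = a·x[k] + w[k], uses hand-picked bin edges, and estimates only a state transition matrix by counting transitions between bins. The function names, the value a = 0.9, the noise level, and the bin edges are all hypothetical choices made for illustration.

```python
import random


def simulate_scalar_system(a, q_std, steps, seed=0):
    """Simulate x[k+1] = a * x[k] + w[k] with w[k] ~ N(0, q_std**2)."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q_std)
        xs.append(x)
    return xs


def quantize(x, edges):
    """Map a real value to the index of the bin it falls in (edges sorted ascending)."""
    return sum(x >= e for e in edges)


def estimate_transition_matrix(symbols, n_states):
    """Row-normalised counts of transitions between consecutive quantized states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, t in zip(symbols, symbols[1:]):
        counts[s][t] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for unvisited states fall back to a uniform distribution.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


# Hypothetical setup: three bin edges give four discrete states.
edges = [-1.0, 0.0, 1.0]
xs = simulate_scalar_system(a=0.9, q_std=0.5, steps=5000)
symbols = [quantize(x, edges) for x in xs]
P = estimate_transition_matrix(symbols, n_states=len(edges) + 1)
```

Each row of `P` is an empirical distribution over the next quantized state; a full HMM would additionally need an emission model, which this sketch leaves out.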
-
LSTM-based Anomaly Detection for Non-linear Dynamical System
Authors:
Yue Tan,
Chunjing Hu,
Kuan Zhang,
Kan Zheng,
Ethan A. Davis,
Jae Sung Park
Abstract:
Anomaly detection for non-linear dynamical systems plays an important role in ensuring system stability. However, it is usually complex and has to be solved by large-scale simulation, which requires extensive computing resources. In this paper, we propose a novel anomaly detection scheme for non-linear dynamical systems based on Long Short-Term Memory (LSTM) to capture complex temporal changes of the time sequence and make multi-step predictions. Specifically, we first present the framework of LSTM-based anomaly detection in non-linear dynamical systems, including data preprocessing, multi-step prediction and anomaly detection. According to the prediction requirement, two types of training modes are explored in multi-step prediction, where samples in a wall shear stress dataset are collected by an adaptive sliding window. On the basis of the multi-step prediction result, a Local Average with Adaptive Parameters (LAAP) algorithm is proposed to extract local numerical features of the time sequence and estimate the upcoming anomaly. The experimental results show that our proposed multi-step prediction method achieves a higher prediction accuracy than the traditional method on the wall shear stress dataset, and that the LAAP algorithm performs better than the absolute value-based method in the anomaly detection task.
Submitted 4 June, 2020;
originally announced June 2020.
-
TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications
Authors:
Kaiping Zheng,
Shaofeng Cai,
Horng Ruey Chua,
Wei Wang,
Kee Yuan Ngiam,
Beng Chin Ooi
Abstract:
In high stakes applications such as healthcare and finance analytics, the interpretability of predictive models is necessary for domain practitioners to trust the predictions. Traditional machine learning models, e.g., logistic regression (LR), are easy to interpret by nature. However, many of these models aggregate time-series data without considering the temporal correlations and variations. Therefore, their performance cannot match that of recurrent neural network (RNN) based models, which are nonetheless difficult to interpret. In this paper, we propose a general framework TRACER to facilitate accurate and interpretable predictions, with a novel model TITV devised for healthcare analytics and other high stakes applications such as financial investment and risk management. Different from LR and other existing RNN-based models, TITV is designed to capture both the time-invariant and the time-variant feature importance using a feature-wise transformation subnetwork and a self-attention subnetwork, for the feature influence shared over the entire time series and the time-related importance respectively. Healthcare analytics is adopted as a driving use case, and we note that the proposed TRACER is also applicable to other domains, e.g., fintech. We evaluate the accuracy of TRACER extensively on two real-world hospital datasets, and our doctors/clinicians further validate the interpretability of TRACER at both the patient level and the feature level. In addition, TRACER is validated in a high stakes financial application and a critical temperature forecasting application. The experimental results confirm that TRACER facilitates both accurate and interpretable analytics for high stakes applications.
Submitted 24 March, 2020;
originally announced March 2020.
-
Hierarchical Transformer Network for Utterance-level Emotion Recognition
Authors:
QingBiao Li,
ChunHua Wu,
KangFeng Zheng,
Zhe Wang
Abstract:
While there have been significant advances in detecting emotions in text, in the field of utterance-level emotion recognition (ULER), there are still many problems to be solved. In this paper, we address some challenges in ULER in dialog systems. (1) The same utterance can deliver different emotions when it is in different contexts or from different speakers. (2) Long-range contextual information is hard to effectively capture. (3) Unlike the traditional text classification problem, this task is supported by a limited number of datasets, among which most contain inadequate conversations or speech. To address these problems, we propose a hierarchical transformer framework (apart from the description of other studies, the "transformer" in this paper usually refers to the encoder part of the transformer) with a lower-level transformer to model the word-level input and an upper-level transformer to capture the context of utterance-level embeddings. We use a pretrained language model, bidirectional encoder representations from transformers (BERT), as the lower-level transformer, which is equivalent to introducing external data into the model and solves the problem of data shortage to some extent. In addition, we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models achieve 1.98%, 2.83%, and 3.94% improvement, respectively, over the state-of-the-art methods on each dataset in terms of macro-F1.
Submitted 18 February, 2020;
originally announced February 2020.
-
Leveraging Linear Quadratic Regulator Cost and Energy Consumption for Ultra-Reliable and Low-Latency IoT Control Systems
Authors:
Haojun Yang,
Kuan Zhang,
Kan Zheng,
Yi Qian
Abstract:
To efficiently support real-time control applications, networked control systems operating with ultra-reliable and low-latency communications (URLLCs) are becoming a fundamental technology for the future Internet of things (IoT). However, the design of control, sensing and communications is generally isolated at present. In this paper, we propose the joint optimization of control cost and energy consumption for a centralized wireless networked control system. Specifically, with the "sensing-then-control" protocol, we first develop an optimization framework which jointly takes control, sensing and communications into account. In this framework, we derive the spectral efficiency, linear quadratic regulator cost and energy consumption. Then, a novel performance metric called the energy-to-control efficiency is proposed for the IoT control system. In addition, we optimize the energy-to-control efficiency while guaranteeing the requirements of URLLCs, and a general and complex max-min joint optimization problem is thereby formulated for the IoT control system. To optimally solve the formulated problem with reasonable complexity, we propose two radio resource allocation algorithms. Finally, simulation results show that our proposed algorithms can significantly improve the energy-to-control efficiency for the IoT control system with URLLCs.
Submitted 17 February, 2020;
originally announced February 2020.
-
Joint Frame Design and Resource Allocation for Ultra-Reliable and Low-Latency Vehicular Networks
Authors:
Haojun Yang,
Kuan Zhang,
Kan Zheng,
Yi Qian
Abstract:
The rapid development of the fifth generation mobile communication systems accelerates the implementation of vehicle-to-everything communications. Compared with the other types of vehicular communications, vehicle-to-vehicle (V2V) communications mainly focus on the exchange of driving safety information with neighboring vehicles, which requires ultra-reliable and low-latency communications (URLLCs). However, the frame size is significantly shortened in V2V URLLCs because of the rigorous latency requirements, and thus the overhead is no longer negligible in size compared with the payload information. In this paper, we investigate the frame design and resource allocation for an urban V2V URLLC system in which the uplink cellular resources are reused in underlay mode. Specifically, we first analyze the lower bounds of performance for V2V pairs and cellular users based on the regular pilot scheme and superimposed pilot scheme. Then, we propose a frame design algorithm and a semi-persistent scheduling algorithm to achieve the optimal frame design and resource allocation with reasonable complexity. Finally, our simulation results show that the proposed frame design and resource allocation scheme can satisfy the URLLC requirements of V2V pairs and guarantee the communication quality of cellular users.
Submitted 17 February, 2020;
originally announced February 2020.
-
Design and Implementation of a High-Accuracy Positioning System Using RTK on Smartphones
Authors:
Geng Shi,
Ziqiang Ying,
Rongtao Xu,
Kan Zheng
Abstract:
In recent years, with the development of the Global Navigation Satellite System (GNSS), satellite navigation technology has played a crucial role in smartphone navigation. To solve the problem of the low positioning accuracy of GNSS-based smartphones, this paper proposes to apply the real-time kinematic (RTK) carrier-phase differential technique in smartphones, and a real-time positioning system for smartphones based on RTK is implemented. This paper presents the implementation and experimental results of this system. The system is mainly composed of the GNSS reference station, the NTRIP system and the smartphones. The experimental results show that the system effectively improves the positioning accuracy of smartphones.
Submitted 13 February, 2020;
originally announced February 2020.
-
Design and Implementation of a Low-Latency and High-Reliability System Based on Software-Defined Radio (SDR)
Authors:
Zhiwen Wu,
Tianpeng Yang,
Haitao Liu,
Geng Shi,
Kan Zheng
Abstract:
Ultra-reliable and low-latency communication (URLLC) is one of the three major service classes supported by the fifth generation (5G) New Radio (NR) technical specifications. In this paper, we introduce a physical layer architecture that can meet the low-latency and high-reliability requirements. The downlink system is designed according to the 3rd Generation Partnership Project (3GPP) specifications and is based on a software-defined radio (SDR) system. The URLLC system physical layer downlink is implemented on the open source OpenAirInterface (OAI) platform to evaluate the latency and reliability performance of the scheme. The reliability performance of the URLLC system is tested on the simulation platform, and the delay performance is evaluated on an over-the-air implementation. The experimental results show that the designed scheme can approximately meet the reliability and delay performance requirements of the 3GPP specifications.
Submitted 13 February, 2020;
originally announced February 2020.
-
Dynamic Energy Dispatch Based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids
Authors:
Lei Lei,
Yue Tan,
Glenn Dahlenburg,
Wei Xiang,
Kan Zheng
Abstract:
Microgrids (MGs) are small, local power grids that can operate independently from the larger utility grid. Combined with the Internet of Things (IoT), a smart MG can leverage sensory data and machine learning techniques for intelligent energy management. This paper focuses on deep reinforcement learning (DRL)-based energy dispatch for IoT-driven smart isolated MGs with diesel generators (DGs), photovoltaic (PV) panels, and a battery. A finite-horizon Partially Observable Markov Decision Process (POMDP) model is formulated and solved by learning from historical data to capture the uncertainty in future electricity consumption and renewable power generation. In order to deal with the instability problem of DRL algorithms and the unique characteristics of finite-horizon models, two novel DRL algorithms, namely, finite-horizon deep deterministic policy gradient (FH-DDPG) and finite-horizon recurrent deterministic policy gradient (FH-RDPG), are proposed to derive energy dispatch policies with and without fully observable state information. A case study using real isolated MG data is performed, where the performance of the proposed algorithms is compared with that of other baseline DRL and non-DRL algorithms. Moreover, the impact of uncertainties on MG performance is decoupled into two levels and evaluated respectively.
Submitted 16 November, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing
Authors:
Lei Lei,
Huijuan Xu,
Xiong Xiong,
Kan Zheng,
Wei Xiang,
Xianbin Wang
Abstract:
By leveraging the concept of mobile edge computing (MEC), the massive amount of data generated by a large number of Internet of Things (IoT) devices could be offloaded to an MEC server at the edge of the wireless network for further computationally intensive processing. However, due to the resource constraints of IoT devices and the wireless network, both the communications and computation resources need to be allocated and scheduled efficiently for better system performance. In this paper, we propose a joint computation offloading and multi-user scheduling algorithm for an IoT edge computing system to minimize the long-term average weighted sum of delay and power consumption under stochastic traffic arrival. We formulate the dynamic optimization problem as an infinite-horizon average-reward continuous-time Markov decision process (CTMDP) model. One critical challenge in solving this MDP problem for the multi-user resource control is the curse-of-dimensionality problem, where the state space of the MDP model and the computation complexity increase exponentially with the growing number of users or IoT devices. In order to overcome this challenge, we use deep reinforcement learning (RL) techniques and propose a neural network architecture to approximate the value functions for the post-decision system states. The designed algorithm to solve the CTMDP problem supports a semi-distributed auction-based implementation, where the IoT devices submit bids to the base station (BS), which makes the resource control decisions centrally. Simulation results show that the proposed algorithm provides significant performance improvement over the baseline algorithms, and also outperforms the RL algorithms based on other neural network architectures.
Submitted 18 June, 2019;
originally announced June 2019.
-
Cooperative V2X for High Definition Map Transmission Based on Vehicle Mobility
Authors:
Fangfei Wang,
Dong Guan,
Long Zhao,
Kan Zheng
Abstract:
High-definition (HD) map transmission is considered a key technology for automatic driving, which enables vehicles to obtain precise road and surrounding environment information for further localization and navigation. The objective of this paper is to reduce the power consumption of vehicular networks while meeting the large data requirement of HD maps. By leveraging the mobility patterns of vehicles, a collaborative vehicle-to-everything (V2X) transmission scheme is proposed for HD map transmission. Numerical results indicate that the proposed scheme can satisfy the transmission rate requirement of HD maps with low power consumption.
Submitted 20 February, 2019;
originally announced February 2019.
-
A Novel Rate and Channel Control Scheme Based on Data Extraction Rate for LoRa Networks
Authors:
Qihao Zhou,
Jinyu Xing,
Lu Hou,
Rongtao Xu,
Kan Zheng
Abstract:
Long Range (LoRa) has become one of the most popular Low Power Wide Area (LPWA) technologies, which provides a desirable trade-off among communication range, battery life, and deployment cost. In LoRa networks, several transmission parameters can be allocated to ensure efficient and reliable communication. For example, the configuration of the spreading factor allows tuning the data rate and the transmission distance. However, how to dynamically adjust the setting that minimizes the collision probability while meeting the required communication performance is an open challenge. This paper proposes a novel Data Rate and Channel Control (DRCC) scheme for LoRa networks so as to improve wireless resource utilization and support a massive number of LoRa nodes. The scheme estimates channel conditions based on the short-term Data Extraction Rate (DER), and opportunistically adjusts the spreading factor to adapt to variations in channel conditions. Furthermore, channel control is carried out to balance the link load across all available channels using global information on channel usage, which lowers access collisions under dense deployments. Our experiments demonstrate that the proposed DRCC improves reliability and capacity compared with other spreading factor allocation schemes in dense deployment scenarios.
Submitted 12 February, 2019;
originally announced February 2019.
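The spreading-factor trade-off mentioned in the abstract above can be made concrete with a short sketch. The bit-rate expression is the standard nominal LoRa formula, R_b = SF · (BW / 2^SF) · CR. Everything else is an assumption for illustration only: the DER is taken as received packets divided by sent packets, and the threshold rule in `adjust_sf` (with its `low` and `high` values) is hypothetical and is not the DRCC scheme from the paper.

```python
def lora_bit_rate(sf, bandwidth_hz=125_000, coding_rate_denominator=5):
    """Nominal LoRa bit rate in bit/s: SF * (BW / 2**SF) * (4 / denominator)."""
    return sf * (bandwidth_hz / 2 ** sf) * (4 / coding_rate_denominator)


def data_extraction_rate(received, sent):
    """Fraction of transmitted packets that were received (assumed DER definition)."""
    return received / sent if sent else 0.0


def adjust_sf(sf, der, low=0.8, high=0.95, sf_min=7, sf_max=12):
    """Hypothetical rule: raise SF when DER is poor, lower it when DER is high."""
    if der < low:
        return min(sf + 1, sf_max)   # slower but more robust
    if der > high:
        return max(sf - 1, sf_min)   # faster, shorter time on air
    return sf


# Data rate falls quickly as the spreading factor grows (125 kHz, CR 4/5).
for sf in range(7, 13):
    print(sf, round(lora_bit_rate(sf), 2))
```

With these defaults, SF7 gives about 5.47 kbit/s and SF12 about 293 bit/s, which is why choosing the spreading factor per node matters so much in dense deployments.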