Search | arXiv e-print repository

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

Abstract: Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at: https://github.com/Haoqiu-Yan/PerceptiveAgent.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 9 pages, 3 figures, ACL24 accepted

arXiv:2406.00356 [pdf, other]

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficient and high-quality text-to-audio generation. AudioLCM integrates Consistency Models into the generation process, facilitating rapid inference through a mapping from any point at any time step to the trajectory's initial point. To overcome the convergence issue inherent in LDMs with reduced sample iterations, we propose the Guided Latent Consistency Distillation with a multi-step Ordinary Differential Equation (ODE) solver. This innovation shortens the time schedule from thousands to dozens of steps while maintaining sample quality, thereby achieving fast convergence and high-quality generation. Furthermore, to optimize the performance of transformer-based neural network architectures, we integrate the advanced techniques pioneered by LLaMA into the foundational framework of transformers. This architecture supports stable and efficient training, ensuring robust performance in text-to-audio synthesis. Experimental results on text-to-sound generation and text-to-music synthesis tasks demonstrate that AudioLCM needs only 2 iterations to synthesize high-fidelity audio, while it maintains sample quality competitive with state-of-the-art models using hundreds of steps. AudioLCM enables a sampling speed 333x faster than real-time on a single NVIDIA 4090Ti GPU, making generative models practically applicable to text-to-audio generation deployment. Our extensive preliminary analysis shows that each design in AudioLCM is effective.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2402.09658 [pdf]

Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm

Authors: Amir Mohammad Naderi, Jennifer G. Casey, Mao-Hsiang Huang, Rachelle Victorio, David Y. Chiang, Calum MacRae, Hung Cao, Vandana A. Gupta

Abstract: Quantifying cardiovascular parameters like ejection fraction in zebrafish as a host of biological investigations has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend to overfit their training dataset. This means that applying the same framework to new data with different imaging setups and mutant types can severely decrease performance. We have developed a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) to quantify the cardiac function in zebrafish. In this work, we further applied data augmentation, Transfer Learning (TL), and Test Time Augmentation (TTA) to ZACAF to improve its performance in quantifying cardiovascular function in zebrafish. This strategy can be integrated with the available frameworks to aid other researchers. We demonstrate that using TL, even with a constrained dataset, the model can be refined to accommodate a novel microscope setup, encompassing diverse mutant types and accommodating various video recording protocols. Additionally, as users engage in successive rounds of TL, the model is anticipated to undergo substantial enhancements in both generalizability and accuracy. Finally, we applied this approach to assess the cardiovascular function in nrap mutant zebrafish, a model of cardiomyopathy.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.09674 [pdf, ps, other]

QoS-Aware 3D Coverage Deployment of UAVs for Internet of Vehicles in Intelligent Transportation

Authors: engfei Du, Tingyue Xiao, Haotong Cao, Daosen Zhai

Abstract: It is a challenging problem to characterize the air-to-ground (A2G) channel and identify the best deployment location for 3D UAVs with QoS awareness. To address this problem, we propose a QoS-aware UAV 3D coverage deployment algorithm, which simulates the three-dimensional urban road scenario, considers the UAV communication resource capacity and vehicle communication QoS requirements comprehensively, and then obtains the optimal UAV deployment position by improving the genetic algorithm. Specifically, the K-means clustering algorithm is used to cluster the vehicles, and the center locations of these clusters serve as the initial UAV positions to generate the initial population. Subsequently, we employ the K-means initialized grey wolf optimization (KIGWO) algorithm to achieve the UAV location with an optimal fitness value by performing an optimal search within the grey wolf population. To enhance the algorithm's diversity and global search capability, we randomly substitute this optimal location with one of the individual locations from the initial population. The fitness value is determined by the total number of vehicles covered by UAVs in the system, while the allocation scheme's feasibility is evaluated based on the corresponding QoS requirements. Competitive selection operations are conducted to retain individuals with higher fitness values, while crossover and mutation operations are employed to maintain the diversity of solutions. Finally, the individual with the highest fitness, which represents the UAV deployment position that covers the maximum number of vehicles in the entire system, is selected as the optimal solution. Extensive experimental results demonstrate that the proposed algorithm can effectively enhance the reliability and vehicle communication QoS.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2307.08230 [pdf, other]

Image-based Regularization for Action Smoothness in Autonomous Miniature Racing Car with Deep Reinforcement Learning

Authors: Hoang-Giang Cao, I Lee, Bo-Jiun Hsu, Zheng-Yi Lee, Yu-Wei Shih, Hsueh-Cheng Wang, I-Chen Wu

Abstract: Deep reinforcement learning has achieved significant results in low-level controlling tasks. However, for some applications like autonomous driving and drone flying, it is difficult to control behavior stably since the agent may suddenly change its actions, which often lowers the controlling system's efficiency, induces excessive mechanical wear, and causes uncontrollable, dangerous behavior to the vehicle. Recently, a method called conditioning for action policy smoothness (CAPS) was proposed to solve the problem of jerkiness in low-dimensional features for applications such as quadrotor drones. To cope with high-dimensional features, this paper proposes image-based regularization for action smoothness (I-RAS) for solving jerky control in autonomous miniature car racing. We also introduce a control based on impact ratio, an adaptive regularization weight to control the smoothness constraint, called IR control. In the experiment, an agent with I-RAS and IR control significantly improves the success rate from 59% to 95%. In the real-world-track experiment, the agent also outperforms other methods, reducing the average finish lap time while also improving the completion rate, even without real-world training. This is further supported by an agent based on I-RAS winning the 2022 AWS DeepRacer Final Championship Cup.

Submitted 10 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2306.04970 [pdf, other]

Motion Planning for Aerial Pick-and-Place based on Geometric Feasibility Constraints

Authors: Huazi Cao, Jiahao Shen, Cunjia Liu, Bo Zhu, Shiyu Zhao

Abstract: This paper studies the motion planning problem of the pick-and-place of an aerial manipulator that consists of a quadcopter flying base and a Delta arm. We propose a novel partially decoupled motion planning framework to solve this problem. Compared to the state-of-the-art approaches, the proposed one has two novel features. First, it does not suffer from increased computation in high-dimensional configuration spaces. That is because it calculates the trajectories of the quadcopter base and the end-effector separately in the Cartesian space based on proposed geometric feasibility constraints. The geometric feasibility constraints can ensure the resulting trajectories satisfy the aerial manipulator's geometry. Second, collision avoidance for the Delta arm is achieved through an iterative approach based on a pinhole mapping method, so that the feasible trajectory can be found in an efficient manner. The proposed approach is verified by three experiments on a real aerial manipulation platform. The experimental results show the effectiveness of the proposed method for the aerial pick-and-place task.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2303.16860 [pdf, other]

Physical Deep Reinforcement Learning Towards Safety Guarantee

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, safety and stability remain major concerns that hinder the application of DRL to safety-critical autonomous systems. To address these concerns, we propose Phy-DRL: a physical deep reinforcement learning framework. Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control endow Phy-DRL with (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.

Submitted 29 March, 2023; originally announced March 2023.

Comments: Working Paper

arXiv:2212.05288 [pdf, other]

Robust Sum-Rate Maximization in Transmissive RMS Transceiver-Enabled SWIPT Networks

Authors: Zhendong Li, Wen Chen, Ziheng Zhang, Qingqing Wu, Huanqing Cao, Jun Li

Abstract: In this paper, we propose a state-of-the-art downlink communication transceiver design for transmissive reconfigurable metasurface (RMS)-enabled simultaneous wireless information and power transfer (SWIPT) networks. Specifically, a feed antenna is deployed in the transmissive RMS-based transceiver, which can be used to implement beamforming. According to the relationship between wavelength and propagation distance, the spatial propagation models of plane and spherical waves are built. Then, in the case of imperfect channel state information (CSI), we formulate a robust system sum-rate maximization problem that jointly optimizes RMS transmissive coefficient, transmit power allocation, and power splitting ratio design while taking account of the non-linear energy harvesting model and outage probability criterion. Due to the coupling of optimization variables, the whole optimization problem is non-convex and cannot be solved directly. Therefore, the alternating optimization (AO) framework is implemented to decompose the non-convex original problem. In detail, the whole problem is divided into three sub-problems to solve. For the non-convexity of the objective function, successive convex approximation (SCA) is used to transform it, and the penalty function method and difference-of-convex (DC) programming are applied to deal with the non-convex constraints. Finally, we alternately solve the three sub-problems until the entire optimization problem converges. Numerical results show that our proposed algorithm converges and performs better than other benchmark algorithms.

Submitted 10 December, 2022; originally announced December 2022.

arXiv:2208.10006 [pdf, ps, other]

An SBR Based Ray Tracing Channel Modeling Method for THz and Massive MIMO Communications

Authors: Yuanzhe Wang, Hao Cao, Yifan **, Zizhe Zhou, Yinghua Wang, Jialing Huang, Yuxiao Li, Jie Huang, Cheng-Xiang Wang

Abstract: Terahertz (THz) communication and the application of massive multiple-input multiple-output (MIMO) technology have proved significant for the sixth generation (6G) communication systems, and have gained global interest. In this paper, we employ the shooting and bouncing ray (SBR) method integrated with acceleration technology to model THz and massive MIMO channels. The results of ray tracing (RT) simulation in this paper, i.e., angle of departure (AoD), angle of arrival (AoA), and power delay profile (PDP) under the frequency band supported by the commercial RT software Wireless Insite (WI), are in agreement with those produced by WI. Based on the Kirchhoff scattering effect on material surfaces and the atmospheric absorption loss observed in the THz frequency band, the modified propagation models of Fresnel reflection coefficients and free-space attenuation are consistent with the measured results. For massive MIMO, the channel capacity and the stochastic power distribution are analyzed. The results indicate the applicability of the SBR method for building deterministic models of THz and massive MIMO channels with extensive functions and acceptable accuracy.

Submitted 21 August, 2022; originally announced August 2022.

Comments: 6 pages, 11 figures, conference

arXiv:2208.03798 [pdf, ps, other]

Codebook Based Two-Time Scale Resource Allocation Design for IRS-Assisted eMBB-URLLC Systems

Authors: Walid R. Ghanem, Vahid Jamali, Malte Schellmann, Hanwen Cao, Joseph Eichinger, Robert Schober

Abstract: This paper investigates the resource allocation algorithm design for wireless systems assisted by large intelligent reflecting surfaces (IRSs) with coexisting enhanced mobile broadband (eMBB) and ultra reliable low-latency communication (URLLC) users. We consider a two-time scale resource allocation scheme, whereby the base station's precoders are optimized in each mini-slot to adapt to newly arriving URLLC traffic, whereas the IRS phase shifts are reconfigured only in each time slot to avoid excessive base station-IRS signaling. To facilitate efficient resource allocation design for large IRSs, we employ a codebook-based optimization framework, where the IRS is divided into several tiles and the phase-shift elements of each tile are selected from a pre-defined codebook. The resource allocation algorithm design is formulated as an optimization problem for the maximization of the average sum data rate of the eMBB users over a time slot while guaranteeing the quality-of-service (QoS) of each URLLC user in each mini-slot. An iterative algorithm based on alternating optimization (AO) is proposed to find a high-quality suboptimal solution. As a case study, the proposed algorithm is applied in an industrial indoor environment modelled via the Quadriga channel simulator. Our simulation results show that the proposed algorithm design enables the coexistence of eMBB and URLLC users and yields large performance gains compared to three baseline schemes. Furthermore, our simulation results reveal that the proposed two-time scale resource allocation design incurs only a small performance loss compared to the case when the IRSs are optimized in each mini-slot.

Submitted 7 August, 2022; originally announced August 2022.

Comments: 6 pages, 3 figures, 1 Table, and submitted for an IEEE conference publication

arXiv:2207.12027 [pdf, other]

Non-cascaded Control Barrier Functions for the Safe Control of Quadrotors

Authors: Weifeng Zeng, Huanhui Cao, Wenjie Lu, Hao Xiong

Abstract: Researchers have developed various cascaded controllers and non-cascaded controllers for the navigation and control of quadrotors in recent years. It is vital to ensure the safety of a quadrotor both in normal state and in abnormal state if a controller tends to make the quadrotor unsafe. To this end, this paper proposes a non-cascaded Control Barrier Function (CBF) for a quadrotor controlled by either cascaded controllers or a non-cascaded controller. Incorporated with Quadratic Programming (QP), the non-cascaded CBF can simultaneously regulate the magnitude of the total thrust and the torque of the quadrotor determined by a controller, so as to ensure the safety of the quadrotor both in normal state and in abnormal state. The non-cascaded CBF establishes a non-conservative forward invariant safe region, in which the controller of a quadrotor is fully or partially effective in the navigation or the pose control of the quadrotor. The non-cascaded CBF is applied to a quadrotor performing trajectory tracking and a quadrotor performing aggressive roll maneuvers in simulations to evaluate its effectiveness.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.11819 [pdf]

A Novel ECG Denoising Scheme Using the Ensemble Kalman Filter

Authors: Sadaf Sarafan, Hoang Vuong, Daniel Jilani, Samir Malhotra, Michael P. H. Lau, Manoj Vishwanath, Tadesse Ghirmai, Hung Cao

Abstract: Monitoring of the electrocardiogram (ECG) provides vital information and can reveal cardiovascular anomalies. Recent advances in the technology of wearable electronics have enabled compact devices to acquire personal physiological signals in the home setting; however, signals are usually contaminated with high-level noise. Thus, an efficient ECG filtering scheme is a dire need. In this paper, a novel method using the Ensemble Kalman Filter (EnKF) is developed for denoising ECG signals. We also intensively explore various filtering algorithms, including Savitzky-Golay (SG) filter, Ensemble Empirical mode decomposition (EEMD), Normalized Least-Mean-Square (NLMS), Recursive least squares (RLS) filter, Total variation denoising (TVD), Wavelet and extended Kalman filter (EKF) for comparison. Data from the MIT-BIH Noise Stress Test database were used. The proposed methodology shows the average signal to noise ratio (SNR) of 10.96, the Percentage Root Difference of 150.45, and the correlation coefficient of 0.959 from the modified MIT-BIH database with added motion artifacts.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.00842 [pdf, other]

Safe Reinforcement Learning for a Robot Being Pursued but with Objectives Covering More Than Capture-avoidance

Authors: Huanhui Cao, Zhiyuan Cai, Hairuo Wei, Wenjie Lu, Lin Zhang, Hao Xiong

Abstract: Reinforcement Learning (RL) algorithms have shown amazing performance in recent years, but deploying RL in real-world applications such as self-driven vehicles may raise safety problems. A self-driven vehicle moving to a target position following a learned policy may encounter a vehicle with unpredictable aggressive behaviors or even be pursued by a vehicle following a Nash strategy. To address the safety issue of the self-driven vehicle in this scenario, this paper conducts a preliminary study based on a system of robots. A safe RL framework with safety guarantees is developed for a robot being pursued but with objectives covering more than capture-avoidance. Simulations and experiments are conducted based on the system of robots to evaluate the effectiveness of the developed safe RL framework.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2205.09658 [pdf, other]

Image-Based Conditioning for Action Policy Smoothness in Autonomous Miniature Car Racing with Reinforcement Learning

Authors: Bo-Jiun Hsu, Hoang-Giang Cao, I Lee, Chih-Yu Kao, **-Bo Huang, I-Chen Wu

Abstract: In recent years, deep reinforcement learning has achieved significant results in low-level controlling tasks. However, the problem of control smoothness has received less attention. In autonomous driving, unstable control is inevitable since the vehicle might suddenly change its actions. This problem lowers the controlling system's efficiency, induces excessive mechanical wear, and causes uncontrollable, dangerous behavior to the vehicle. In this paper, we apply the Conditioning for Action Policy Smoothness (CAPS) with image-based input to smooth the control of an autonomous miniature racing car. Applying CAPS and sim-to-real transfer methods helps to stabilize the control at a higher speed. In particular, the agent with CAPS and CycleGAN reduces the average finishing lap time by 21.80%. Moreover, we also conduct extensive experiments to analyze the impact of CAPS components.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2203.14924 [pdf, other]

Sandboxing (AI-based) Unverified Controllers in Stochastic Games: An Abstraction-based Approach with Safe-visor Architecture

Authors: Bingzhuo Zhong, Hongpeng Cao, Majid Zamani, Marco Caccamo

Abstract: In this paper, we propose a construction scheme for a Safe-visor architecture for sandboxing unverified controllers, e.g., artificial intelligence-based (a.k.a. AI-based) controllers, in two-player non-cooperative stochastic games. Concretely, we leverage abstraction-based approaches to construct a supervisor that checks and decides whether or not to accept the inputs provided by the unverified controller, and a safety advisor that provides fallback control inputs to ensure safety whenever the unverified controller is rejected. Moreover, by leveraging an $(ε,δ)$-approximate probabilistic relation between the original game and its finite abstraction, we provide a formal safety guarantee with respect to safety specifications modeled by deterministic finite automata (DFA), while the functionality of the unverified controllers is still exploited. To show the effectiveness of the proposed results, we apply them to a control problem of a quadrotor tracking a moving ground vehicle, in which an AI-based unverified controller is employed to control the quadrotor.

Submitted 28 March, 2022; originally announced March 2022.

arXiv:2203.03428 [pdf, other]

Attention-based Region of Interest (ROI) Detection for Speech Emotion Recognition

Authors: Jay Desai, Houwei Cao, Ravi Shah

Abstract: Automatic emotion recognition for real-life applications is a challenging task. Human emotion expressions are subtle, and can be conveyed by a combination of several emotions. In most existing emotion recognition studies, each audio utterance/video clip is labelled/classified in its entirety. However, utterance/clip-level labelling and classification can be too coarse to capture the subtle intra-utterance/clip temporal dynamics. For example, an utterance/video clip usually contains only a few emotion-salient regions and many emotionless regions. In this study, we propose to use attention mechanism in deep recurrent neural networks to detect the Regions-of-Interest (ROI) that are more emotionally salient in human emotional speech/video, and further estimate the temporal emotion dynamics by aggregating those emotionally salient regions-of-interest. We compare the ROI from audio and video and analyse them. We compare the performance of the proposed attention networks with the state-of-the-art LSTM models on multi-class classification task of recognizing six basic human emotions, and the proposed attention models exhibit significantly better performance. Furthermore, the attention weight distribution can be used to interpret how an utterance can be expressed as a mixture of possible emotions.

Submitted 3 March, 2022; originally announced March 2022.

Comments: Paper written in 2019

arXiv:2203.01630 [pdf, ps, other]

Optimization-based Phase-shift Codebook Design for Large IRSs

Authors: Walid R. Ghanem, Vahid Jamali, Malte Schellmann, Hanwen Cao, Joseph Eichinger, Robert Schober

Abstract: In this paper, we focus on large intelligent reflecting surfaces (IRSs) and propose a new codebook construction method to obtain a set of pre-designed phase-shift configurations for the IRS unit cells. Since the complexity of online optimization and the overhead for channel estimation scale with the size of the phase-shift codebook, the design of small codebooks is of high importance. We consider both continuous and discrete phase-shift designs and formulate the codebook construction as optimization problems. To solve the optimization problems, we propose an optimal algorithm for the discrete phase-shift design and a low-complexity sub-optimal solution for the continuous design. Simulation results show that the proposed algorithms facilitate the construction of codebooks of different sizes and with different beamwidths. Moreover, the performance of the discrete phase-shift design with 2-bit quantization is shown to approach that of the continuous phase-shift design. Finally, our simulation results show that the proposed designs enable large transmit power savings compared to the existing linear and quadratic codebook designs [1], [2].

Submitted 8 August, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 13 pages, 4 figures

arXiv:2202.08309 [pdf, other]

Contextualize differential privacy in image database: a lightweight image differential privacy approach based on principle component analysis inverse

Authors: Shiliang Zhang, Xuehui Ma, Hui Cao, Tengyuan Zhao, Yajie Yu, Zhuzhu Wang

Abstract: Differential privacy (DP) has been the de-facto standard to preserve privacy-sensitive information in database. Nevertheless, there lacks a clear and convincing contextualization of DP in image database, where individual images' indistinguishable contribution to a certain analysis can be achieved and observed when DP is exerted. As a result, the privacy-accuracy trade-off due to integrating DP is insufficiently demonstrated in the context of differentially-private image database. This work aims at contextualizing DP in image database by an explicit and intuitive demonstration of integrating conceptional differential privacy with images. To this end, we design a lightweight approach dedicating to privatizing image database as a whole and preserving the statistical semantics of the image database to an adjustable level, while making individual images' contribution to such statistics indistinguishable. The designed approach leverages principle component analysis (PCA) to reduce the raw image with large amount of attributes to a lower dimensional space whereby DP is performed, so as to decrease the DP load of calculating sensitivity attribute-by-attribute. The DP-exerted image data, which is not visible in its privatized format, is visualized through PCA inverse such that both a human and machine inspector can evaluate the privatization and quantify the privacy-accuracy trade-off in an analysis on the privatized image database. Using the devised approach, we demonstrate the contextualization of DP in images by two use cases based on deep learning models, where we show the indistinguishability of individual images induced by DP and the privatized images' retention of statistical semantics in deep learning tasks, which is elaborated by quantitative analyses on the privacy-accuracy trade-off under different privatization settings.

Submitted 19 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

arXiv:2111.03301 [pdf, other]

Frequency-Aware Physics-Inspired Degradation Model for Real-World Image Super-Resolution

Authors: Zhenxing Dong, Hong Cao, Wang Shen, Yu Gan, Yuye Ling, Guangtao Zhai, Yikai Su

Abstract: Current learning-based single image super-resolution (SISR) algorithms underperform on real data due to the deviation in the assumed degradation process from that in the real-world scenario. Conventional degradation processes consider applying blur, noise, and downsampling (typically bicubic downsampling) on high-resolution (HR) images to synthesize low-resolution (LR) counterparts. However, few works on degradation modelling have taken the physical aspects of the optical imaging system into consideration. In this paper, we analyze the imaging system optically and exploit the characteristics of the real-world LR-HR pairs in the spatial frequency domain. We formulate a real-world physics-inspired degradation model by considering both optics and sensor degradation; the physical degradation of an imaging system is modelled as a low-pass filter, whose cut-off frequency is dictated by the object distance, the focal length of the lens, and the pixel size of the image sensor. In particular, we propose to use a convolutional neural network (CNN) to learn the cutoff frequency of real-world degradation process. The learned network is then applied to synthesize LR images from unpaired HR images. The synthetic HR-LR image pairs are later used to train an SISR network. We evaluate the effectiveness and generalization capability of the proposed degradation model on real-world images captured by different imaging systems. Experimental results showcase that the SISR network trained by using our synthetic data performs favorably against the network using the traditional degradation model. Moreover, our results are comparable to that obtained by the same network trained by using real-world LR-HR pairs, which are challenging to obtain in real scenes.

Submitted 11 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: 22 pages,12 figures

arXiv:2107.11016 [pdf, other]

Joint Communication and Trajectory Design for Intelligent Reflecting Surface Empowered UAV SWIPT Networks

Authors: Zhendong Li, Wen Chen, Huanqing Cao, Hongying Tang, Kunlun Wang, Jun Li

Abstract: Aiming at the limited battery capacity of widely deployed low-power smart devices in the Internet-of-things (IoT), this paper proposes a novel intelligent reflecting surface (IRS) empowered unmanned aerial vehicle (UAV) simultaneous wireless information and power transfer (SWIPT) network framework, in which IRS is used to reconstruct the wireless channel to enhance the wireless energy transmission efficiency and coverage area of the UAV SWIPT networks. In this paper, we formulate an achievable sum-rate maximization problem by jointly optimizing UAV trajectory, successive interference cancellation (SIC) decoding order, UAV transmit power allocation, power splitting (PS) ratio and IRS reflection coefficient while taking account of user non-orthogonal multiple access (NOMA) and a non-linear energy harvesting model. Due to the coupling of optimization variables, this problem is a complex non-convex optimization problem, and it is challenging to solve it directly. We first transform the problem, and then apply the alternating optimization (AO) algorithm framework to divide the transformed problem into four sub-problems to solve it. Specifically, by applying successive convex approximation (SCA), penalty function method and difference-convex (DC) programming, UAV trajectory, SIC decoding order, UAV transmit power allocation, PS ratio and IRS reflection coefficient are alternately optimized until the convergence is achieved. Numerical simulation results verify the effectiveness of our proposed algorithm compared to other algorithms.

Submitted 27 October, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

arXiv:2107.11013 [pdf, other]

Beamforming Design and Power Allocation for Transmissive RMS-based Transmitter Architectures

Authors: Zhendong Li, Wen Chen, Huanqing Cao

Abstract: This letter investigates a downlink multiple input single output (MISO) system based on transmissive reconfigurable metasurface (RMS) transmitter. Specifically, a transmitter design based on a transmissive RMS equipped with a feed antenna is first proposed. Then, in order to maximize the achievable sum-rate of the system, the beamforming design and power allocation are jointly optimized. Since the optimization variables are coupled, this formulated optimization problem is non-convex, so it is difficult to solve it directly. To solve this problem, we propose an alternating optimization (AO) technique based on difference-of-convex (DC) programming and successive convex approximation (SCA). Simulation results verify that the proposed algorithm can achieve convergence and improve the achievable sum-rate of the system.

Submitted 21 September, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

arXiv:2105.06226 [pdf, other]

Robust Beamforming Design and Time Allocation for IRS-assisted Wireless Powered Communication Networks

Authors: Zhendong Li, Wen Chen, Qingqing Wu, Huanqing Cao, Kunlun Wang, Jun Li

Abstract: In this paper, a novel intelligent reflecting surface (IRS)-assisted wireless powered communication network (WPCN) architecture is proposed for power-constrained Internet-of-Things (IoT) smart devices, where IRS is exploited to improve the performance of WPCN under imperfect channel state information (CSI). We formulate a hybrid access point (HAP) transmit energy minimization problem by jointly optimizing time allocation, HAP energy beamforming, receiving beamforming, user transmit power allocation, IRS energy reflection coefficient and information reflection coefficient under the imperfect CSI and non-linear energy harvesting model. On account of the high coupling of optimization variables, the formulated problem is a non-convex optimization problem that is difficult to solve directly. To address the above-mentioned challenging problem, alternating optimization (AO) technique is applied to decouple the optimization variables to solve the problem. Specifically, through AO, time allocation, HAP energy beamforming, receiving beamforming, user transmit power allocation, IRS energy reflection coefficient and information reflection coefficient are divided into three sub-problems to be solved alternately. The difference-of-convex (DC) programming is used to solve the non-convex rank-one constraint in solving IRS energy reflection coefficient and information reflection coefficient. Numerical simulations verify the superiority of the proposed optimization algorithm in decreasing HAP transmit energy compared with other benchmark schemes.

Submitted 6 December, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

arXiv:2105.05537 [pdf, other]

Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

Authors: Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang

Abstract: In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. Especially, the deep neural networks based on U-shaped architecture and skip-connections have been widely applied in a variety of medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global and long-range semantic information interaction well due to the locality of the convolution operation. In this paper, we propose Swin-Unet, which is an Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use hierarchical Swin Transformer with shifted windows as the encoder to extract context features. And a symmetric Swin Transformer-based decoder with patch expanding layer is designed to perform the up-sampling operation to restore the spatial resolution of the feature maps. Under the direct down-sampling and up-sampling of the inputs and outputs by 4x, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution. The codes and trained models will be publicly available at https://github.com/HuCaoFighting/Swin-Unet.

Submitted 12 May, 2021; originally announced May 2021.

Comments: a drafted manuscript

arXiv:2104.14273 [pdf, other]

A Rigid Registration Method in TEVAR

Authors: Meng Li, Changyan Lin, Heng Wu, Jiasong Li, Hongshuai Cao

Abstract: Since the mapping relationship between definitized intra-interventional X-ray and undefined pre-interventional Computed Tomography (CT) is uncertain, auxiliary positioning devices or body markers, such as medical implants, are commonly used to determine this relationship. However, such approaches cannot be widely used in clinical practice due to the complex realities. To determine the mapping relationship, and achieve an initialization post estimation of human body without auxiliary equipment or markers, the proposed method applies image segmentation and deep feature matching to directly match the X-ray and CT images. As a result, the well-trained network can directly predict the spatial correspondence between arbitrary X-ray and CT. The experimental results show that when combining our approach with the conventional approach, the achieved accuracy and speed can meet the basic clinical intervention needs, and it provides a new direction for intra-interventional registration.

Submitted 28 November, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

MSC Class: 68U10

arXiv:2104.12959 [pdf, other]

Realtime Mobile Bandwidth and Handoff Predictions in 4G/5G Networks

Authors: Lifan Mei, Jinrui Gou, Yujin Cai, Houwei Cao, Yong Liu

Abstract: Mobile apps are increasingly relying on high-throughput and low-latency content delivery, while the available bandwidth on wireless access links is inherently time-varying. The handoffs between base stations and access modes due to user mobility present additional challenges to deliver a high level of user Quality-of-Experience (QoE). The ability to predict the available bandwidth and the upcoming handoffs will give applications valuable leeway to make proactive adjustments to avoid significant QoE degradation. In this paper, we explore the possibility and accuracy of realtime mobile bandwidth and handoff predictions in 4G/LTE and 5G networks. Towards this goal, we collect long consecutive traces with rich bandwidth, channel, and context information from public transportation systems. We develop Recurrent Neural Network models to mine the temporal patterns of bandwidth evolution in fixed-route mobility scenarios. Our models consistently outperform the conventional univariate and multivariate bandwidth prediction models. For 4G & 5G co-existing networks, we propose a new problem of handoff prediction between 4G and 5G, which is important for low-latency applications like self-driving strategy in realistic 5G scenarios. We develop classification and regression based prediction models, which achieve more than 80% accuracy in predicting 4G and 5G handoffs in a recent 5G dataset.

Submitted 26 April, 2021; originally announced April 2021.

Comments: 12 pages

arXiv:2104.12863 [pdf]

doi 10.1186/s40064-016-2040-9

Advances on image interpolation based on ant colony algorithm

Authors: Olivier Rukundo, Hanqiang Cao

Abstract: This paper presents an advance on image interpolation based on ant colony algorithm (AACA) for high-resolution image scaling. The difference between the proposed algorithm and the previously proposed optimization of bilinear interpolation based on ant colony algorithm (OBACA) is that AACA uses global weighting, whereas OBACA uses a local weighting scheme. The strength of the proposed global weighting of the AACA algorithm depends on employing solely the pheromone matrix information present on any group of four adjacent pixels to decide which case deserves a maximum global weight value or not. Experimental results are further provided to show the higher performance of the proposed AACA algorithm with reference to the algorithms mentioned in this paper.

Submitted 12 April, 2021; originally announced April 2021.

Comments: 17 pages, 14 figures, 3 tables

Journal ref: SpringerPlus, 5(1), 403, 2016

arXiv:2104.01445 [pdf]

A Dynamics Perspective of Pursuit-Evasion Games of Intelligent Agents with the Ability to Learn

Authors: Hao Xiong, Huanhui Cao, Lin Zhang, Wenjie Lu

Abstract: Pursuit-evasion games are ubiquitous in nature and in an artificial world. In nature, pursuer(s) and evader(s) are intelligent agents that can learn from experience, and dynamics (i.e., Newtonian or Lagrangian) is vital for the pursuer and the evader in some scenarios. To this end, this paper addresses the pursuit-evasion game of intelligent agents from the perspective of dynamics. A bio-inspired dynamics formulation of a pursuit-evasion game and baseline pursuit and evasion strategies are introduced at first. Then, reinforcement learning techniques are used to mimic the ability of intelligent agents to learn from experience. Based on the dynamics formulation and reinforcement learning techniques, the effects of improving both pursuit and evasion strategies based on experience on pursuit-evasion games are investigated at two levels: 1) individual runs and 2) ranges of the parameters of pursuit-evasion games. Results of the investigation are consistent with nature observations and the natural law - survival of the fittest. More importantly, with respect to the result of a pursuit-evasion game of agents with baseline strategies, this study achieves a different result. It is shown that, in a pursuit-evasion game with a dynamics formulation, an evader is not able to escape from a slightly faster pursuer with an effective learned pursuit strategy, based on agile maneuvers and an effective learned evasion strategy.

Submitted 3 April, 2021; originally announced April 2021.

arXiv:2103.08888 [pdf, other]

AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing

Authors: Pengqi Lu, Liang Yuan, Yunquan Zhang, Hang Cao, Kun Li

Abstract: Stream applications are widely deployed on the cloud. While modern distributed streaming systems like Flink and Spark Streaming can schedule and execute them efficiently, streaming dataflows are often dynamically changing, which may cause computation imbalance and backpressure. We introduce AutoFlow, an automatic, hotspot-aware dynamic load balance system for streaming dataflows. It incorporates a centralized scheduler which monitors the load balance in the entire dataflow dynamically and implements state migrations correspondingly. The scheduler achieves these two tasks using a simple asynchronous distributed control message mechanism and a hotspot-diminishing algorithm. The timing mechanism supports implicit barriers and a highly efficient state-migration without global barriers or pauses to operators. It also supports a time-window based load-balance measurement and feeds them to the hotspot-diminishing algorithm without user interference. We implemented AutoFlow on top of Ray, an actor-based distributed execution framework. Our evaluation based on various streaming benchmark dataset shows that AutoFlow achieves good load-balance and incurs a low latency overhead in highly data-skew workload.

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2102.12173 [pdf]

Deep learning-based framework for cardiac function assessment in embryonic zebrafish from heart beating videos

Authors: Amir Mohammad Naderi, Haisong Bu, Jingcheng Su, Mao-Hsiang Huang, Khuong Vo, Ramses Seferino Trigo Torres, J. -C. Chiao, Juhyun Lee, Michael P. H. Lau, Xiaolei Xu, Hung Cao

Abstract: Zebrafish is a powerful and widely-used model system for a host of biological investigations including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validated a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) based on a U-net deep learning model for automated assessment of cardiovascular indices, such as ejection fraction (EF) and fractional shortening (FS) from microscopic videos of wildtype and cardiomyopathy mutant zebrafish embryos. Our approach yielded favorable performance with accuracy above 90% compared with manual processing. We used only black and white regular microscopic recordings with frame rates of 5-20 frames per second (fps); thus, the framework could be widely applicable with any laboratory resources and infrastructure. Most importantly, the automatic feature holds promise to enable efficient, consistent and reliable processing and analysis capacity for large amounts of videos, which can be generated by diverse collaborating teams.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.05490 [pdf, other]

doi 10.1016/j.nahs.2021.101110

Safe-visor Architecture for Sandboxing (AI-based) Unverified Controllers in Stochastic Cyber-Physical Systems

Authors: Bingzhuo Zhong, Abolfazl Lavaei, Hongpeng Cao, Majid Zamani, Marco Caccamo

Abstract: High performance but unverified controllers, e.g., artificial intelligence-based (a.k.a. AI-based) controllers, are widely employed in cyber-physical systems (CPSs) to accomplish complex control missions. However, guaranteeing the safety and reliability of CPSs with this kind of controllers is currently very challenging, which is of vital importance in many real-life safety-critical applications. To cope with this difficulty, we propose in this work a Safe-visor architecture for sandboxing unverified controllers in CPSs operating in noisy environments (a.k.a. stochastic CPSs). The proposed architecture contains a history-based supervisor, which checks inputs from the unverified controller and makes a compromise between functionality and safety of the system, and a safety advisor that provides fallback when the unverified controller endangers the safety of the system. Both the history-based supervisor and the safety advisor are designed based on an approximate probabilistic relation between the original system and its finite abstraction. By employing this architecture, we provide formal probabilistic guarantees on preserving the safety specifications expressed by accepting languages of deterministic finite automata (DFA). Meanwhile, the unverified controllers can still be employed in the control loop even though they are not reliable. We demonstrate the effectiveness of our proposed results by applying them to two (physical) case studies.

Submitted 19 August, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

arXiv:2101.10869 [pdf]

A Raspberry Pi-based Traumatic Brain Injury Detection System for Single-Channel Electroencephalogram

Authors: Navjodh Singh Dhillon, Agustinus Sutandi, Manoj Vishwanath, Miranda M. Lim, Hung Cao, Dong Si

Abstract: Traumatic Brain Injury (TBI) is a common cause of death and disability. However, existing tools for TBI diagnosis are either subjective or require extensive clinical setup and expertise. The increasing affordability and reduction in size of relatively high-performance computing systems combined with promising results from TBI related machine learning research make it possible to create compact and portable systems for early detection of TBI. This work describes a Raspberry Pi based portable, real-time data acquisition, and automated processing system that uses machine learning to efficiently identify TBI and automatically score sleep stages from a single-channel Electroencephalogram (EEG) signal. We discuss the design, implementation, and verification of the system that can digitize EEG signal using an Analog to Digital Converter (ADC) and perform real-time signal classification to detect the presence of mild TBI (mTBI). We utilize Convolutional Neural Networks (CNN) and XGBoost based predictive models to evaluate the performance and demonstrate the versatility of the system to operate with multiple types of predictive models. We achieve a peak classification accuracy of more than 90% with a classification time of less than 1 s across 16 s - 64 s epochs for TBI vs control conditions. This work can enable development of systems suitable for field use without requiring specialized medical equipment for early TBI detection applications and TBI research. Further, this work opens avenues to implement connected, real-time TBI related health and wellness monitoring systems.

Submitted 29 January, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: 12 pages, 6 figures

arXiv:2009.00857 [pdf]

Breast mass detection in digital mammography based on anchor-free architecture

Authors: Haichao Cao

Abstract: Background and Objective: Accurate detection of breast masses in mammography images is critical to diagnose early breast cancer, which can greatly improve the patients' survival rate. However, it is still a big challenge due to the heterogeneity of breast masses and the complexity of their surrounding environment. Methods: To address these problems, we propose a one-stage object detection architecture, called Breast Mass Detection Network (BMassDNet), based on anchor-free and feature pyramid which makes the detection of breast masses of different sizes well adapted. We introduce a truncation normalization method and combine it with adaptive histogram equalization to enhance the contrast between the breast mass and the surrounding environment. Meanwhile, to solve the overfitting problem caused by small data size, we propose a natural deformation data augmentation method and mend the train data dynamic updating method based on the data complexity to effectively utilize the limited data. Finally, we use transfer learning to assist the training process and to improve the robustness of the model ulteriorly. Results: On the INbreast dataset, each image has an average of 0.495 false positives whilst the recall rate is 0.930; on the DDSM dataset, when each image has 0.599 false positives, the recall rate reaches 0.943. Conclusions: The experimental results on datasets INbreast and DDSM show that the proposed BMassDNet can obtain competitive detection performance over the current top ranked methods.

Submitted 2 September, 2020; originally announced September 2020.

Comments: 26 pages, 12 figures

arXiv:2004.04901 [pdf, ps, other]

Accurate DOA Estimation Based on Real-Valued Singular Value Decomposition

Authors: Hui Cao, Qi Liu

Abstract: In this paper, an accurate direction-of-arrival (DOA) estimator is developed based on the real-valued singular value decomposition (SVD) of the covariance matrix. A unitary transform is first applied to the complex-valued covariance matrix, and then SVD is performed on the resulting real-valued data matrix. The singular vector is then utilized with a weighted least squares (WLS) method to achieve DOA estimation. The performance of the proposed algorithm is compared with several state-of-the-art methods as well as the CRB. The results indicate the accuracy and effectiveness of the proposed method.

Submitted 10 April, 2020; originally announced April 2020.

Comments: 4 pages, 5 figures, IEEE International Conference on Signal, Information and Data Processing 2019

arXiv:2003.07065 [pdf]

Fast implementation of synchrosqueezing transform based on downsampling for large-scale vibration signal analysis

Authors: Dong He, Hongrui Cao

Abstract: Synchrosqueezing transform (SST) is a useful tool for vibration signal analysis due to its high time-frequency (TF) concentration and reconstruction properties. However, existing SST requires much processing time for large-scale data. In this paper, some fast implementation methods of SST based on downsampled short-time Fourier transform (STFT) are proposed. By controlling the downsampling factor both in time and frequency, combined with the proposed selective reassignment and frequency subdivision scheme, one can keep a balance between efficiency and accuracy according to practical needs. Moreover, the reconstruction property is available, accomplished by an approximate but direct inverse formula under downsampling. The effects of parameters on the concentration, computing efficiency, and reconstruction accuracy are also investigated quantitatively, followed by a mathematical model of reassignment behavior with decimation factors. Experimental results on an aero-engine and a spindle show that the fast implementation of SST can effectively characterize the non-stationary characteristics of the large-scale vibration signal to reveal the mechanism of mechanical systems.

Submitted 16 March, 2020; originally announced March 2020.

arXiv:1910.02533 [pdf, other]

Compressed Video Action Recognition with Refined Motion Vector

Authors: Haoyuan Cao, Shining Yu, Jiashi Feng

Abstract: Although CNNs have reached satisfactory performance in image-related tasks, using CNNs to process videos is much more challenging due to the enormous size of raw video streams. In this work, we propose to use motion vectors and residuals from modern video compression techniques to effectively learn the representation of the raw frames and greatly remove the temporal redundancy, giving a faster video processing model. Compressed Video Action Recognition (CoViAR) explored directly using compressed video to train the deep neural network, where the motion vectors were utilized to represent temporal information. However, motion vectors are designed for minimizing video size, where precise motion information is not obligatory. Compared with optical flow, motion vectors contain noisy and unreliable motion information. Inspired by the mechanism of video compression codecs, we propose an approach to refine the motion vectors in which unreliable movement is removed while temporal information is largely preserved. We show that replacing the original motion vectors with the refined ones, while using the same network as CoViAR, achieves state-of-the-art performance on UCF-101 and HMDB-51 with negligible efficiency degradation compared with the original CoViAR.

Submitted 6 October, 2019; originally announced October 2019.

Comments: 8 pages, 3 figures, 4 tables

arXiv:1905.08413 [pdf]

Dual-branch residual network for lung nodule segmentation

Authors: Haichao Cao, Hong Liu, Enmin Song, Chih-Cheng Hung, Guangzhi Ma, Xiangyang Xu, Renchao Jin, Jianguo Lu

Abstract: An accurate segmentation of lung nodules in computed tomography (CT) images is critical to lung cancer analysis and diagnosis. However, due to the variety of lung nodules and the similarity of visual characteristics between nodules and their surroundings, a robust segmentation of nodules becomes a challenging problem. In this study, we propose the Dual-branch Residual Network (DB-ResNet), which is a data-driven model. Our approach integrates two new schemes to improve the generalization capability of the model: 1) the proposed model can simultaneously capture multi-view and multi-scale features of different nodules in CT images; 2) we combine the intensity features with those of the convolutional neural network (CNN). We propose a pooling method, called the central intensity-pooling layer (CIP), to extract the intensity features of the center voxel of the block, and then use the CNN to obtain the convolutional features of the center voxel of the block. In addition, we designed a weighted sampling strategy based on the boundary of nodules for the selection of those voxels using the weighting score, to increase the accuracy of the model. The proposed method has been extensively evaluated on the LIDC dataset containing 986 nodules. Experimental results show that the DB-ResNet achieves superior segmentation performance with an average Dice score of 82.74% on the dataset. Moreover, we compared our results with those of four radiologists on the same dataset. The comparison showed that our average Dice score was 0.49% higher than that of human experts. This proves that our proposed method is as good as an experienced radiologist.

Submitted 20 May, 2019; originally announced May 2019.

Comments: 24 pages, 6 figures

arXiv:1905.03445 [pdf]

Two-Stage Convolutional Neural Network Architecture for Lung Nodule Detection

Authors: Haichao Cao, Hong Liu, Enmin Song, Guangzhi Ma, Xiangyang Xu, Renchao Jin, Tengying Liu, Chih-Cheng Hung

Abstract: Early detection of lung cancer is an effective way to improve the survival rate of patients. It is a critical step to have accurate detection of lung nodules in computed tomography (CT) images for the diagnosis of lung cancer. However, due to the heterogeneity of the lung nodules and the complexity of the surrounding environment, robust nodule detection has been a challenging task. In this study, we propose a two-stage convolutional neural network (TSCNN) architecture for lung nodule detection. The CNN architecture in the first stage is based on the improved UNet segmentation network to establish an initial detection of lung nodules. Simultaneously, in order to obtain a high recall rate without introducing excessive false positive nodules, we propose a novel sampling strategy, and use the offline hard mining idea for training and prediction according to the proposed cascaded prediction method. The CNN architecture in the second stage is based on the proposed dual pooling structure, which is built into three 3D CNN classification networks for false positive reduction. Since the network training requires a significant amount of training data, we adopt a data augmentation method based on random mask. Furthermore, we have improved the generalization ability of the false positive reduction model by means of ensemble learning. The proposed method has been experimentally verified on the LUNA dataset. Experimental results show that the proposed TSCNN architecture can obtain competitive detection performance.

Submitted 9 May, 2019; originally announced May 2019.

Comments: 29 pages, 10 figures

arXiv:1506.05171 [pdf, other]

doi 10.1371/journal.pcbi.1004688

On the Origins and Control of Community Types in the Human Microbiome

Authors: Travis E. Gibson, Amir Bashan, Hong-Tai Cao, Scott T. Weiss, Yang-Yu Liu

Abstract: Microbiome-based stratification of healthy individuals into compositional categories, referred to as "community types", holds promise for drastically improving personalized medicine. Despite this potential, the existence of community types and the degree of their distinctness have been highly debated. Here we adopted a dynamic systems approach and found that heterogeneity in the interspecific interactions or the presence of strongly interacting species is sufficient to explain community types, independent of the topology of the underlying ecological network. By controlling the presence or absence of these strongly interacting species we can steer the microbial ecosystem to any desired community type. This open-loop control strategy still holds even when the community types are not distinct but appear as dense regions within a continuous gradient. This finding can be used to develop viable therapeutic strategies for shifting the microbial composition to a healthy configuration.

Submitted 21 January, 2016; v1 submitted 16 June, 2015; originally announced June 2015.

Comments: Main Text, Figures, Methods, Supplementary Figures, and Supplementary Text

arXiv:1410.8119 [pdf, ps, other]

Black-box Modeling and Compensation of Bursty Communication Signals in RF Power Amplifiers with Power-Dependent Parameters

Authors: Ali Soltani Tehrani, Haiying Cao, Thomas Eriksson, Christian Fager

Abstract: This paper presents a new black-box technique for modeling long-term memory effects in radio frequency power amplifiers. The proposed technique extends commonly used behavioral models by utilizing parameters that dynamically change depending on a long-term memory effect while keeping the original model structure intact. This enables us to accurately track and model transient changes in power amplifier characteristics that vary slowly and are induced by the input signal. Identification of long-term memory effects is discussed and an iterative identification algorithm for the model parameters is proposed. The model is experimentally tested on a 100 Watt Doherty power amplifier with a 4 MHz Gaussian noise signal that has a step-like change in amplitude, representative of a realistic communication signal with bursty behavior, and with 20 MHz 3GPP LTE test data. Results of behavioral modeling show a 2-2.5 dB and 5-6 dB improvement in average and peak NMSE modeling performance respectively, which shows the suitability of the technique for modeling bursty signals.

Submitted 29 October, 2014; originally announced October 2014.

Showing 1–39 of 39 results for author: Cao, H