-
Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
Authors:
Zuheng Kang,
Yayun He,
Jianzong Wang,
Junqing Peng,
Jing Xiao
Abstract:
Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We give three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.
Submitted 24 April, 2024;
originally announced April 2024.
-
Retrieval-Augmented Audio Deepfake Detection
Authors:
Zuheng Kang,
Yayun He,
Botao Zhao,
Xiaoyang Qu,
Junqing Peng,
Jing Xiao,
Jianzong Wang
Abstract:
With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA sets. Further sample analysis indicates that the retriever consistently retrieves samples mostly from the same speaker with acoustic characteristics highly consistent with the query audio, thereby improving detection performance.
Submitted 23 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Authors:
Yayun He,
Zuheng Kang,
Jianzong Wang,
Junqing Peng,
Jing Xiao
Abstract:
Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration speech signals. We use two guided diffusion models, the built-in and the external speaker embedding (SE) guided diffusion model, both of which utilize a diffusion model-based sample generator that leverages SE guidance to augment the speech features based on a short utterance. Extensive experimental results on the VoxCeleb1 dataset show that our method outperforms the baseline, with relative improvements in equal error rate (EER) of 46.1%, 35.7%, 10.4%, and 5.7% for the short utterance conditions of 0.5, 1.0, 1.5, and 2.0 seconds, respectively.
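For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate equals the false rejection rate, and a relative improvement is the EER reduction divided by the baseline EER. The following is a minimal sketch under those standard definitions; the function names and the toy scores are illustrative assumptions, not the paper's evaluation code or data.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: genuine-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Fraction by which the EER dropped relative to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores (hypothetical): higher means "more likely the same speaker".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
```

With a finite trial list the two error curves rarely cross exactly, which is why the sketch averages them at the closest threshold; toolkits typically interpolate instead.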
Submitted 6 October, 2023;
originally announced October 2023.
-
Double-Active-IRS Aided Wireless Communication: Deployment Optimization and Capacity Scaling
Authors:
Zhenyu Kang,
Changsheng You,
Rui Zhang
Abstract:
In this letter, we consider a double-active-intelligent reflecting surface (IRS) aided wireless communication system, where two active IRSs are properly deployed to assist the communication from a base station (BS) to multiple users located in a given zone via the double-reflection links. Under the assumption of fixed per-element amplification power for each active-IRS element, we formulate a rate maximization problem subject to practical constraints on the reflection design, elements allocation, and placement of active IRSs. To solve this non-convex problem, we first obtain the optimal active-IRS reflections and BS beamforming, based on which we then jointly optimize the active-IRS elements allocation and placement by using the alternating optimization (AO) method. Moreover, we show that given the fixed per-element amplification power, the received signal-to-noise ratio (SNR) at the user increases asymptotically with the square of the number of reflecting elements; while given the fixed number of reflecting elements, the SNR does not increase with the per-element amplification power when it is asymptotically large. Last, numerical results are presented to validate the effectiveness of the proposed AO-based algorithm and compare the rate performance of the considered double-active-IRS aided wireless system with various benchmark systems.
Submitted 23 July, 2023;
originally announced July 2023.
-
SVVAD: Personal Voice Activity Detection for Speaker Verification
Authors:
Zuheng Kang,
Jianzong Wang,
Junqing Peng,
Jing Xiao
Abstract:
Voice activity detection (VAD) improves the performance of speaker verification (SV) by preserving speech segments and attenuating the effects of non-speech. However, this scheme is not ideal: (1) it fails in noisy environments or multi-speaker conversations; (2) it is trained based on inaccurate non-SV sensitive labels. To address this, we propose a speaker verification-based voice activity detection (SVVAD) framework that can adapt the speech features according to which are most informative for SV. To achieve this, we introduce a label-free training method with triplet-like losses that completely avoids the performance degradation of SV due to incorrect labeling. Extensive experiments show that SVVAD significantly outperforms the baseline in terms of equal error rate (EER) under conditions where other speakers are mixed at different ratios. Moreover, the decision boundaries reveal the importance of the different parts of speech, which are largely consistent with human judgments.
Submitted 31 May, 2023;
originally announced May 2023.
-
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
Authors:
Zuheng Kang,
Yayun He,
Jianzong Wang,
Junqing Peng,
Xiaoyang Qu,
Jing Xiao
Abstract:
Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major breakthroughs in computer vision. Despite promising results, the technique has not been well applied to audio and signal processing. Because audio signals vary in duration, audio requires its own way of modeling. In this work, we propose feature-rich audio model inversion (FRAMI), a data-free knowledge distillation framework for general sound classification tasks. It first generates high-quality and feature-rich Mel-spectrograms through a feature-invariant contrastive loss. Then, the hidden states before and after the statistics pooling layer are reused when knowledge distillation is performed on these feature-rich samples. Experimental results on the UrbanSound8K, ESC-50, and AudioMNIST datasets demonstrate that FRAMI can generate feature-rich samples. Meanwhile, reusing the hidden states further improves the accuracy of the student model, which significantly outperforms the baseline method.
Submitted 14 March, 2023;
originally announced March 2023.
-
Active-IRS-Aided Wireless Communication: Fundamentals, Designs and Open Issues
Authors:
Zhenyu Kang,
Changsheng You,
Rui Zhang
Abstract:
Intelligent reflecting surface (IRS) has emerged as a promising technology to realize smart radio environment for future wireless communication systems. Existing works in this line of research have mainly considered the conventional passive IRS that reflects wireless signals without power amplification, while in this article, we give an overview of a new type of IRS, called active IRS, which enables simultaneous signal reflection and amplification, thus significantly extending the signal coverage of passive IRS. We first present the fundamentals of active IRS, including its hardware architecture, signal and channel models, as well as practical constraints, in comparison with those of passive IRS. Then, we discuss new considerations and open issues in designing active-IRS-aided wireless communications, such as the reflection optimization, channel estimation, and deployment for active IRS, as well as its integrated design with passive IRS. Finally, numerical results are provided to show the potential performance gains of active IRS as compared to passive IRS and traditional active relay.
Submitted 25 June, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
-
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Authors:
Zuheng Kang,
Jianzong Wang,
Junqing Peng,
Jing Xiao
Abstract:
Estimating age from a single utterance is a classic and challenging topic. Although Label Distribution Learning (LDL) can represent adjacent indistinguishable ages well, the uncertainty of the age estimate for each utterance varies from person to person, i.e., the variance of the age distribution is different. To address this issue, we propose the selective variance label distribution learning (SVLDL) method to adapt the variance of different age distributions. Furthermore, the model uses WavLM as the speech feature extractor and adds the auxiliary task of gender recognition to further improve the performance. Two tricks are applied to the loss function to enhance the robustness of the age estimation and improve the quality of the fitted age distribution. Extensive experiments show that the model achieves state-of-the-art performance on all aspects of the NIST SRE08-10 and a real-world dataset.
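As background on label distribution learning, a single age label is commonly replaced by a discretized Gaussian over neighboring ages, whose variance expresses how uncertain the label is. The sketch below illustrates only that generic idea; the function names, the age grid, and the sigma values are assumptions for illustration, and the selective-variance mechanism of SVLDL itself is not reproduced here.

```python
import math


def gaussian_label_distribution(true_age, sigma, ages):
    """Soft label: a Gaussian centred on the true age, normalised over the age grid."""
    weights = [math.exp(-((a - true_age) ** 2) / (2 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


def expected_age(distribution, ages):
    """Point estimate read off a (predicted or target) age distribution."""
    return sum(p * a for p, a in zip(distribution, ages))


ages = list(range(0, 101))                                 # hypothetical age grid
narrow = gaussian_label_distribution(30, 2.0, ages)        # confident label
wide = gaussian_label_distribution(30, 6.0, ages)          # uncertain label
```

Both distributions have the same expected age, but the wider one spreads probability mass over more neighboring ages, which is the quantity a per-utterance variance would control.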
Submitted 16 November, 2022; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Active-Passive IRS aided Wireless Communication: New Hybrid Architecture and Elements Allocation Optimization
Authors:
Zhenyu Kang,
Changsheng You,
Rui Zhang
Abstract:
Intelligent reflecting surface (IRS) has emerged as a promising technology to enhance the wireless communication network coverage and capacity by dynamically controlling the radio signal propagation environment. In contrast to the existing works that considered active or passive IRS only, we propose in this paper a new hybrid active-passive IRS architecture that consists of both active and passive reflecting elements, thus achieving their combined advantages flexibly. Under a practical channel setup with Rician fading where only the statistical channel state information (CSI) is available, we study the hybrid IRS design in a multi-user communication system. Specifically, we formulate an optimization problem to maximize the achievable ergodic capacity of the worst-case user by designing the hybrid IRS beamforming and active/passive elements allocation based on the statistical CSI, subject to various practical constraints on the active-element amplification factor and amplification power consumption, as well as the total active and passive elements deployment budget. To solve this challenging problem, we first approximate the ergodic capacity in a simpler form and then propose an efficient algorithm to solve the problem optimally. Moreover, we show that for the special case in which all channels are line-of-sight (LoS), only active elements need to be deployed when the total deployment budget is sufficiently small, while both active and passive elements should be deployed with a decreasing number ratio when the budget increases and exceeds a certain threshold. Finally, numerical results are presented which demonstrate the performance gains of the proposed hybrid IRS architecture and its optimal design over the conventional schemes with active/passive IRS only under various practical system setups.
Submitted 4 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Generative Anomaly Detection for Time Series Datasets
Authors:
Zhuangwei Kang,
Ayan Mukhopadhyay,
Aniruddha Gokhale,
Shijie Wen,
Abhishek Dubey
Abstract:
Traffic congestion anomaly detection is of paramount importance in intelligent traffic systems. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest and to locate road segments under abnormal congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of multivariate time series (MTS). However, existing works are either not scalable or unable to capture the spatial-temporal information in MTS simultaneously. To this end, we propose a principled and comprehensive framework consisting of a data-driven generative approach that can perform tractable density estimation for detecting traffic anomalies. Our approach first clusters segments in the feature space and then uses conditional normalizing flow to identify anomalous temporal snapshots at the cluster level in an unsupervised setting. Then, we identify anomalies at the segment level by using a kernel density estimator on the anomalous cluster. Extensive experiments on synthetic datasets show that our approach significantly outperforms several state-of-the-art congestion anomaly detection and diagnosis methods in terms of Recall and F1-Score. We also use the generative model to sample labeled data, which can train classifiers in a supervised setting, alleviating the lack of labeled data for anomaly detection in sparse settings.
Submitted 28 June, 2022;
originally announced June 2022.
-
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Authors:
Zuheng Kang,
Junqing Peng,
Jianzong Wang,
Jing Xiao
Abstract:
Speech emotion recognition (SER) has many challenges, but one of the main challenges is that each framework does not have a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks, Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks, phoneme recognition and gender recognition. For this framework, we build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). We conducted experiments on the public CASIA and ESD datasets in Mandarin, which show that our method outperforms baseline methods by a relatively large margin, yielding 8.0% and 6.5% improvements in accuracy, respectively. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral) also show that the proposed method achieves state-of-the-art results, with a weighted accuracy (WA) of 78.16% and an unweighted accuracy (UA) of 77.47%.
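In the SER literature, weighted accuracy usually means plain overall accuracy (each class contributes in proportion to its size), while unweighted accuracy is the mean of the per-class recalls, so rare emotions count as much as frequent ones. The sketch below assumes that common convention; the abstract does not spell out its definitions, and the function name and toy labels are illustrative.

```python
from collections import defaultdict


def weighted_and_unweighted_accuracy(y_true, y_pred):
    """Return (WA, UA): overall accuracy and the mean of per-class recalls."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    wa = sum(correct.values()) / len(y_true)
    ua = sum(correct[c] / total[c] for c in total) / len(total)
    return wa, ua


# Toy example (hypothetical labels): the majority class is always right,
# the single minority sample is wrong.
y_true = ["angry", "angry", "angry", "sad"]
y_pred = ["angry", "angry", "angry", "happy"]
wa, ua = weighted_and_unweighted_accuracy(y_true, y_pred)
```

Here WA is 0.75 while UA is 0.5, which shows why both figures are reported on class-imbalanced corpora.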
Submitted 27 July, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Expression-preserving face frontalization improves visually assisted speech processing
Authors:
Zhiqi Kang,
Mostafa Sadeghi,
Radu Horaud,
Xavier Alameda-Pineda
Abstract:
Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost the performance of visually assisted speech communication. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data and it incorporates a dynamical face deformation model. For that purpose, we use the generalized Student t-distribution in combination with a linear dynamic system in order to account for both rigid head motions and time-varying facial deformations caused by speech production. We propose to use the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability of the method to preserve facial expressions. The method is thoroughly evaluated and compared with several state-of-the-art methods, either based on traditional geometric models or on deep learning. Moreover, we show that the method, when incorporated into deep learning pipelines, namely lip reading and speech enhancement, improves word recognition and speech intelligibility scores by a considerable margin. Supplemental material is accessible at https://team.inria.fr/robotlearn/research/facefrontalization/
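The ZNCC score mentioned above has a standard definition: subtract each patch's mean, then divide the inner product of the centred patches by the product of their norms, giving a value in [-1, 1] that is invariant to brightness offset and contrast scaling. A minimal sketch on flattened pixel lists follows; the function name and the convention of returning 0 for a constant patch are assumptions, and this is not the paper's evaluation code.

```python
import math


def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized, flattened patches."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    centred_a = [x - mean_a for x in patch_a]
    centred_b = [y - mean_b for y in patch_b]
    denom = math.sqrt(sum(x * x for x in centred_a) * sum(y * y for y in centred_b))
    if denom == 0:
        # A constant patch has no variance; the score is undefined, so this
        # sketch returns 0 by convention (an assumption, not a standard).
        return 0.0
    return sum(x * y for x, y in zip(centred_a, centred_b)) / denom


same_shape_brighter = zncc([1, 2, 3], [11, 12, 13])   # offset only
inverted = zncc([1, 2, 3], [3, 2, 1])                 # reversed intensities
```

The first call scores close to 1 because only the brightness offset differs, and the second scores close to -1 because the intensity pattern is reversed.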
Submitted 15 December, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
The impact of removing head movements on audio-visual speech enhancement
Authors:
Zhiqi Kang,
Mostafa Sadeghi,
Radu Horaud,
Xavier Alameda-Pineda,
Jacob Donley,
Anurag Kumar
Abstract:
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-based methods as they often degrade the performance of models that are trained on clean, frontal, and steady face images. To alleviate this problem, we propose to use robust face frontalization (RFF) in combination with an AVSE method based on a variational auto-encoder (VAE) model. We briefly describe the basic ingredients of the proposed pipeline and we perform experiments with a recently released audio-visual dataset. In light of these experiments, and based on three standard metrics, namely STOI, PESQ and SI-SDR, we conclude that RFF improves the performance of AVSE by a considerable margin.
Submitted 2 February, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization
Authors:
Chong Peng,
Yang Liu,
Yongyong Chen,
Xinxin Wu,
Andrew Cheng,
Zhao Kang,
Chenglizhao Chen,
Qiang Cheng
Abstract:
In this paper, we propose a novel nonconvex approach to robust principal component analysis for HSI denoising, which focuses on simultaneously developing more accurate approximations to both rank and column-wise sparsity for the low-rank and sparse components, respectively. In particular, the new method adopts the log-determinant rank approximation and a novel $\ell_{2,\log}$ norm to restrict the local low-rank and column-wise sparse properties of the component matrices, respectively. For the $\ell_{2,\log}$-regularized shrinkage problem, we develop an efficient, closed-form solution, which is named the $\ell_{2,\log}$-shrinkage operator. The new regularization and the corresponding operator can be generally used in other problems that require column-wise sparsity. Moreover, we impose the spatial-spectral total variation regularization in the log-based nonconvex RPCA model, which enhances the global piece-wise smoothness and spectral consistency from the spatial and spectral views in the recovered HSI. Extensive experiments on both simulated and real HSIs demonstrate the effectiveness of the proposed method in denoising HSIs.
Submitted 8 January, 2022;
originally announced January 2022.
-
Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
David Plowman,
Samarth Shukla,
Radu Timofte,
Ziyu Zhang,
Yicheng Wang,
Zilong Huang,
Guozhong Luo,
Gang Yu,
Bin Fu,
Yiran Wang,
Xingyi Li,
Min Shi,
Ke Xian,
Zhiguo Cao,
Jin-Hua Du,
Pei-Lin Wu,
Chao Ge,
Jiaoyang Yao,
Fangwen Tu,
Bo Li,
Jung Eun Yoo,
Kwanggyoon Seo,
Jialei Xu
, et al. (13 additional authors not shown)
Abstract:
Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based depth estimation solutions that can demonstrate nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile devices. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
IRS-Aided Wireless Relaying: Optimal Deployment and Capacity Scaling
Authors:
Zhenyu Kang,
Changsheng You,
Rui Zhang
Abstract:
In this letter, we consider an intelligent reflecting surface (IRS)-aided wireless relaying system, where a decode-and-forward relay (R) is employed to forward data from a source (S) to a destination (D), aided by M passive reflecting elements. We consider two practical IRS deployment strategies, namely, single-IRS deployment where all reflecting elements are mounted on one single IRS that is deployed near S, R, or D, and multi-IRS deployment where the reflecting elements are allocated over three separate IRSs which are deployed near S, R, and D, respectively. Under the line-of-sight (LoS) channel model, we characterize the capacity scaling orders with respect to an increasing M for the IRS-aided relay system with different IRS deployment strategies. For single-IRS deployment, we show that deploying the IRS near R achieves the highest capacity as compared to that near S or D. For multi-IRS deployment, in contrast, we propose a practical cooperative IRS passive beamforming design which is analytically shown to achieve a larger capacity scaling order than the single-IRS deployment (i.e., near R or S/D) when M is sufficiently large. Numerical examples are provided, which validate our theoretical results.
Submitted 27 October, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Enabling Smart Reflection in Integrated Air-Ground Wireless Network: IRS Meets UAV
Authors:
Changsheng You,
Zhenyu Kang,
Yong Zeng,
Rui Zhang
Abstract:
Intelligent reflecting surface (IRS) and unmanned aerial vehicle (UAV) have emerged as two promising technologies to boost the performance of wireless communication networks, by proactively altering the wireless communication channels via smart signal reflection and maneuver control, respectively. However, they face different limitations in practice, which restrain their future applications. In this article, we propose new methods to jointly apply IRS and UAV in integrated air-ground wireless networks by exploiting their complementary advantages. Specifically, terrestrial IRS is used to enhance the UAV-ground communication performance, while UAV-mounted IRS is employed to assist in the terrestrial communication. We present their promising application scenarios, new communication design issues as well as potential solutions. In particular, we show that it is practically beneficial to deploy both the terrestrial and aerial IRSs in future wireless networks to reap the benefits of smart reflections in three-dimensional (3D) space.
Submitted 29 March, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Plant and Controller Optimization for Power and Energy Systems with Model Predictive Control
Authors:
Donald J. Docimo,
Ziliang Kang,
Kai A. James,
Andrew G. Alleyne
Abstract:
This article explores the optimization of plant characteristics and controller parameters for electrified mobility. Electrification of mobile transportation systems, such as automobiles and aircraft, presents the ability to improve key performance metrics such as efficiency and cost. However, the strong bidirectional coupling between electrical and thermal dynamics within new components creates integration challenges, increasing component degradation and reducing performance. Diminishing these issues requires novel plant designs and control strategies. The electrified mobility literature provides prior studies on plant and controller optimization, known as control co-design (CCD). A void within these studies is the lack of model predictive control (MPC), recognized to manage multi-domain dynamics for electrified systems, within CCD frameworks. This article addresses this through three contributions. First, a thermo-electro-mechanical hybrid electric vehicle (HEV) model is developed that is suitable for both plant optimization and MPC. Second, simultaneous plant and controller optimization is performed for this multi-domain system. Third, MPC is integrated within a CCD framework using the candidate HEV model. Results indicate that optimizing both the plant and MPC parameters simultaneously can reduce physical component sizes by over 60% and key performance metric errors by over 50%.
Submitted 14 October, 2020;
originally announced October 2020.
-
A Noise Filter for Dynamic Vision Sensors using Self-adjusting Threshold
Authors:
Shasha Guo,
Ziyang Kang,
Lei Wang,
Limeng Zhang,
Xiaofan Chen,
Shiming Li,
Weixia Xu
Abstract:
Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imagers. However, they are sensitive to unwanted background activity (BA) events. We propose a new criterion with little computation overhead for distinguishing real events from BA events by utilizing the global space and time information, rather than the local information, through Gaussian convolution, which can also be used as a filter. We denote the filter as GF. We demonstrate GF on three datasets, each recorded by a different DVS with a different output size. The experimental results show that our filter produces the clearest frames compared with baseline filters and runs fast.
Submitted 1 June, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.