-
Learning Autonomous Race Driving with Action Mapping Reinforcement Learning
Authors:
Yuanda Wang,
Xin Yuan,
Changyin Sun
Abstract:
Autonomous race driving poses a complex control challenge as vehicles must be operated at the edge of their handling limits to reduce lap times while respecting physical and safety constraints. This paper presents a novel reinforcement learning (RL)-based approach, incorporating the action mapping (AM) mechanism to manage state-dependent input constraints arising from limited tire-road friction. A numerical approximation method is proposed to implement AM, addressing the complex dynamics associated with the friction constraints. The AM mechanism also allows the learned driving policy to be generalized to different friction conditions. Experimental results in our developed race simulator demonstrate that the proposed AM-RL approach achieves superior lap times and better success rates compared to the conventional RL-based approaches. The generalization capability of driving policy with AM is also validated in the experiments.
Submitted 21 June, 2024;
originally announced June 2024.
-
Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities
Authors:
Yinxiao Zhuo,
Tianqi Mao,
Haojin Li,
Chen Sun,
Zhaocheng Wang,
Zhu Han,
Sheng Chen
Abstract:
Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for next-generation wireless communication, which can realize location/motion detection of surroundings with communication devices. This additional sensing capability leads to a substantial network quality gain and expansion of the service scenarios. As the system evolves to millimeter wave (mmWave) and above, ISAC can realize simultaneous communications and sensing at the ultra-high throughput level and radar resolution with compact design, which relies on directional beamforming against the path loss. With the multi-beam technology, the dual functions of ISAC can be seamlessly incorporated at the beamspace level by unleashing the potential of joint beamforming. To this end, this article investigates the key technologies for multi-beam ISAC systems. We begin with an overview of the current state-of-the-art solutions in multi-beam ISAC. Subsequently, a detailed analysis of the advantages associated with multi-beam ISAC is provided. Additionally, the key technologies for the transmitter, channel, and receiver of multi-beam ISAC are introduced. Finally, we explore the challenges and opportunities presented by multi-beam ISAC, offering valuable insights into this emerging field.
Submitted 30 May, 2024;
originally announced May 2024.
-
Enhancing Energy Efficiency in O-RAN Through Intelligent xApps Deployment
Authors:
Xuanyu Liang,
Ahmed Al-Tahmeesschi,
Qiao Wang,
Swarna Chetty,
Chenrui Sun,
Hamed Ahmadi
Abstract:
The proliferation of 5G technology presents an unprecedented challenge in managing the energy consumption of densely deployed network infrastructures, particularly Base Stations (BSs), which account for the majority of power usage in mobile networks. The O-RAN architecture, with its emphasis on open and intelligent design, offers a promising framework to address the Energy Efficiency (EE) demands of modern telecommunication systems. This paper introduces two xApps designed for the O-RAN architecture to optimize power savings without compromising the Quality of Service (QoS). Utilizing a commercial RAN Intelligent Controller (RIC) simulator, we demonstrate the effectiveness of our proposed xApps through extensive simulations that reflect real-world operational conditions. Our results show a significant reduction in power consumption, achieving up to 50% power savings with a minimal number of User Equipments (UEs), by intelligently managing the operational state of Radio Cards (RCs), particularly through switching between active and sleep modes based on network resource block usage conditions.
Submitted 16 May, 2024;
originally announced May 2024.
-
Continuous Transfer Learning for UAV Communication-aware Trajectory Design
Authors:
Chenrui Sun,
Gianluca Fontanesi,
Swarna Bindu Chetty,
Xuanyu Liang,
Berk Canberk,
Hamed Ahmadi
Abstract:
Deep Reinforcement Learning (DRL) emerges as a prime solution for Unmanned Aerial Vehicle (UAV) trajectory planning, offering proficiency in navigating high-dimensional spaces, adaptability to dynamic environments, and the ability to make sequential decisions based on real-time feedback. Despite these advantages, the use of DRL for UAV trajectory planning requires significant retraining when the UAV is confronted with a new environment, resulting in wasted resources and time. Therefore, it is essential to develop techniques that can reduce the overhead of retraining DRL models, enabling them to adapt to constantly changing environments. This paper presents a novel method to reduce the need for extensive retraining using a double deep Q network (DDQN) model as a pretrained base, which is subsequently adapted to different urban environments through Continuous Transfer Learning (CTL). Our method involves transferring the learned model weights and adapting the learning parameters, including the learning and exploration rates, to suit each new environment's specific characteristics. The effectiveness of our approach is validated in three scenarios, each with different levels of similarity. CTL significantly improves learning speed and success rates compared to DDQN models initiated from scratch. For similar environments, Transfer Learning (TL) improved stability, accelerated convergence by 65%, and facilitated 35% faster adaptation in dissimilar settings.
Submitted 16 May, 2024;
originally announced May 2024.
-
Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization
Authors:
Rui Sun,
Li You,
An-An Lu,
Chen Sun,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By proving that the precoder set satisfying the per-BS power constraints forms a Riemannian submanifold of a linear product manifold, we transform the constrained precoder design problem in Euclidean space to an unconstrained one on the Riemannian submanifold. Riemannian ingredients, including orthogonal projection, Riemannian gradient, retraction and vector transport, of the problem on the Riemannian submanifold are further derived, with which the Riemannian conjugate gradient (RCG) design method is proposed for solving the unconstrained problem. The proposed method avoids the inverses of large dimensional matrices, which is beneficial in practice. The complexity analyses show the high computational efficiency of RCG precoder design. Simulation results demonstrate the numerical superiority of the proposed precoder design and the high efficiency of the UCN mMIMO system.
Submitted 10 April, 2024;
originally announced April 2024.
-
A Signature Based Approach Towards Global Channel Charting with Ultra Low Complexity
Authors:
Longhai Zhao,
Yunchuan Yang,
Qi Xiong,
He Wang,
Bin Yu,
Feifei Sun,
Chengjun Sun
Abstract:
Channel charting, an unsupervised learning method that learns a low-dimensional representation from channel information to preserve the geometrical properties of the physical space of user equipments (UEs), has drawn much attention from both academic and industrial communities because it can facilitate many downstream tasks, such as indoor localization, UE handover, and beam management. However, many previous works focus on charting that preserves only local geometry and use raw channel information to learn the chart; they do not consider the global geometry and are often computationally intensive and very time-consuming. Therefore, in this paper, a novel signature-based approach for global channel charting with ultra-low complexity is proposed. By using an iterated-integral-based method called the signature transform, a compact feature map and a novel distance metric are proposed, which enable channel charting with ultra-low complexity while preserving both local and global geometry. We demonstrate the efficacy of our method using synthetic and open-source real-field datasets.
Submitted 29 March, 2024;
originally announced March 2024.
-
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Authors:
Chang Sun,
Hong Yang,
Bo Qin
Abstract:
Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledge distillation approach using a Joint-Embedding Predictive Architecture (JEPA), named JEP-KD, designed to more effectively utilize audio features during model training. Central to JEP-KD is the inclusion of a generative network within the embedding layer, which enhances the video encoder's capacity for semantic feature extraction and brings it into closer alignment with the audio features from a pre-trained ASR model's encoder. This approach aims to progressively reduce the performance gap between VSR and ASR. Moreover, a comprehensive multimodal, multistage training regimen for the JEP-KD framework is established, bolstering the robustness and efficacy of the training process. Experiment results demonstrate that JEP-KD significantly improves the performance of VSR models and demonstrates versatility across different VSR platforms, indicating its potential for broader application within other multimodal tasks.
Submitted 3 March, 2024;
originally announced March 2024.
-
Data-Driven Sliding Mode Control for Partially Unknown Nonlinear Systems
Authors:
Jianglin Lan,
Xianxian Zhao,
Congcong Sun
Abstract:
This paper introduces a new design method for data-driven control of nonlinear systems with partially unknown dynamics and unknown bounded disturbance. Since it is not possible to achieve exact nonlinearity cancellation in the presence of unknown disturbance, this paper adapts the idea of sliding mode control (SMC) to ensure system stability and robustness without assuming that the nonlinearity goes to zero faster than the state, as in the existing methods. The SMC consists of a data-dependent robust controller, which ensures that the system state trajectory reaches and remains on the sliding surface, and a nominal controller, which is solved from a data-dependent semidefinite program (SDP) and ensures robust stability of the state trajectory on the sliding surface. Numerical simulation results demonstrate the effectiveness of the proposed data-driven SMC and its superiority in terms of robust stability over the existing data-driven control that also uses approximate nonlinearity cancellation.
Submitted 24 March, 2024;
originally announced March 2024.
-
Human Detection in Realistic Through-the-Wall Environments using Raw Radar ADC Data and Parametric Neural Networks
Authors:
Wei Wang,
Naike Du,
Yuchao Guo,
Chao Sun,
Jingyang Liu,
Rencheng Song,
Xiuzhu Ye
Abstract:
The radar signal processing algorithm is one of the core components in through-wall radar human detection technology. Traditional algorithms (e.g., DFT and matched filtering) struggle to adaptively handle low signal-to-noise ratio echo signals in challenging and dynamic real-world through-wall application environments, which becomes a major bottleneck in the system. In this paper, we introduce an end-to-end through-wall radar human detection network (TWP-CNN), which takes raw radar Analog-to-Digital Converter (ADC) signals without any preprocessing as input. We replace the conventional radar signal processing flow with the proposed DFT-based adaptive feature extraction (DAFE) module. This module employs learnable parameterized 3D complex convolution layers to extract superior feature representations from ADC signals, which is beyond the limitation of traditional preprocessing methods. Additionally, by embedding phase information from radar data within the network and employing multi-task learning, a more accurate detection is achieved. Finally, due to the absence of through-wall radar datasets containing raw ADC data, we gathered a realistic through-wall (RTW) dataset using our in-house developed through-wall radar system. We trained and validated our proposed method on this dataset to confirm its effectiveness and superiority in real through-wall detection scenarios.
Submitted 20 March, 2024;
originally announced March 2024.
-
Identity information based on human magnetocardiography signals
Authors:
Pengju Zhang,
Chenxi Sun,
Jianwei Zhang,
Hong Guo
Abstract:
We have developed an individual identification system based on magnetocardiography (MCG) signals captured using optically pumped magnetometers (OPMs). Our system utilizes pattern recognition to analyze the signals obtained at different positions on the body, by scanning the matrices composed of MCG signals with a 2*2 window. In order to make use of the spatial information of MCG signals, we transform the signals from adjacent small areas into four channels of a dataset. We further transform the data into time-frequency matrices using wavelet transforms and employ a convolutional neural network (CNN) for classification. As a result, our system achieves an accuracy rate of 97.04% in identifying individuals. This finding indicates that the MCG signal holds potential for use in individual identification systems, offering a valuable tool for personalized healthcare management.
Submitted 2 March, 2024;
originally announced March 2024.
-
Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer
Authors:
Qinyu Chen,
Congyi Sun,
Chang Gao,
Shih-Chii Liu
Abstract:
Epilepsy is a common disease of the nervous system. Timely prediction of seizures and intervention treatment can significantly reduce accidental injury and protect the life and health of patients. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from long-term scalp electroencephalogram (EEG) recordings. We report evaluation results from the Spiking Conformer model using the Boston Children's Hospital-MIT (CHB-MIT) EEG dataset. By leveraging spike-based addition operations, the Spiking Conformer significantly reduces the classification computational cost compared to the non-spiking model. Additionally, we introduce an approximate spiking neuron layer to further reduce spike-triggered neuron updates by nearly 38% without sacrificing accuracy. Using raw EEG data as input, the proposed Spiking Conformer achieved an average sensitivity rate of 94.9% and a specificity rate of 99.3% for the seizure detection task, and 96.8% and 89.5%, respectively, for the seizure prediction task, and needs >10x fewer operations compared to the non-spiking equivalent model.
Submitted 21 January, 2024;
originally announced February 2024.
-
A Closed-loop Brain-Machine Interface SoC Featuring a 0.2$μ$J/class Multiplexer Based Neural Network
Authors:
Chao Zhang,
Yongxiang Guo,
Dawid Sheng,
Zhixiong Ma,
Chao Sun,
Yuwei Zhang,
Wenxin Zhao,
Fenyan Zhang,
Tongfei Wang,
Xing Sheng,
Milin Zhang
Abstract:
This work presents the first fabricated electrophysiology-optogenetic closed-loop bidirectional brain-machine interface (CL-BBMI) system-on-chip (SoC) with electrical neural signal recording, on-chip sleep staging and optogenetic stimulation. The first multiplexer with static assignment based table lookup solution (MUXnet) for a multiplier-free NN processor is proposed. A state-of-the-art average accuracy of 82.4% was achieved with an energy consumption of only 0.2$μ$J/class in the sleep staging task.
Submitted 7 January, 2024;
originally announced January 2024.
-
CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering
Authors:
Chuanneng Sun,
Yu Zhou,
Gueyoung Jung,
Tuyen Xuan Tran,
Dario Pompili
Abstract:
The Open Radio Access Network (O-RAN) architecture empowers intelligent and automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, enabling capabilities beyond what is achievable with traditional RAN solutions. Within this paradigm, Traffic Steering (TS) emerges as a pivotal RIC application that focuses on optimizing cell-level mobility settings in near-real-time, aiming to significantly improve network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, an RL sub-policy will be trained to learn an optimized mapping onto the action space. To apply CaRL on new network regions, we propose a knowledge transfer approach to initialize a new sub-policy based on knowledge learned by the trained policies. To evaluate CaRL, we build a data-driven and scalable RIC digital twin (DT) that is modeled using important real-world data, including network configuration, user geo-distribution, and traffic demand, among others, from a tier-1 mobile operator in the US. We evaluate CaRL on two DT scenarios representing two network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and other competing optimization approaches using heuristic and Q-table algorithms. Benchmarking results show that CaRL performs the best and improves the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in these two scenarios, respectively.
Submitted 4 December, 2023;
originally announced December 2023.
-
A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing
Authors:
Longfeng Nie,
Yuntian Chen,
Mengge Du,
Changqi Sun,
Dongxiao Zhang
Abstract:
Cloud types, as a type of meteorological data, are of particular significance for evaluating changes in rainfall, heatwaves, water resources, floods and droughts, food security and vegetation cover, as well as land use. In order to effectively utilize high-resolution geostationary observations, a knowledge-based data-driven (KBDD) framework for all-day identification of cloud types based on spectral information from Himawari-8/9 satellite sensors is designed, and a novel, simple and efficient network, named CldNet, is proposed. Compared with widely used semantic segmentation networks, including SegNet, PSPNet, DeepLabV3+, UNet, and ResUnet, our proposed model CldNet, with an accuracy of 80.89±2.18%, is state-of-the-art in identifying cloud types, with accuracy improvements of 32%, 46%, 22%, 2%, and 39%, respectively. With the assistance of auxiliary information (e.g., satellite zenith/azimuth angle, solar zenith/azimuth angle), the accuracy of CldNet-W using visible and near-infrared bands and CldNet-O not using visible and near-infrared bands on the test dataset is 82.23±2.14% and 73.21±2.02%, respectively. Meanwhile, the total parameters of CldNet are only 0.46M, making it easy for edge deployment. More importantly, the trained CldNet without any fine-tuning can predict cloud types with higher spatial resolution using satellite spectral data with spatial resolution 0.02°×0.02°, which indicates that CldNet possesses a strong generalization ability. In aggregate, the KBDD framework using CldNet is a highly effective cloud-type identification system capable of providing a high-fidelity, all-day, spatiotemporal cloud-type database for many climate assessment fields.
Submitted 30 November, 2023;
originally announced December 2023.
-
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
Authors:
Shansong Liu,
Atin Sakkeer Hussain,
Chenshuo Sun,
Ying Shan
Abstract:
The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. They also utilize LLMs to understand human intention and generate desired outputs like images, videos, and music. However, research that combines both understanding and generation using LLMs is still limited and in its nascent stage. To address this gap, we introduce a Multi-modal Music Understanding and Generation (M$^{2}$UGen) framework that integrates LLM's abilities to comprehend and generate music for different modalities. The M$^{2}$UGen framework is purpose-built to unlock creative potential from diverse sources of inspiration, encompassing music, image, and video through the use of pretrained MERT, ViT, and ViViT models, respectively. To enable music generation, we explore the use of AudioLDM 2 and MusicGen. Bridging multi-modal understanding and music generation is accomplished through the integration of the LLaMA 2 model. Furthermore, we make use of the MU-LLaMA model to generate extensive datasets that support text/image/video-to-music generation, facilitating the training of our M$^{2}$UGen framework. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models.
Submitted 4 March, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Spec-NeRF: Multi-spectral Neural Radiance Fields
Authors:
Jiabao Li,
Yuqi Li,
Ciliang Sun,
Chong Wang,
Jinhui Xiang
Abstract:
We propose Multi-spectral Neural Radiance Fields (Spec-NeRF) for jointly reconstructing a multispectral radiance field and spectral sensitivity functions (SSFs) of the camera from a set of color images filtered by different filters. The proposed method focuses on modeling the physical imaging process, and applies the estimated SSFs and radiance field to synthesize novel views of multispectral scenes. In this method, the data acquisition requires only a low-cost trichromatic camera and several off-the-shelf color filters, making it more practical than using specialized 3D scanning and spectral imaging equipment. Our experiments on both synthetic and real scenario datasets demonstrate that utilizing filtered RGB images with learnable NeRF and SSFs can achieve high fidelity and promising spectral reconstruction while retaining the inherent capability of NeRF to comprehend geometric structures. Code is available at https://github.com/CPREgroup/SpecNeRF-v2.
Submitted 14 September, 2023;
originally announced October 2023.
-
Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming
Authors:
Alaa Eddine Chriat,
Chuangchuang Sun
Abstract:
Safety assurance is uncompromisable for safety-critical environments with the presence of drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty in safe learning will naturally lead to a bi-level problem, where at the lower level the (worst-case) safety constraint is evaluated within the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric. To improve the tractability, we first use duality theory to transform the lower-level optimization from infinite-dimensional probability space where distributional shift is measured, to a finite-dimensional parametric space. Moreover, by differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules: a convex quadratic program to guarantee safety followed by a projected gradient ascent to simultaneously find the worst-case uncertainty. This end-to-end differentiable framework with safety constraints, to the best of our knowledge, is the first tractable single-level solution to address distributional safety. We test our approach on first and second-order systems with varying complexities and compare our results with the uncertainty-agnostic policies, where our approach demonstrates a significant improvement on safety guarantees.
Submitted 3 October, 2023;
originally announced October 2023.
-
Track-before-detect Algorithm based on Cost-reference Particle Filter Bank for Weak Target Detection
Authors:
Jin Lu,
Guojie Peng,
Weichuan Zhang,
Changming Sun
Abstract:
Detecting weak targets is an important and challenging problem in many applications such as radar and sonar. However, conventional detection methods are often ineffective in this case because of the low signal-to-noise ratio (SNR). This paper presents a track-before-detect (TBD) algorithm based on an improved particle filter, i.e., the cost-reference particle filter bank (CRPFB), which turns the problem of target detection into a problem of two-layer hypothesis testing. The first layer is implemented by the CRPFB for state estimation of possible targets. The CRPFB has an entirely parallel structure, consisting of many cost-reference particle filters with different hypothesized prior information. The second layer compares a test metric with a given threshold, which is constructed from the output of the first layer and fits a GEV distribution. The performance of the proposed TBD algorithm and existing TBD algorithms is compared in experiments on nonlinear frequency modulated (NLFM) signal detection and tracking. Simulation results show that the proposed TBD algorithm outperforms state-of-the-art methods in detection, tracking, and time efficiency.
Submitted 25 September, 2023;
originally announced September 2023.
-
Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming
Authors:
Alaa Eddine Chriat,
Chuangchuang Sun
Abstract:
Control barrier functions (CBFs) have attracted extensive attention for designing safe controllers for deployment in real-world safety-critical systems. However, the perception of the surrounding environment is often subject to stochasticity and further distributional shift from the nominal one. In this paper, we present the distributionally robust CBF (DR-CBF) to achieve resilience under distributional shift while keeping the advantages of CBF, such as computational efficacy and forward invariance.
To achieve this goal, we first propose a single-level convex reformulation to estimate the conditional value at risk (CVaR) of the safety constraints under distributional shift measured by a Wasserstein metric, which is by nature tri-level programming. Moreover, to construct a control barrier condition to enforce the forward invariance of the CVaR, the technique of differentiable convex programming is applied to enable differentiation through the optimization layer of CVaR estimation. We also provide an approximate variant of DR-CBF for higher-order systems. Simulation results are presented to validate the chance-constrained safety guarantee under the distributional shift in both first and second-order systems.
Submitted 15 September, 2023;
originally announced September 2023.
-
Contextual Biasing of Named-Entities with Large Language Models
Authors:
Chuanneng Sun,
Zeeshan Ahmed,
Yingyi Ma,
Zhe Liu,
Lucas Kabela,
Yutong Pang,
Ozlem Kalinli
Abstract:
This paper studies contextual biasing with Large Language Models (LLMs), where during second-pass rescoring additional contextual information is provided to a LLM to boost Automatic Speech Recognition (ASR) performance. We propose to leverage prompts for a LLM without fine tuning during rescoring which incorporate a biasing list and few-shot examples to serve as additional information when calculating the score for the hypothesis. In addition to few-shot prompt learning, we propose multi-task training of the LLM to predict both the entity class and the next token. To improve the efficiency for contextual biasing and to avoid exceeding LLMs' maximum sequence lengths, we propose dynamic prompting, where we select the most likely class using the class tag prediction, and only use entities in this class as contexts for next token prediction. Word Error Rate (WER) evaluation is performed on i) an internal calling, messaging, and dictation dataset, and ii) the SLUE-Voxpopuli dataset. Results indicate that biasing lists and few-shot examples can achieve 17.8% and 9.6% relative improvement compared to first pass ASR, and that multi-task training and dynamic prompting can achieve 20.0% and 11.3% relative WER improvement, respectively.
Submitted 21 September, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Authors:
Yi Meng,
Xiang Li,
Zhiyong Wu,
Tingtian Li,
Zixun Sun,
Xinyu Xiao,
Chi Sun,
Hui Zhan,
Helen Meng
Abstract:
To further improve the speaking styles of synthesized speeches, current text-to-speech (TTS) synthesis systems commonly employ reference speeches to stylize their outputs instead of relying on the input texts alone. These reference speeches are obtained by manual selection, which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information but also style-irrelevant information. The information in the text that is irrelevant to speaking style can interfere with the reference audio selection and result in improper speaking styles. To improve the reference selection, we propose the Contrastive Acoustic-Linguistic Module (CALM) to extract the Style-related Text Feature (STF) from the text. CALM optimizes the correlation between the speaking style embedding and the extracted STF with contrastive learning. Thus, a certain number of the most appropriate reference speeches for the input text are selected by retrieving the speeches with the top STF similarities. The style embeddings are then combined in a weighted sum according to their STF similarities and used to stylize the synthesized speech of TTS. Experimental results demonstrate the effectiveness of our proposed approach: in both objective and subjective evaluations of the speaking styles of the synthesized speeches, it outperforms a baseline approach with semantic-feature-based reference selection.
Submitted 30 August, 2023;
originally announced August 2023.
-
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
Authors:
Shansong Liu,
Atin Sakkeer Hussain,
Chenshuo Sun,
Ying Shan
Abstract:
Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.
Submitted 22 August, 2023;
originally announced August 2023.
-
Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress
Authors:
Xinze Li,
Xin Zhang,
Fanfan Lin,
Changjiang Sun,
Kezhi Mao
Abstract:
The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered as one of the most advanced modulation techniques for DAB converter. It can widen zero voltage switching range and improve power efficiency significantly. Currently, current stress of the DAB converter has been an important performance indicator when TPS modulation is applied for smaller size and higher efficiency. However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist in analysis process and realization process, respectively. Firstly, three degrees of modulation variables in TPS modulation bring challenges to the analysis of current stress in different operating modes. This analysis and deduction process leads to heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. Neural network (NN) and fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress will enjoy high degree of automation which can relieve engineers' working burden and improve accuracy. In the end of this paper, the effectiveness of the proposed AI-TPSM has been experimentally verified with a 1 kW prototype.
Submitted 1 August, 2023;
originally announced August 2023.
-
Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Authors:
Xinze Li,
Xin Zhang,
Fanfan Lin,
Changjiang Sun,
Kezhi Mao
Abstract:
Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.
Submitted 1 August, 2023;
originally announced August 2023.
-
NLOS Dies Twice: Challenges and Solutions of V2X for Cooperative Perception
Authors:
Lantao Li,
Chen Sun
Abstract:
Multi-agent multi-lidar sensor fusion between connected vehicles for cooperative perception has recently been recognized as the best technique for minimizing the blind zone of individual vehicular perception systems and further enhancing the overall safety of autonomous driving systems. This technique relies heavily on the reliability and availability of vehicle-to-everything (V2X) communication. In practical sensor fusion application scenarios, the non-line-of-sight (NLOS) issue causes blind zones for not only the perception system but also V2X direct communication. To counteract underlying communication issues, we introduce an abstract perception matrix matching method for quick sensor fusion matching procedures and mobility-height hybrid relay determination procedures, proactively improving the efficiency and performance of V2X communication to serve the upper layer application fusion requirements. To demonstrate the effectiveness of our solution, we design a new simulation framework to consider autonomous driving, sensor fusion and V2X communication in general, paving the way for end-to-end performance evaluation and further solution derivation.
Submitted 13 July, 2023;
originally announced July 2023.
-
Moving pattern-based modeling using a new type of interval ARX model
Authors:
Chang** Sun
Abstract:
In this paper, firstly, to overcome the shortcoming of traditional ARX model, a new operator between an interval number and a real matrix is defined, and then it is applied to the traditional ARX model to get a new type of structure interval ARX model that can deal with interval data, which is defined as interval ARX model (IARX). Secondly, the IARX model is applied to moving pattern-based modeling. Finally, to verify the validity of the proposed modeling method, it is applied to a sintering process. The simulation results show the moving pattern-based modeling using the new type of interval ARX model is robust to variation in parameters of the model, and the performance of the modeling using the proposed IARX is superior to that of the previous work.
Submitted 12 July, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
A Motion Assessment Method for Reference Stack Selection in Fetal Brain MRI Reconstruction Based on Tensor Rank Approximation
Authors:
Haoan Xu,
Wen Shi,
Jiwei Sun,
Tianshu Zheng,
Cong Sun,
Sun Yi,
Guangbin Wang,
Dan Wu
Abstract:
Purpose: Slice-to-volume registration and super-resolution reconstruction (SVR-SRR) is commonly used to generate 3D volumes of the fetal brain from 2D stacks of slices acquired in multiple orientations. A critical initial step in this pipeline is to select one stack with the minimum motion as a reference for registration. An accurate and unbiased motion assessment (MA) is thus crucial for successful selection. Methods: We present an MA method that determines the minimum motion stack based on 3D low-rank approximation using CANDECOMP/PARAFAC (CP) decomposition. Compared to the current 2D singular value decomposition (SVD) based method, which requires flattening stacks into matrices to obtain ranks and thereby loses spatial information, the CP-based method can factorize a 3D stack into low-rank and sparse components in a computationally efficient manner. The difference between the original stack and its low-rank approximation is proposed as the motion indicator. Results: Compared to SVD-based methods, our proposed CP-based MA demonstrated higher sensitivity in detecting small motion with a lower baseline bias. Experiments on randomly simulated motion illustrated that the proposed CP method achieved a higher success rate of 95.45% in identifying the minimum motion stack, compared to the SVD-based method with a success rate of 58.18%. We further demonstrated that combining CP-based MA with the existing SVR-SRR pipeline significantly improved 3D volume reconstruction. Conclusion: The proposed CP-based MA method showed superior performance compared to SVD-based methods with higher sensitivity to motion, success rate, and lower baseline bias, and can be used as a prior step to improve fetal brain reconstruction.
Submitted 30 June, 2023;
originally announced June 2023.
-
EE-TTS: Emphatic Expressive TTS with Linguistic Information
Authors:
Yi Zhong,
Chen Zhang,
Xule Liu,
Chenxi Sun,
Weishan Deng,
Haifeng Hu,
Zhongqian Sun
Abstract:
While current TTS systems perform well in synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, as a critical factor in determining the expressiveness of speech, has attracted increasing attention. Previous works usually enhance the emphasis by adding intermediate features, but they cannot guarantee the overall expressiveness of the speech. To resolve this matter, we propose Emphatic Expressive TTS (EE-TTS), which leverages multi-level linguistic information from syntax and semantics. EE-TTS contains an emphasis predictor that can identify appropriate emphasis positions from text and a conditioned acoustic model to synthesize expressive speech with emphasis and linguistic information. Experimental results indicate that EE-TTS outperforms the baseline with MOS improvements of 0.49 and 0.67 in expressiveness and naturalness. EE-TTS also shows strong generalization across different datasets according to AB test results.
Submitted 14 April, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
-
On the Optimality, Stability, and Feasibility of Control Barrier Functions: An Adaptive Learning-Based Approach
Authors:
Alaa Eddine Chriat,
Chuangchuang Sun
Abstract:
Safety has been a critical issue for the deployment of learning-based approaches in real-world applications. To address this issue, control barrier function (CBF) and its variants have attracted extensive attention for safety-critical control. However, due to the myopic one-step nature of CBF and the lack of principled methods to design the class-$\mathcal{K}$ functions, there are still fundamental limitations of current CBFs: optimality, stability, and feasibility. In this paper, we propose a novel and unified approach to address these limitations with Adaptive Multi-step Control Barrier Function (AM-CBF), where we parameterize the class-$\mathcal{K}$ function by a neural network and train it together with the reinforcement learning policy. Moreover, to mitigate the myopic nature, we propose a novel \textit{multi-step training and single-step execution} paradigm to make CBF farsighted while execution still requires solving only a single-step convex quadratic program. Our method is evaluated on first and second-order systems in various scenarios, where our approach outperforms the conventional CBF both qualitatively and quantitatively.
Submitted 5 May, 2023;
originally announced May 2023.
-
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Authors:
Chengzhe Sun,
Shan Jia,
Shuwei Hou,
Siwei Lyu
Abstract:
Advancements in AI-synthesized human voices have created a growing threat of impersonation and disinformation, making it crucial to develop methods to detect synthetic human voices. This study proposes a new approach to identifying synthetic human voices by detecting artifacts of vocoders in audio signals. Most DeepFake audio synthesis models use a neural vocoder, a neural network that generates waveforms from temporal-frequency representations like mel-spectrograms. By identifying neural vocoder processing in audio, we can determine if a sample is synthesized. To detect synthetic human voices, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the feature extractor with a vocoder identification module. By treating vocoder identification as a pretext task, we constrain the feature extractor to focus on vocoder artifacts and provide discriminative features for the final binary classifier. Our experiments show that the improved RawNet2 model based on vocoder identification achieves high classification performance on the binary task overall.
Submitted 27 April, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Filter-informed Spectral Graph Wavelet Networks for Multiscale Feature Extraction and Intelligent Fault Diagnosis
Authors:
Tianfu Li,
Chuang Sun,
Olga Fink,
Yuangui Yang,
Xuefeng Chen,
Ruqiang Yan
Abstract:
Intelligent fault diagnosis has been increasingly improved with the evolution of deep learning (DL) approaches. Recently, the emerging graph neural networks (GNNs) have also been introduced in the field of fault diagnosis with the goal to make better use of the inductive bias of the interdependencies between the different sensor measurements. However, there are some limitations with these GNN-based fault diagnosis methods. First, they lack the ability to realize multiscale feature extraction due to the fixed receptive field of GNNs. Secondly, they eventually encounter the over-smoothing problem with increase of model depth. Lastly, the extracted features of these GNNs are hard to understand owing to the black-box nature of GNNs. To address these issues, a filter-informed spectral graph wavelet network (SGWN) is proposed in this paper. In SGWN, the spectral graph wavelet convolutional (SGWConv) layer is established upon the spectral graph wavelet transform, which can decompose a graph signal into scaling function coefficients and spectral graph wavelet coefficients. With the help of SGWConv, SGWN is able to prevent the over-smoothing problem caused by long-range low-pass filtering, by simultaneously extracting low-pass and band-pass features. Furthermore, to speed up the computation of SGWN, the scaling kernel function and graph wavelet kernel function in SGWConv are approximated by the Chebyshev polynomials. The effectiveness of the proposed SGWN is evaluated on the collected solenoid valve dataset and aero-engine intershaft bearing dataset. The experimental results show that SGWN can outperform the comparative methods in both diagnostic accuracy and the ability to prevent over-smoothing. Moreover, its extracted features are also interpretable with domain knowledge.
Submitted 27 March, 2023;
originally announced March 2023.
-
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Authors:
Chengzhe Sun,
Shan Jia,
Shuwei Hou,
Ehab AlBadawy,
Siwei Lyu
Abstract:
The advancements of AI-synthesized human voices have introduced a growing threat of impersonation and disinformation. It is therefore of practical importance to develop detection methods for synthetic human voices. This work proposes a new approach to detect synthetic human voices based on identifying artifacts of neural vocoders in audio signals. A neural vocoder is a specially designed neural network that synthesizes waveforms from temporal-frequency representations, e.g., mel-spectrograms. The neural vocoder is a core component in most DeepFake audio synthesis models. Hence the identification of neural vocoder processing implies that an audio sample may have been synthesized. To take advantage of the vocoder artifacts for synthetic human voice detection, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the front-end feature extractor with a vocoder identification module. We treat the vocoder identification as a pretext task to constrain the front-end feature extractor to focus on vocoder artifacts and provide discriminative features for the final binary classifier. Our experiments show that the improved RawNet2 model based on vocoder identification achieves an overall high classification performance on the binary task.
Submitted 27 April, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Fisheye traffic data set of point center markers
Authors:
Chung-I Huang,
Wei-Yu Chen,
Wei Jan Ko,
Jih-Sheng Chang,
Chen-Kai Sun,
Hui Hung Yu,
Fang-Pang Lin
Abstract:
This study presents an open data-market platform and a dataset containing 160,000 markers and 18,000 images. We hope that this dataset will bring more new data value and applications. In this paper, we introduce the format and usage of the dataset, and we show a demonstration of deep learning vehicle detection trained by this dataset.
Submitted 30 January, 2023;
originally announced January 2023.
-
Hybrid stability augmentation control of multi-rotor UAV in confined space based on adaptive backstepping control
Authors:
QuanXi Zhan,
JunRui Zhang,
ChenYang Sun,
RunJie Shen,
Bin He
Abstract:
This paper applies the UAV to the inspection of water diversion pipelines in hydropower stations. The diversion pipeline is an enclosed space, so the airflow disturbance caused by the rotation of the UAV blades and the strong air convection from the chimney effect have a great impact on the flight control of the UAV. Although the traditional linear control PID flight control algorithm has been widely used and can meet the requirements of general flight tasks, it cannot guarantee the stability of the system over a wide range. The inspection of a diversion line in an enclosed space requires high system stability and robustness of the UAV controller. In this paper, a hybrid stabilised adaptive backstepping control method is proposed. Firstly, a multi-rotor UAV model is analysed and transformed into a strict feedback form with external disturbances; then adaptive techniques are used to estimate the airflow disturbances caused by the blades, and the attitude and position tracking controllers are designed by combining backstepping control and PID control respectively; finally, the asymptotic stability of the system is ensured by constructing a Lyapunov function. The experimental data show that the flight controller designed in this paper has good robustness and tracking performance, and can better resist the disturbance caused by airflow disturbance in confined space.
Submitted 15 December, 2022;
originally announced December 2022.
-
NeRFEditor: Differentiable Style Decomposition for Full 3D Scene Editing
Authors:
Chunyi Sun,
Yanbin Liu,
Junlin Han,
Stephen Gould
Abstract:
We present NeRFEditor, an efficient learning framework for 3D scene editing, which takes a video captured over 360° as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing such as guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other mutually. Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which can adjust the StyleGAN latent code to generate high-fidelity stylized images for any given angle. To extrapolate editing to GAN out-of-domain views, we devise another module that is trained in a self-supervised learning manner. This module maps novel-view images to the hidden space of StyleGAN that allows StyleGAN to generate stylized images on novel views. These two modules together produce guided images in 360° views to finetune a NeRF to make stylization effects, where a stable fine-tuning strategy is proposed to achieve this. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes with better editability, fidelity, and identity preservation.
Submitted 8 December, 2022; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Superpixel Perception Graph Neural Network for Intelligent Defect Detection
Authors:
Hongbing Shang,
Qixiu Yang,
Chuang Sun,
Xuefeng Chen,
Ruqiang Yan
Abstract:
Aero-engine is the core component of aircraft and other spacecraft. The high-speed rotating blades provide power by sucking in air and fully combusting, and various defects will inevitably occur, threatening the operation safety of aero-engine. Therefore, regular inspections are essential for such a complex system. However, existing traditional technology which is borescope inspection is labor-intensive, time-consuming, and experience-dependent. To endow this technology with intelligence, a novel superpixel perception graph neural network (SPGNN) is proposed by utilizing a multi-stage graph convolutional network (MSGCN) for feature extraction and superpixel perception region proposal network (SPRPN) for region proposal. First, to capture complex and irregular textures, the images are transformed into a series of patches, to obtain their graph representations. Then, MSGCN composed of several GCN blocks extracts graph structure features and performs graph information processing at graph level. Last but not least, the SPRPN is proposed to generate perceptual bounding boxes by fusing graph representation features and superpixel perception features. Therefore, the proposed SPGNN always implements feature extraction and information transmission at the graph level in the whole SPGNN pipeline, and SPRPN and MSGCN mutually benefit from each other. To verify the effectiveness of SPGNN, we meticulously construct a simulated blade dataset with 3000 images. A public aluminum dataset is also used to validate the performances of different methods. The experimental results demonstrate that the proposed SPGNN has superior performance compared with the state-of-the-art methods. The source code will be available at https://github.com/githbshang/SPGNN.
Submitted 14 October, 2022;
originally announced October 2022.
-
EOCSA: Predicting Prognosis of Epithelial Ovarian Cancer with Whole Slide Histopathological Images
Authors:
Tianling Liu,
Ran Su,
Changming Sun,
Xiuting Li,
Leyi Wei
Abstract:
Ovarian cancer is one of the most serious cancers that threaten women around the world. Epithelial ovarian cancer (EOC), as the most commonly seen subtype of ovarian cancer, has a rather high mortality rate and poor prognosis among various gynecological cancers. Survival analysis outcome is able to provide treatment advice to doctors. In recent years, with the development of medical imaging technology, survival prediction approaches based on pathological images have been proposed. In this study, we designed a deep framework named EOCSA which analyzes the prognosis of EOC patients based on pathological whole slide images (WSIs). Specifically, we first randomly extracted patches from WSIs and grouped them into multiple clusters. Next, we developed a survival prediction model, named DeepConvAttentionSurv (DCAS), which was able to extract patch-level features, removed less discriminative clusters and predicted the EOC survival precisely. Particularly, channel attention, spatial attention, and neuron attention mechanisms were used to improve the performance of feature extraction. Then patient-level features were generated from our weight calculation method and the survival time was finally estimated using LASSO-Cox model. The proposed EOCSA is efficient and effective in predicting prognosis of EOC and the DCAS ensures more informative and discriminative features can be extracted. As far as we know, our work is the first to analyze the survival of EOC based on WSIs and deep neural network technologies. The experimental results demonstrate that our proposed framework has achieved state-of-the-art performance of 0.980 C-index. The implementation of the approach can be found at https://github.com/RanSuLab/EOCprognosis.
Submitted 11 October, 2022;
originally announced October 2022.
-
An efficient approach for nonconvex semidefinite optimization via customized alternating direction method of multipliers
Authors:
Chuangchuang Sun
Abstract:
We investigate a class of general combinatorial graph problems, including MAX-CUT and community detection, reformulated as quadratic objectives over nonconvex constraints and solved via the alternating direction method of multipliers (ADMM).
We propose two reformulations: one using vector variables and a binary constraint, and the other further reformulating the Burer-Monteiro form for simpler subproblems.
Despite the nonconvex constraint, we prove the ADMM iterates converge to a stationary point in both formulations, under mild assumptions.
Additionally, recent work suggests that in this latter form, when the matrix factors are wide enough, a local optimum is, with high probability, also the global optimum.
To demonstrate the scalability of our algorithm, we include results for MAX-CUT, community detection, and image segmentation benchmark and simulated examples.
Submitted 7 September, 2022;
originally announced September 2022.
-
AVATAR: Unconstrained Audiovisual Speech Recognition
Authors:
Valentin Gabeur,
Paul Hongsuck Seo,
Arsha Nagrani,
Chen Sun,
Karteek Alahari,
Cordelia Schmid
Abstract:
Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth. Unlike works that simply focus on the lip motion, we investigate the contribution of entire visual frames (visual actions, objects, background etc.). This is particularly useful for unconstrained videos, where the speaker is not necessarily visible. To solve this task, we propose a new sequence-to-sequence AudioVisual ASR TrAnsformeR (AVATAR) which is trained end-to-end from spectrograms and full-frame RGB. To prevent the audio stream from dominating training, we propose different word-masking strategies, thereby encouraging our model to pay attention to the visual stream. We demonstrate the contribution of the visual modality on the How2 AV-ASR benchmark, especially in the presence of simulated noise, and show that our model outperforms all other prior work by a large margin. Finally, we also create a new, real-world test bed for AV-ASR called VisSpeech, which demonstrates the contribution of the visual modality under challenging audio conditions.
Submitted 15 June, 2022;
originally announced June 2022.
-
A microstructure estimation Transformer inspired by sparse representation for diffusion MRI
Authors:
Tianshu Zheng,
Cong Sun,
Weihao Zheng,
Wen Shi,
Haotian Li,
Yi Sun,
Yi Zhang,
Guangbin Wang,
Chuyang Ye,
Dan Wu
Abstract:
Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by the superior performance of the Transformer, in this work, we present a learning-based framework based on Transformer, namely, a Microstructure Estimation Transformer with Sparse Coding (METSC) for dMRI-based microstructure estimation with downsampled q-space data. To take advantage of the Transformer while addressing its limitation in large training data requirements, we explicitly introduce an inductive bias - model bias into the Transformer using a sparse coding technique to facilitate the training process. Thus, the METSC is composed of three stages, an embedding stage, a sparse representation stage, and a mapping stage. The embedding stage is a Transformer-based structure that encodes the signal to ensure the voxel is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem that unfolds the Iterative Hard Thresholding (IHT) process. The mapping stage is essentially a decoder that computes the microstructural parameters from the output of the second stage, based on the weighted sum of normalized dictionary coefficients where the weights are also learned. We tested our framework on two dMRI models with downsampled q-space data, including the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model. The proposed method achieved up to 11.25 folds of acceleration in scan time and outperformed the other state-of-the-art learning-based methods.
Submitted 13 May, 2022;
originally announced May 2022.
-
AFFIRM: Affinity Fusion-based Framework for Iteratively Random Motion correction of multi-slice fetal brain MRI
Authors:
Wen Shi,
Haoan Xu,
Cong Sun,
Jiwei Sun,
Yamin Li,
Xinyi Xu,
Tianshu Zheng,
Yi Zhang,
Guangbin Wang,
Dan Wu
Abstract:
Multi-slice magnetic resonance images of the fetal brain are usually contaminated by severe and arbitrary fetal and maternal motion. Hence, stable and robust motion correction is necessary to reconstruct high-resolution 3D fetal brain volume for clinical diagnosis and quantitative analysis. However, the conventional registration-based correction has a limited capture range and is insufficient for detecting relatively large motions. Here, we present a novel Affinity Fusion-based Framework for Iteratively Random Motion (AFFIRM) correction of the multi-slice fetal brain MRI. It learns the sequential motion from multiple stacks of slices and integrates the features between 2D slices and reconstructed 3D volume using affinity fusion, which resembles the iterations between slice-to-volume registration and volumetric reconstruction in the regular pipeline. The method accurately estimates the motion regardless of brain orientations and outperforms other state-of-the-art learning-based methods on the simulated motion-corrupted data, with a 48.4% reduction of mean absolute error for rotation and 61.3% for displacement. We then incorporated AFFIRM into the multi-resolution slice-to-volume registration and tested it on the real-world fetal MRI scans at different gestation stages. The results indicated that adding AFFIRM to the conventional pipeline improved the success rate of fetal brain super-resolution reconstruction from 77.2% to 91.9%.
Submitted 11 May, 2022;
originally announced May 2022.
-
Learning Audio-Video Modalities from Image Captions
Authors:
Arsha Nagrani,
Paul Hongsuck Seo,
Bryan Seybold,
Anja Hauth,
Santiago Manen,
Chen Sun,
Cordelia Schmid
Abstract:
A major challenge in text-video and text-audio retrieval is the lack of large-scale training data. This is unlike image-captioning, where datasets are in the order of millions of samples. To close this gap we propose a new video mining pipeline which involves transferring captions from image captioning datasets to video clips with no additional manual effort. Using this pipeline, we create a new large-scale, weakly labelled audio-video captioning dataset consisting of millions of paired clips and captions. We show that training a multimodal transformer-based model on this data achieves competitive performance on video retrieval and video captioning, matching or even outperforming HowTo100M pretraining with 20x fewer clips. We also show that our mined clips are suitable for text-audio pretraining, and achieve state of the art results for the task of audio retrieval.
Submitted 1 April, 2022;
originally announced April 2022.
-
Image Style Transfer: from Artistic to Photorealistic
Authors:
Chenggui Sun,
Li Bin Song
Abstract:
The rapid advancement of deep learning has significantly boosted the development of photorealistic style transfer. In this review, we reviewed the development of photorealistic style transfer starting from artistic style transfer and the contribution of traditional image processing techniques to photorealistic style transfer, including some work that had been completed in the Multimedia lab at the University of Alberta. Many techniques were discussed in this review. However, our focus is on VGG-based techniques, whitening and coloring transform (WCT) based techniques, and the combination of deep learning with traditional image processing techniques.
Submitted 11 March, 2022;
originally announced March 2022.
-
Privacy Leakage in Proactive VR Streaming: Modeling and Tradeoff
Authors:
Xing Wei,
Chenyang Yang,
Chengjian Sun
Abstract:
Proactive tile-based virtual reality (VR) video streaming employs the viewpoint of a user to predict the tiles to be requested, renders and delivers the predicted tiles before playback. Recently, it has been found that the identity and preference of the user can be inferred from the trace of viewpoint uploaded for proactive streaming, which indicates that viewpoint leakage incurs privacy leakage. In this paper, we strive to answer the following questions regarding viewpoint leakage during proactive VR video streaming. When is the viewpoint leaked? Can privacy-preserving approaches (e.g., federated or individual training, using predictors with no need for training, or predicting locally) avoid viewpoint leakage? We find that if the prediction error or the quality of experience (QoE) metric is uploaded for adaptive streaming, the real viewpoint can be inferred even with the privacy-preserving approaches. Then, we define viewpoint leakage probability to characterize the accuracy of the inferred viewpoint, and respectively derive the probability when uploading prediction error and QoE metric. We find that the viewpoint leakage probability can be reduced by sacrificing QoE or increasing resources. Simulation with the state-of-the-art predictor over a real dataset shows that such a tradeoff does not exist only in rare cases.
Submitted 10 April, 2022; v1 submitted 6 March, 2022;
originally announced March 2022.
-
A Holistic Review on Advanced Bi-directional EV Charging Control Algorithms
Authors:
Xiaoying Tang,
Chenxi Sun,
Suzhi Bi,
Shuoyao Wang,
Angela Yingjun Zhang
Abstract:
The rapid growth of electric vehicles (EVs) has promised a next-generation transportation system with reduced carbon emission. The fast development of EVs and charging facilities is driving the evolution of Internet of Vehicles (IoV) to Internet of Electric Vehicles (IoEV). IoEV benefits from both smart grid and Internet of Things (IoT) technologies which provide advanced bi-directional charging services and real-time data processing capability, respectively. The major design challenges of the IoEV charging control lie in the randomness of charging events and the mobility of EVs. In this article, we present a holistic review on advanced bi-directional EV charging control algorithms. For Grid-to-Vehicle (G2V), we introduce the charging control problem in two scenarios: 1) Operation of a single charging station and 2) Operation of multiple charging stations in coupled transportation and power networks. For Vehicle-to-Grid (V2G), we discuss how EVs can perform energy trading in the electricity market and provide ancillary services to the power grid. Besides, a case study is provided to illustrate the economic benefit of the joint optimization of routing and charging scheduling of multiple EVs in the IoEV. Last but not least, we highlight some open problems and future research directions of charging scheduling problems for IoEVs.
Submitted 28 February, 2022;
originally announced February 2022.
-
A Lightweight Dual-Domain Attention Framework for Sparse-View CT Reconstruction
Authors:
Chang Sun,
Ken Deng,
Yitong Liu,
Hongwen Yang
Abstract:
Computed Tomography (CT) plays an essential role in clinical diagnosis. Due to the adverse effects of radiation on patients, the radiation dose is expected to be reduced as low as possible. Sparse sampling is an effective way, but it will lead to severe artifacts on the reconstructed CT image, thus sparse-view CT image reconstruction has been a prevailing and challenging research area. With the popularity of mobile devices, the requirements for lightweight and real-time networks are increasing rapidly. In this paper, we design a novel lightweight network called CAGAN, and propose a dual-domain reconstruction pipeline for parallel beam sparse-view CT. CAGAN is an adversarial auto-encoder, combining the Coordinate Attention unit, which preserves the spatial information of features. Also, the application of Shuffle Blocks reduces the parameters by a quarter without sacrificing its performance. In the Radon domain, the CAGAN learns the mapping between the interpolated data and fringe-free projection data. After the restored Radon data is reconstructed to an image, the image is sent into the second CAGAN trained for recovering the details, so that a high-quality image is obtained. Experiments indicate that the CAGAN strikes an excellent balance between model complexity and performance, and our pipeline outperforms the DD-Net and the DuDoNet.
Submitted 19 February, 2022;
originally announced February 2022.
-
Resource allocation for reconfigurable intelligent surface aided broadcast channels
Authors:
Cong Sun,
Xian Liu,
Bile Peng,
Eduard Jorswieck
Abstract:
A two-user downlink network aided by a reconfigurable intelligent surface (RIS) is considered. The weighted sum signal-to-interference-plus-noise ratio maximization and the sum rate maximization models are presented, where the precoding vectors and the RIS matrix are jointly optimized. Since the optimization problem is non-convex and difficult, new approximation models are proposed. The upper bounds of the corresponding objective functions are derived and maximized. Two new algorithms based on the alternating direction method of multipliers are proposed. It is proved that the proposed algorithms converge to the KKT points of the approximation models as long as the iteration points converge. Simulation results show the good performance of the proposed models compared to state-of-the-art algorithms.
Submitted 14 February, 2022;
originally announced February 2022.
-
Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network
Authors:
Bile Peng,
Jan-Aike Termöhlen,
Cong Sun,
Danping He,
Ke Guan,
Tim Fingscheidt,
Eduard A. Jorswieck
Abstract:
Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply a fully convolutional network (FCN) to solve this problem, which was originally designed for semantic segmentation of images. The rectangular shape of the RIS and the spatial correlation of channels with adjacent RIS antennas due to the short distance between them encourage us to apply it for the RIS configuration. We design a set of channel features that includes both cascaded channels via the RIS and the direct channel. In the base station (BS), the differentiable minimum mean squared error (MMSE) precoder is used for pretraining and the weighted minimum mean squared error (WMMSE) precoder is then applied for fine-tuning, which is nondifferentiable, more complex, but achieves a better performance. Evaluation results show that the proposed solution has higher performance and allows for a faster evaluation than the baselines. Hence it scales better to a large number of antennas, advancing the RIS one step closer to practical deployment.
Submitted 21 September, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
-
Control Parameters Considered Harmful: Detecting Range Specification Bugs in Drone Configuration Modules via Learning-Guided Search
Authors:
Ruidong Han,
Chao Yang,
Siqi Ma,
JiangFeng Ma,
Cong Sun,
Juanru Li,
Elisa Bertino
Abstract:
In order to support a variety of missions and deal with different flight environments, drone control programs typically provide configurable control parameters. However, such a flexibility introduces vulnerabilities. One such vulnerability, referred to as range specification bugs, has been recently identified. The vulnerability originates from the fact that even though each individual parameter receives a value in the recommended value range, certain combinations of parameter values may affect the drone physical stability. In this paper we develop a novel learning-guided search system to find such combinations, that we refer to as incorrect configurations. Our system applies metaheuristic search algorithms mutating configurations to detect the configuration parameters that have values driving the drone to unstable physical states. To guide the mutations, our system leverages a machine learning predictor as the fitness evaluator. Finally, by utilizing multi-objective optimization, our system returns the feasible ranges based on the mutation search results. Because in our system the mutations are guided by a predictor, evaluating the parameter configurations does not require realistic/simulation executions. Therefore, our system supports a comprehensive and yet efficient detection of incorrect configurations. We have carried out an experimental evaluation of our system. The evaluation results show that the system successfully reports potentially incorrect configurations, of which over 85% lead to actual unstable physical states.
Submitted 7 December, 2021;
originally announced December 2021.
-
Co-Correcting: Noise-tolerant Medical Image Classification via mutual Label Correction
Authors:
Jiarun Liu,
Ruirui Li,
Chuan Sun
Abstract:
With the development of deep learning, medical image classification has been significantly improved. However, deep learning requires massive data with labels. While labeling the samples by human experts is expensive and time-consuming, collecting labels from crowd-sourcing suffers from the noises which may degenerate the accuracy of classifiers. Therefore, approaches that can effectively handle label noises are highly desired. Unfortunately, recent progress on handling label noise in deep learning has gone largely unnoticed by the medical image. To fill the gap, this paper proposes a noise-tolerant medical image classification framework named Co-Correcting, which significantly improves classification accuracy and obtains more accurate labels through dual-network mutual learning, label probability estimation, and curriculum label correcting. On two representative medical image datasets and the MNIST dataset, we test six latest Learning-with-Noisy-Labels methods and conduct comparative studies. The experiments show that Co-Correcting achieves the best accuracy and generalization under different noise ratios in various tasks. Our project can be found at: https://github.com/JiarunLiu/Co-Correcting.
Submitted 10 September, 2021;
originally announced September 2021.