Search | arXiv e-print repository

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests. △ Less

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2406.01173 [pdf, other]

Cascade Network Stability of Synchronized Traffic Load Balancing with Heterogeneous Energy Efficiency Policies

Authors: Mengbang Zou, Weisi Guo

Abstract: Cascade stability of load balancing is critical for ensuring high efficiency service delivery and preventing undesirable handovers. In energy efficient networks that employ diverse sleep mode operations, handing over traffic to neighbouring cells' expanded coverage must be done with minimal side effects. Current research is largely concerned with designing distributed and centralized efficient loa… ▽ More Cascade stability of load balancing is critical for ensuring high efficiency service delivery and preventing undesirable handovers. In energy efficient networks that employ diverse sleep mode operations, handing over traffic to neighbouring cells' expanded coverage must be done with minimal side effects. Current research is largely concerned with designing distributed and centralized efficient load balancing policies that are locally stable. There is a major research gap in identifying large-scale cascade stability for networks with heterogeneous load balancing policies arising from diverse plug-and-play sleep mode policies in ORAN, which will cause heterogeneity in the network stability behaviour. Here, we investigate whether cells arbitrarily connected for load balancing and having an arbitrary number undergoing sleep mode can: (i) synchronize to a desirable load-balancing state, and (ii) maintain stability. For the first time, we establish the criterion for stability and prove its validity for any general load dynamics and random network topology. Whilst its general form allows all load balancing and sleep mode dynamics to be incorporated, we propose an ORAN architecture where the network service management and orchestration (SMO) must monitor new load balancing policies to ensure overall network cascade stability. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00320 [pdf, other]

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and c… ▽ More Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and conducts sampling by solving ODE, outperforming autoregressive and score-based models in terms of audio quality. By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video. Furthermore, through reflow and one-step distillation with guided vector field, our model can generate decent audio in a few, or even only one sampling step. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment on VGGSound, with alignment accuracy reaching 97.22%, and 6.2% improvement in inception score over the strong diffusion-based baseline. Audio samples are available at http://frieren-v2a.github.io . △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.14398 [pdf, other]

SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network

Authors: Weiyu Guo, Ying Sun, Yijie Xu, Ziyue Qiao, Yongkui Yang, Hui Xiong

Abstract: Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distri… ▽ More Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distribution shifts in real-world settings, compromises model robustness. To tackle these challenges, we propose a novel SpGesture framework based on Spiking Neural Networks, which possesses several unique merits compared with existing methods: (1) Robustness: By utilizing membrane potential as a memory list, we pioneer the introduction of Source-Free Domain Adaptation into SNN for the first time. This enables SpGesture to mitigate the accuracy degradation caused by distribution shifts. (2) High Accuracy: With a novel Spiking Jaccard Attention, SpGesture enhances the SNNs' ability to represent sEMG features, leading to a notable rise in system accuracy. To validate SpGesture's performance, we collected a new sEMG gesture dataset which has different forearm postures, where SpGesture achieved the highest accuracy among the baselines ($89.26\%$). Moreover, the actual deployment on the CPU demonstrated a system latency below 100ms, well within real-time requirements. This impressive performance showcases SpGesture's potential to enhance the applicability of sEMG in real-world scenarios. The code is available at https://anonymous.4open.science/r/SpGesture. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.09752 [pdf, other]

Time-Varying Graph Signal Recovery Using High-Order Smoothness and Adaptive Low-rankness

Authors: Weihong Guo, Yifei Lou, **g Qin, Ming Yan

Abstract: Time-varying graph signal recovery has been widely used in many applications, including climate change, environmental hazard monitoring, and epidemic studies. It is crucial to choose appropriate regularizations to describe the characteristics of the underlying signals, such as the smoothness of the signal over the graph domain and the low-rank structure of the spatial-temporal signal modeled in a… ▽ More Time-varying graph signal recovery has been widely used in many applications, including climate change, environmental hazard monitoring, and epidemic studies. It is crucial to choose appropriate regularizations to describe the characteristics of the underlying signals, such as the smoothness of the signal over the graph domain and the low-rank structure of the spatial-temporal signal modeled in a matrix form. As one of the most popular options, the graph Laplacian is commonly adopted in designing graph regularizations for reconstructing signals defined on a graph from partially observed data. In this work, we propose a time-varying graph signal recovery method based on the high-order Sobolev smoothness and an error-function weighted nuclear norm regularization to enforce the low-rankness. Two efficient algorithms based on the alternating direction method of multipliers and iterative reweighting are proposed, and convergence of one algorithm is shown in detail. We conduct various numerical experiments on synthetic and real-world data sets to demonstrate the proposed method's effectiveness compared to the state-of-the-art in graph signal recovery. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng , et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.11213 [pdf, other]

Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis

Authors: Weiyu Guo, Ziyue Qiao, Ying Sun, Hui Xiong

Abstract: Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation t… ▽ More Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation techniques. In this work, we revisit the problem from a short term enhancement perspective to improve precision and robustness against various common noisy scenarios with learnable denoise using sEMG intrinsic pattern information and sliding-window attention. We propose a Short Term Enhancement Module(STEM) which can be easily integrated with various models. STEM offers several benefits: 1) Learnable denoise, enabling noise reduction without manual data augmentation; 2) Scalability, adaptable to various models; and 3) Cost-effectiveness, achieving short-term enhancement through minimal weight-sharing in an efficient attention mechanism. In particular, we incorporate STEM into a transformer, creating the Short Term Enhanced Transformer (STET). Compared with best-competing approaches, the impact of noise on STET is reduced by more than 20%. We also report promising results on both classification and regression datasets and demonstrate that STEM generalizes across different gesture recognition tasks. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09131 [pdf, other]

Design of Artificial Interference Signals for Covert Communication Aided by Multiple Friendly Nodes

Authors: Xuyang Zhao. Wei Guo, Yongchao Wang

Abstract: In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of ch… ▽ More In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of channel coefficients to optimize the basis matrix of the artificial noise signals space in the absence of accurate channel fading information between the friendly interference nodes and the legitimate receiver. The optimization problem aims to design artificial noise signals within the space to facilitate covert communication while minimizing the impact on the performance of legitimate communication. Second, a customized Rimannian Stochastic Variance Reduced Gradient (R-SVRG) algorithm is proposed to solve the non-convex problem. In the algorithm, we employ the Riemannian optimization framework to analyze the geometric structure of the basis matrix constraints and transform the original non-convex optimization problem into an unconstrained problem on the complex Stiefel manifold for solution. Third, we theoretically prove the convergence of the proposed algorithm to a stationary point. In the end, we evaluate the performance of the proposed strategy for generating artificial noise signals through numerical simulations. The results demonstrate that our approach significantly outperforms the Gaussian artificial noise strategy without optimization. △ Less

Submitted 9 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.02731 [pdf, other]

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Authors: Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong

Abstract: Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing ne… ▽ More Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing new challenges for RAW domain processes like demosaicing. The challenge intensifies as most research in the RAW domain is based on the premise that each pixel contains a value, making the straightforward adaptation of these methods to event camera demosaicing problematic. To end this, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW domain processing. Our core motivation is to refine a general and widely applicable foundational model from the RGB domain for RAW domain processing, thereby broadening the model's applicability within the entire imaging process. Our method harnesses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computing complexity. We also proposed the Pixel-focus Loss function for network fine-tuning to improve network convergence based on our discovery of a long-tailed distribution in training loss. Our method has undergone validation on the MIPI Demosaic Challenge dataset, with subsequent analytical experimentation confirming its efficacy. All code and trained models are released here: https://github.com/yunfanLu/ev-demosaic △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted for the CVPR 2024 Workshop on Mobile Intelligent Photography & Imaging

arXiv:2404.00309 [pdf, other]

Model-Driven Deep Learning for Distributed Detection with Binary Quantization

Authors: Wei Guo, Meng He, Chuan Huang, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Within the realm of rapidly advancing wireless sensor networks (WSNs), distributed detection assumes a significant role in various practical applications. However, critical challenge lies in maintaining robust detection performance while operating within the constraints of limited bandwidth and energy resources. This paper introduces a novel approach that combines model-driven deep learning (DL) w… ▽ More Within the realm of rapidly advancing wireless sensor networks (WSNs), distributed detection assumes a significant role in various practical applications. However, critical challenge lies in maintaining robust detection performance while operating within the constraints of limited bandwidth and energy resources. This paper introduces a novel approach that combines model-driven deep learning (DL) with binary quantization to strike a balance between communication overhead and detection performance in WSNs. We begin by establishing the lower bound of detection error probability for distributed detection using the maximum a posteriori (MAP) criterion. Furthermore, we prove the global optimality of employing identical local quantizers across sensors, thereby maximizing the corresponding Chernoff information. Subsequently, the paper derives the minimum MAP detection error probability (MAPDEP) by inplementing identical binary probabilistic quantizers across the sensors. Moreover, the paper establishes the equivalence between utilizing all quantized data and their average as input to the detector at the fusion center (FC). In particular, we derive the Kullback-Leibler (KL) divergence, which measures the difference between the true posterior probability and output of the proposed detector. Leveraging the MAPDEP and KL divergence as loss functions, the paper proposes model-driven DL method to separately train the probability controller module in the quantizer and the detector module at the FC. Numerical results validate the convergence and effectiveness of the proposed method, which achieves near-optimal performance with reduced complexity for Gaussian hypothesis testing. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2402.18527 [pdf, other]

Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures

Authors: Andrei Cozma, Landon Harris, Hairong Qi, ** Ji, Wenpeng Guo, Song Yuan

Abstract: This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and te… ▽ More This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and textures of tire X-ray images, the study emphasizes the significance of feature engineering to enhance the performance of defect detection systems. By meticulously integrating combinations of these features with a Random Forest (RF) classifier and comparing them against advanced models like YOLOv8, the research not only benchmarks the performance of traditional features in defect detection but also explores the synergy between classical and modern approaches. The experimental results demonstrate that these traditional features, when fine-tuned and combined with machine learning models, can significantly improve the accuracy and reliability of tire defect detection, aiming to set a new standard in automated quality assurance in tire manufacturing. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 2 figures, 3 tables, submitted to ICIP2024

ACM Class: I.4.7; I.4.9; I.4.0

arXiv:2311.18539 [pdf, other]

Bridging Both Worlds in Semantics and Time: Domain Knowledge Based Analysis and Correlation of Industrial Process Attacks

Authors: Moses Ike, Kandy Phan, Anwesh Badapanda, Matthew Landen, Keaton Sadoski, Wanda Guo, Asfahan Shah, Saman Zonouz, Wenke Lee

Abstract: Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process beha… ▽ More Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process behavior, i.e., SCADA execution does not map to process dynamics nor evolve at similar time scales. We propose BRIDGE to analyze and correlate SCADA and industrial process attacks using domain knowledge to bridge their unique semantic and time evolution. This enables operators to tie malicious SCADA operations to their adverse process effects, which reduces false alarms and improves attack understanding. BRIDGE (i) identifies process constraints violations in SCADA by measuring actuation dependencies in SCADA process-control, and (ii) detects malicious SCADA effects in processes via a physics-informed neural network that embeds generic knowledge of inertial process dynamics. BRIDGE then dynamically aligns both analysis (i and ii) in a time-window that adjusts their time evolution based on process inertial delays. We applied BRIDGE to 11 diverse real-world industrial processes, and adaptive attacks inspired by past events. BRIDGE correlated 98.3% of attacks with 0.8% false positives (FP), compared to 78.3% detection accuracy and 13.7% FP of recent work. △ Less

Submitted 3 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.06879 [pdf, other]

The Solution for the CVPR2023 NICE Image Captioning Challenge

Authors: Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu

Abstract: In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge. Different from the traditional image captioning datasets, this challenge includes a larger new variety of visual concepts from many domains (such as COVID-19) as well as various image types (photographs, illustrations, graphics). For the data level, we collect external training data from Laion-5B,… ▽ More In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge. Different from the traditional image captioning datasets, this challenge includes a larger new variety of visual concepts from many domains (such as COVID-19) as well as various image types (photographs, illustrations, graphics). For the data level, we collect external training data from Laion-5B, a large-scale CLIP-filtered image-text dataset. For the model level, we use OFA, a large-scale visual-language pre-training model based on handcrafted templates, to perform the image captioning task. In addition, we introduce contrastive learning to align image-text pairs to learn new visual concepts in the pre-training stage. Then, we propose a similarity-bucket strategy and incorporate this strategy into the template to force the model to generate higher quality and more matching captions. Finally, by retrieval-augmented strategy, we construct a content-rich template, containing the most relevant top-k captions from other image-text pairs, to guide the model in generating semantic-rich captions. Our method ranks first on the leaderboard, achieving 105.17 and 325.72 Cider-Score in the validation and test phase, respectively. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.10935 [pdf, other]

A Geometric Flow Approach for Segmentation of Images with Inhomongeneous Intensity and Missing Boundaries

Authors: Paramjyoti Mohapatra, Richard Lartey, Weihong Guo, Michael Judkovich, Xiaojuan Li

Abstract: Image segmentation is a complex mathematical problem, especially for images that contain intensity inhomogeneity and tightly packed objects with missing boundaries in between. For instance, Magnetic Resonance (MR) muscle images often contain both of these issues, making muscle segmentation especially difficult. In this paper we propose a novel intensity correction and a semi-automatic active conto… ▽ More Image segmentation is a complex mathematical problem, especially for images that contain intensity inhomogeneity and tightly packed objects with missing boundaries in between. For instance, Magnetic Resonance (MR) muscle images often contain both of these issues, making muscle segmentation especially difficult. In this paper we propose a novel intensity correction and a semi-automatic active contour based segmentation approach. The approach uses a geometric flow that incorporates a reproducing kernel Hilbert space (RKHS) edge detector and a geodesic distance penalty term from a set of markers and anti-markers. We test the proposed scheme on MR muscle segmentation and compare with some state of the art methods. To help deal with the intensity inhomogeneity in this particular kind of image, a new approach to estimate the bias field using a fat fraction image, called Prior Bias-Corrected Fuzzy C-means (PBCFCM), is introduced. Numerical experiments show that the proposed scheme leads to significantly better results than compared ones. The average dice values of the proposed method are 92.5%, 85.3%, 85.3% for quadriceps, hamstrings and other muscle groups while other approaches are at least 10% worse. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Presented at CVIT 2023 Conference. Accepted to Journal of Image and Graphics

arXiv:2309.01112 [pdf]

Swing Leg Motion Strategy for Heavy-load Legged Robot Based on Force Sensing

Authors: Ze Fu, Yinghui Li, Weizhong Guo

Abstract: The heavy-load legged robot has strong load carrying capacity and can adapt to various unstructured terrains. But the large weight results in higher requirements for motion stability and environmental perception ability. In order to utilize force sensing information to improve its motion performance, in this paper, we propose a finite state machine model for the swing leg in the static gait by imi… ▽ More The heavy-load legged robot has strong load carrying capacity and can adapt to various unstructured terrains. But the large weight results in higher requirements for motion stability and environmental perception ability. In order to utilize force sensing information to improve its motion performance, in this paper, we propose a finite state machine model for the swing leg in the static gait by imitating the movement of the elephant. Based on the presence or absence of additional terrain information, different trajectory planning strategies are provided for the swing leg to enhance the success rate of step** and save energy. The experimental results on a novel quadruped robot show that our method has strong robustness and can enable heavy-load legged robots to pass through various complex terrains autonomously and smoothly. △ Less

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.08283 [pdf, other]

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Authors: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan **

Abstract: Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in perfo… ▽ More Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development. △ Less

Submitted 16 August, 2023; originally announced August 2023.

Comments: 8 pages

arXiv:2306.14097 [pdf, other]

Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model

Authors: Junying Meng, Weihong Guo, Jun Liu, Mingrui Yang

Abstract: The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor… ▽ More The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor interpretability (known as the black box problem). The classical Mumford-Shah (MS) model is effective for segmentation and provides a piece-wise smooth approximation of the original image. In this paper, we replace the hand-crafted regularity term in the MS model with a data adaptive generalized learnable regularity term and use a multi-grid framework to unroll the MS model and obtain a variational model-based segmentation network with better generalizability and interpretability. This approach allows for the incorporation of learnable prior information into the network structure design. Moreover, the multi-grid framework enables multi-scale feature extraction and offers a mathematical explanation for the effectiveness of the U-shaped network structure in producing good image segmentation results. Due to the proposed network originates from a variational model, it can also handle small training sizes. Our experiments on the REFUGE dataset, the White Blood Cell image dataset, and 3D thigh muscle magnetic resonance (MR) images demonstrate that even with smaller training datasets, our method yields better segmentation results compared to related state of the art segmentation methods. △ Less

Submitted 24 June, 2023; originally announced June 2023.

Comments: 25 pages, 9 figures, 6 tables

MSC Class: 94A08; 68U10

arXiv:2306.00303 [pdf, other]

Sea Ice Extraction via Remote Sensed Imagery: Algorithms, Datasets, Applications and Challenges

Authors: Anzhu Yu, Wenjun Huang, Qing Xu, Qun Sun, Wenyue Guo, Song Ji, Bowei Wen, Chun** Qiu

Abstract: The deep learning, which is a dominating technique in artificial intelligence, has completely changed the image understanding over the past decade. As a consequence, the sea ice extraction (SIE) problem has reached a new era. We present a comprehensive review of four important aspects of SIE, including algorithms, datasets, applications, and the future trends. Our review focuses on researches publ… ▽ More The deep learning, which is a dominating technique in artificial intelligence, has completely changed the image understanding over the past decade. As a consequence, the sea ice extraction (SIE) problem has reached a new era. We present a comprehensive review of four important aspects of SIE, including algorithms, datasets, applications, and the future trends. Our review focuses on researches published from 2016 to the present, with a specific focus on deep learning-based approaches in the last five years. We divided all relegated algorithms into 3 categories, including classical image segmentation approach, machine learning-based approach and deep learning-based methods. We reviewed the accessible ice datasets including SAR-based datasets, the optical-based datasets and others. The applications are presented in 4 aspects including climate research, navigation, geographic information systems (GIS) production and others. It also provides insightful observations and inspiring future research directions. △ Less

Submitted 31 May, 2023; originally announced June 2023.

Comments: 24 pages, 6 figures

arXiv:2303.01249 [pdf, ps, other]

Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition

Authors: Zhijie Shen, Wu Guo, Bin Gu

Abstract: In this paper, we propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition (ASR). For acoustic modeling, the wav2vec 2.0 pre-trained model is fine-tuned by inserting language-specific and language-universal adapters. An online knowledge distillation is then used to enable the language-universal adapters to learn b… ▽ More In this paper, we propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition (ASR). For acoustic modeling, the wav2vec 2.0 pre-trained model is fine-tuned by inserting language-specific and language-universal adapters. An online knowledge distillation is then used to enable the language-universal adapters to learn both language-specific and universal features. The linguistic information confusion is also reduced by leveraging language identifiers (LIDs). With LIDs we perform a position-wise modification on the multi-head attention outputs. In the inference procedure, the language-specific adapters are removed while the language-universal adapters are kept activated. The proposed method improves the recognition accuracy and addresses the linear increase of the number of adapters' parameters with the number of languages in common multilingual ASR systems. Experiments on the BABEL dataset confirm the effectiveness of the proposed framework. Compared to the conventional multilingual model, a 3.3% absolute error rate reduction is achieved. The code is available at: https://github.com/shen9712/UniversalAdapterLearning. △ Less

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2302.12428 [pdf]

A holistically 3D-printed flexible millimeter-wave Doppler radar: Towards fully printed high-frequency multilayer flexible hybrid electronics systems

Authors: Hong Tang, Yingjie Zhang, Bowen Zheng, Sensong An, Mohammad Haerinia, Yunxi Dong, Yi Huang, Wei Guo, Hualiang Zhang

Abstract: Flexible hybrid electronics (FHE) is an emerging technology enabled through the integration of advanced semiconductor devices and 3D printing technology. It unlocks tremendous market potential by realizing low-cost flexible circuits and systems that can be conformally integrated into various applications. However, the operating frequencies of most reported FHE systems are relatively low. It is als… ▽ More Flexible hybrid electronics (FHE) is an emerging technology enabled through the integration of advanced semiconductor devices and 3D printing technology. It unlocks tremendous market potential by realizing low-cost flexible circuits and systems that can be conformally integrated into various applications. However, the operating frequencies of most reported FHE systems are relatively low. It is also worth to note that reported FHE systems have been limited to relatively simple design concept (since complex systems will impose challenges in aspects such as multilayer interconnections, printing materials, and bonding layers). Here, we report a fully 3D-printed flexible four-layer millimeter-wave Doppler radar (i.e., a millimeter-wave FHE system). The sensing performance and flexibility of the 3D-printed radar are characterized and validated by general field tests and bending tests, respectively. Our results demonstrate the feasibility of develo** fully 3D-printed high-frequency multilayer FHE, which can be conformally integrated into irregular surfaces (e.g., vehicle bumpers) for applications such as vehicle radars and wearable electronics. △ Less

Submitted 23 February, 2023; originally announced February 2023.

MSC Class: 78-05

arXiv:2211.02147 [pdf, other]

A Survey on Reinforcement Learning in Aviation Applications

Authors: Pouria Razzaghi, Amin Tabrizian, Wei Guo, Shulu Chen, Abenezer Taye, Ellis Thompson, Alexis Bregeon, Ali Baheri, Peng Wei

Abstract: Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential d… ▽ More Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation. △ Less

Submitted 22 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2206.10955 [pdf, other]

Adversarial Reconfigurable Intelligent Surface Against Physical Layer Key Generation

Authors: Zhuangkun Wei, Bin Li, Weisi Guo

Abstract: The development of reconfigurable intelligent surfaces (RIS) has recently advanced the research of physical layer security (PLS). Beneficial impacts of RIS include but are not limited to offering a new degree-of-freedom (DoF) for key-less PLS optimization, and increasing channel randomness for physical layer secret key generation (PL-SKG). However, there is a lack of research studying how adversar… ▽ More The development of reconfigurable intelligent surfaces (RIS) has recently advanced the research of physical layer security (PLS). Beneficial impacts of RIS include but are not limited to offering a new degree-of-freedom (DoF) for key-less PLS optimization, and increasing channel randomness for physical layer secret key generation (PL-SKG). However, there is a lack of research studying how adversarial RIS can be used to attack and obtain legitimate secret keys generated by PL-SKG. In this work, we show an Eve-controlled adversarial RIS (Eve-RIS), by inserting into the legitimate channel a random and reciprocal channel, can partially reconstruct the secret keys from the legitimate PL-SKG process. To operationalize this concept, we design Eve-RIS schemes against two PL-SKG techniques used: (i) the CSI-based PL-SKG, and (ii) the two-way cross multiplication based PL-SKG. The channel probing at Eve-RIS is realized by compressed sensing designs with a small number of radio-frequency (RF) chains. Then, the optimal RIS phase is obtained by maximizing the Eve-RIS inserted deceiving channel. Our analysis and results show that even with a passive RIS, our proposed Eve-RIS can achieve a high key match rate with legitimate users, and is resistant to most of the current defensive approaches. This means the novel Eve-RIS provides a new eavesdrop** threat on PL-SKG, which can spur new research areas to counter adversarial RIS attacks. △ Less

Submitted 6 November, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

arXiv:2205.09316 [pdf, other]

Dynamic Clustering and Power Control for Two-Tier Wireless Federated Learning

Authors: Wei Guo, Chuan Huang, Xiaoqi Qin, Lian Yang, Wei Zhang

Abstract: Federated learning (FL) has been recognized as a promising distributed learning paradigm to support intelligent applications at the wireless edge, where a global model is trained iteratively through the collaboration of the edge devices without sharing their data. However, due to the relatively large communication cost between the devices and parameter server (PS), direct computing based on the in… ▽ More Federated learning (FL) has been recognized as a promising distributed learning paradigm to support intelligent applications at the wireless edge, where a global model is trained iteratively through the collaboration of the edge devices without sharing their data. However, due to the relatively large communication cost between the devices and parameter server (PS), direct computing based on the information from the devices may not be resource efficient. This paper studies the joint communication and learning design for the over-the-air computation (AirComp)-based two-tier wireless FL scheme, where the lead devices first collect the local gradients from their nearby subordinate devices, and then send the merged results to the PS for the second round of aggregation. We establish a convergence result for the proposed scheme and derive the upper bound on the optimality gap between the expected and optimal global loss values. Next, based on the device distance and data importance, we propose a hierarchical clustering method to build the two-tier structure. Then, with only the instantaneous channel state information (CSI), we formulate the optimality gap minimization problem and solve it by using an efficient alternating minimization method. Numerical results show that the proposed scheme outperforms the baseline ones. △ Less

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.09306 [pdf, other]

Joint Device Selection and Power Control for Wireless Federated Learning

Authors: Wei Guo, Ran Li, Chuan Huang, Xiaoqi Qin, Kaiming Shen, Wei Zhang

Abstract: This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training an… ▽ More This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training and upload the updated model parameters to the PS via over-the-air computation (AirComp). First, we propose an AirComp-based adaptive reweighing scheme for the aggregation of local updated models, where the model aggregation weights are directly determined by the uplink transmit power values of the selected devices and which enables the joint learning and communication optimization simply by the device selection and power control. Furthermore, we provide a convergence analysis for the proposed wireless FL algorithm and the upper bound on the expected optimality gap between the expected and optimal global loss values is derived. With instantaneous channel state information (CSI), we formulate the optimality gap minimization problems under both the individual and sum uplink transmit power constraints, respectively, which are shown to be solved by the semidefinite programming (SDR) technique. Numerical results reveal that our proposed wireless FL algorithm achieves close to the best performance by using the ideal FedAvg scheme with error-free model exchange and full device participation. △ Less

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2205.00581 [pdf]

Using a novel fractional-order gradient method for CNN back-propagation

Authors: Mundher Mohammed Taresh, Ningbo Zhu, Talal Ahmed Ali Ali, Mohammed Alghaili, Weihua Guo

Abstract: Computer-aided diagnosis tools have experienced rapid growth and development in recent years. Among all, deep learning is the most sophisticated and popular tool. In this paper, researchers propose a novel deep learning model and apply it to COVID-19 diagnosis. Our model uses the tool of fractional calculus, which has the potential to improve the performance of gradient methods. To this end, the r… ▽ More Computer-aided diagnosis tools have experienced rapid growth and development in recent years. Among all, deep learning is the most sophisticated and popular tool. In this paper, researchers propose a novel deep learning model and apply it to COVID-19 diagnosis. Our model uses the tool of fractional calculus, which has the potential to improve the performance of gradient methods. To this end, the researcher proposes a fractional-order gradient method for the back-propagation of convolutional neural networks based on the Caputo definition. However, if only the first term of the infinite series of the Caputo definition is used to approximate the fractional-order derivative, the length of the memory is truncated. Therefore, the fractional-order gradient (FGD) method with a fixed memory step and an adjustable number of terms is used to update the weights of the layers. Experiments were performed on the COVIDx dataset to demonstrate fast convergence, good accuracy, and the ability to bypass the local optimal point. We also compared the performance of the developed fractional-order neural networks and Integer-order neural networks. The results confirmed the effectiveness of our proposed model in the diagnosis of COVID-19. △ Less

Submitted 1 May, 2022; originally announced May 2022.

Comments: 9 pages, 6 figuers

MSC Class: D.1.2; F.3.1; F.4.1 ACM Class: F.2.2, I.2.7 K.5

arXiv:2201.07210 [pdf]

Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time

Authors: Wenzhe Guo, Mohammed E. Fouda, Ahmed M. Eltawil, Khaled Nabil Salama

Abstract: Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised train… ▽ More Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm explores both temporal and spatial locality in BPTT and contributes to significant reduction in computational cost including GPU memory utilization, main memory access and arithmetic operations. We thoroughly explore the design space concerning temporal truncation length and local training block size and benchmark their impact on classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation has a negative effect on the accuracy of classifying frame-based datasets, but leads to improvement in accuracy on dynamic-vision-sensor (DVS) recorded datasets. In spite of resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can lead to the slowdown of accuracy drop and even improvement in accuracy. In addition, training deep SNNs models such as AlexNet classifying CIFAR10-DVS dataset leads to 7.26% increase in accuracy, 89.94% reduction in GPU memory, 10.79% reduction in memory access, and 99.64% reduction in MAC operations compared to the standard end-to-end BPTT. △ Less

Submitted 13 December, 2021; originally announced January 2022.

Comments: 16

arXiv:2111.00428 [pdf, other]

Reconfigurable Intelligent Surface-induced Randomness for mmWave Key Generation

Authors: Shubo Yang, Han Han, Yihong Liu, Weisi Guo, Zhibo Pang, Lei Zhang

Abstract: Secret key generation in physical layer security exploits the unpredictable random nature of wireless channels. The millimeter-wave (mmWave) channels have limited multipath and channel randomness in static environments. In this paper, for mmWave secret key generation of physical layer security, we use a reconfigurable intelligent surface (RIS) to induce randomness directly in wireless environments… ▽ More Secret key generation in physical layer security exploits the unpredictable random nature of wireless channels. The millimeter-wave (mmWave) channels have limited multipath and channel randomness in static environments. In this paper, for mmWave secret key generation of physical layer security, we use a reconfigurable intelligent surface (RIS) to induce randomness directly in wireless environments, without adding complexity to transceivers. We consider RIS to have continuous individual phase shifts (CIPS) and derive the RIS-assisted reflection channel distribution with its parameters. Then, we propose continuous group phase shifts (CGPS) to increase the randomness specifically at legal parties. Since the continuous phase shifts are expensive to implement, we analyze discrete individual phase shifts (DIPS) and derive the corresponding channel distribution, which is dependent on the quantization bit. We then derive the secret key rate (SKR) to evaluate the randomness performance. With the simulation results verifying the analytical results, this work explains the mathematical principles and lays a foundation for future mmWave evaluation and optimization of artificial channel randomness. △ Less

Submitted 8 August, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

Comments: Add contents, including continuous group phase shifts and secret key rate analysis

arXiv:2110.12785 [pdf, other]

Random Matrix based Physical Layer Secret Key Generation in Static Channels

Authors: Zhuangkun Wei, Weisi Guo

Abstract: Physical layer secret key generation exploits the reciprocal channel randomness for key generation and has proven to be an effective addition security layer in wireless communications. However, static or scarcely random channels require artificially induced dynamics to improve the secrecy performance, e.g., using intelligent reflecting surface (IRS). One key challenge is that the induced random ph… ▽ More Physical layer secret key generation exploits the reciprocal channel randomness for key generation and has proven to be an effective addition security layer in wireless communications. However, static or scarcely random channels require artificially induced dynamics to improve the secrecy performance, e.g., using intelligent reflecting surface (IRS). One key challenge is that the induced random phase from IRS is also reflected in the direction to eavesdroppers (Eve). This leakage enables Eve nodes to estimate the legitimate channels and secret key via a globally known pilot sequence. To mitigate the secret key leakage issue, we propose to exploit random matrix theory to inform the design of a new physical layer secret key generation (PL-SKG) algorithm. We prove that, when sending appropriate random Gaussian matrices, the singular values of Alice's and Bob's received signals follow a similar probability distribution. Leveraging these common singular values, we propose a random Gaussian matrix based PL-SKG (RGM PL-SKG), which avoids the usages of the globally known pilot and thereby prevents the aforementioned leakage issue. Our results show the following: (i) high noise resistance which leads to superior secret key rate (SKR) improvement (up to 300%) in low SNR regime, and (ii) general improved SKR performance against multiple colluded Eves. We believe our combination of random matrix theory and PL-SKG shows a new paradigm to secure the wireless communication channels. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.10435 [pdf, other]

RSS-based Multiple Sources Localization with Unknown Log-normal Shadow Fading

Authors: Yueyan Chu, Wenbin Guo, Kangyong You, Lei Zhao, Tao Peng, Wenbo Wang

Abstract: Multi-source localization based on received signal strength (RSS) has drawn great interest in wireless sensor networks. However, the shadow fading term caused by obstacles cannot be separated from the received signal, which leads to severe error in location estimate. In this paper, we approximate the log-normal sum distribution through Fenton-Wilkinson method to formulate a non-convex maximum like… ▽ More Multi-source localization based on received signal strength (RSS) has drawn great interest in wireless sensor networks. However, the shadow fading term caused by obstacles cannot be separated from the received signal, which leads to severe error in location estimate. In this paper, we approximate the log-normal sum distribution through Fenton-Wilkinson method to formulate a non-convex maximum likelihood (ML) estimator with unknown shadow fading factor. In order to overcome the difficulty in solving the non-convex problem, we propose a novel algorithm to estimate the locations of sources. Specifically, the region is divided into $N$ grids firstly, and the multi-source localization is converted into a sparse recovery problem so that we can obtain the sparse solution. Then we utilize the K-means clustering method to obtain the rough locations of the off-grid sources as the initial feasible point of the ML estimator. Finally, an iterative refinement of the estimated locations is proposed by dynamic updating of the localization dictionary. The proposed algorithm can efficiently approach a superior local optimal solution of the ML estimator. It is shown from the simulation results that the proposed method has a promising localization performance and improves the robustness for multi-source localization in unknown shadow fading environments. Moreover, the proposed method provides a better computational complexity from $O(K^3N^3)$ to $O(N^3)$. △ Less

Submitted 20 October, 2021; originally announced October 2021.

Comments: 11 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:2105.15097

arXiv:2106.08637 [pdf]

Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features

Authors: Tan Liu, Wu Guo, Bin Gu

Abstract: Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken doc… ▽ More Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken documents. More specifically, a conventional CTC-based acoustic model (AM) using phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of spoken documents. Furthermore, these deep acoustic features are fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse these two types of deep features for topic classification. Experiments conducted on a subset selected from Switchboard corpus show that our proposed framework outperforms the conventional ASR+TTC systems and achieves a 3.13% improvement in ACC. △ Less

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2105.15097 [pdf, other]

Multiple Sources Localization with Sparse Recovery under Log-normal Shadow Fading

Authors: Yueyan Chu, Kangyong You, Wenbin Guo

Abstract: Localization based on received signal strength (RSS) has drawn great interest in the wireless sensor network (WSN). In this paper, we investigate the RSS-based multi-sources localization problem with unknown transmitted power under shadow fading. The log-normal shadowing effect is approximated through Fenton-Wilkinson (F-W) method and maximum likelihood estimation is adopted to optimize the RSS-ba… ▽ More Localization based on received signal strength (RSS) has drawn great interest in the wireless sensor network (WSN). In this paper, we investigate the RSS-based multi-sources localization problem with unknown transmitted power under shadow fading. The log-normal shadowing effect is approximated through Fenton-Wilkinson (F-W) method and maximum likelihood estimation is adopted to optimize the RSS-based multiple sources localization problem. Moreover, we exploit a sparse recovery and weighted average of candidates (SR-WAC) based method to set up an initiation, which can efficiently approach a superior local optimal solution. It is shown from the simulation results that the proposed method has a much higher localization accuracy and outperforms the other △ Less

Submitted 31 March, 2021; originally announced May 2021.

arXiv:2104.00230 [pdf, other]

Bidirectional Multiscale Feature Aggregation for Speaker Verification

Authors: Jiajun Qi, Wu Guo, Bin Gu

Abstract: In this paper, we propose a novel bidirectional multiscale feature aggregation (BMFA) network with attentional fusion modules for text-independent speaker verification. The feature maps from different stages of the backbone network are iteratively combined and refined in both a bottom-up and top-down manner. Furthermore, instead of simple concatenation or element-wise addition of feature maps from… ▽ More In this paper, we propose a novel bidirectional multiscale feature aggregation (BMFA) network with attentional fusion modules for text-independent speaker verification. The feature maps from different stages of the backbone network are iteratively combined and refined in both a bottom-up and top-down manner. Furthermore, instead of simple concatenation or element-wise addition of feature maps from different stages, an attentional fusion module is designed to compute the fusion weights. Experiments are conducted on the NIST SRE16 and VoxCeleb1 datasets. The experimental results demonstrate the effectiveness of the bidirectional aggregation strategy and show that the proposed attentional fusion module can further improve the performance. △ Less

Submitted 31 March, 2021; originally announced April 2021.

arXiv:2103.15421 [pdf, other]

Improved Meta-Learning Training for Speaker Verification

Authors: Yafeng Chen, Wu Guo, Bin Gu

Abstract: Meta-learning has recently become a research hotspot in speaker verification (SV). We introduce two methods to improve the meta-learning training for SV in this paper. For the first method, a backbone embedding network is first jointly trained with the conventional cross entropy loss and prototypical networks (PN) loss. Then, inspired by speaker adaptive training in speech recognition, additional… ▽ More Meta-learning has recently become a research hotspot in speaker verification (SV). We introduce two methods to improve the meta-learning training for SV in this paper. For the first method, a backbone embedding network is first jointly trained with the conventional cross entropy loss and prototypical networks (PN) loss. Then, inspired by speaker adaptive training in speech recognition, additional transformation coefficients are trained with only the PN loss. The transformation coefficients are used to modify the original backbone embedding network in the x-vector extraction process. Furthermore, the random erasing data augmentation technique is applied to all support samples in each episode to construct positive pairs, and a contrastive loss between the augmented and the original support samples is added to the objective in model training. Experiments are carried out on the SITW and VOiCES databases. Both of the methods can obtain consistent improvements over existing meta-learning training frameworks. By combining these two methods, we can observe further improvements on these two databases. △ Less

Submitted 2 August, 2023; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:2011.12722 [pdf, other]

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Authors: Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang

Abstract: We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images. While previous learning based reconstruction approaches performed quite well, most of them estimate depth maps at a fixed resolution using plane sweep volumes with a fixed depth hypothesis at each plane, which requires densely sampled planes for desired accuracy and therefore is difficult to achiev… ▽ More We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images. While previous learning based reconstruction approaches performed quite well, most of them estimate depth maps at a fixed resolution using plane sweep volumes with a fixed depth hypothesis at each plane, which requires densely sampled planes for desired accuracy and therefore is difficult to achieve high resolution depth maps. In this paper we introduce a coarseto-fine depth inference strategy to achieve high resolution depth. This strategy estimates the depth map at coarsest level, while the depth maps at finer levels are considered as the upsampled depth map from previous level with pixel-wise depth residual. Thus, we narrow the depth searching range with priori information from previous level and construct new cost volumes from the pixel-wise depth residual to perform depth map refinement. Then the final depth map could be achieved iteratively since all the parameters are shared between different levels. At each level, the self-attention layer is introduced to the feature extraction block for capturing the long range dependencies for depth inference task, and the cost volume is generated using similarity measurement instead of the variance based methods used in previous work. Experiments were conducted on both the DTU benchmark dataset and recently released BlendedMVS dataset. The results demonstrated that our model could outperform most state-of-the-arts (SOTA) methods. The codebase of this project is at https://github.com/ArthasMil/AACVP-MVSNet. △ Less

Submitted 25 November, 2020; originally announced November 2020.

arXiv:2010.10919

Multi-task Metric Learning for Text-independent Speaker Verification

Authors: Yafeng Chen, Wu Guo, **g**g Shi, Jiajun Qi, Tan Liu

Abstract: In this work, we introduce metric learning (ML) to enhance the deep embedding learning for text-independent speaker verification (SV). Specifically, the deep speaker embedding network is trained with conventional cross entropy loss and auxiliary pair-based ML loss function. For the auxiliary ML task, training samples of a mini-batch are first arranged into pairs, then positive and negative pairs a… ▽ More In this work, we introduce metric learning (ML) to enhance the deep embedding learning for text-independent speaker verification (SV). Specifically, the deep speaker embedding network is trained with conventional cross entropy loss and auxiliary pair-based ML loss function. For the auxiliary ML task, training samples of a mini-batch are first arranged into pairs, then positive and negative pairs are selected and weighted through their own and relative similarities, and finally the auxiliary ML loss is calculated by the similarity of the selected pairs. To evaluate the proposed method, we conduct experiments on the Speaker in the Wild (SITW) dataset. The results demonstrate the effectiveness of the proposed method. △ Less

Submitted 22 March, 2023; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: Not a particularly high-quality work, so we request withdrawal

arXiv:2010.10163 [pdf, other]

Claw U-Net: A Unet-based Network with Deep Feature Concatenation for Scleral Blood Vessel Segmentation

Authors: Chang Yao, **gyu Tang, Menghan Hu, Yue Wu, Wenyi Guo, Qingli Li, Xiao-** Zhang

Abstract: Sturge-Weber syndrome (SWS) is a vascular malformation disease, and it may cause blindness if the patient's condition is severe. Clinical results show that SWS can be divided into two types based on the characteristics of scleral blood vessels. Therefore, how to accurately segment scleral blood vessels has become a significant problem in computer-aided diagnosis. In this research, we propose to co… ▽ More Sturge-Weber syndrome (SWS) is a vascular malformation disease, and it may cause blindness if the patient's condition is severe. Clinical results show that SWS can be divided into two types based on the characteristics of scleral blood vessels. Therefore, how to accurately segment scleral blood vessels has become a significant problem in computer-aided diagnosis. In this research, we propose to continuously upsample the bottom layer's feature maps to preserve image details, and design a novel Claw UNet based on UNet for scleral blood vessel segmentation. Specifically, the residual structure is used to increase the number of network layers in the feature extraction stage to learn deeper features. In the decoding stage, by fusing the features of the encoding, upsampling, and decoding parts, Claw UNet can achieve effective segmentation in the fine-grained regions of scleral blood vessels. To effectively extract small blood vessels, we use the attention mechanism to calculate the attention coefficient of each position in images. Claw UNet outperforms other UNet-based networks on scleral blood vessel image dataset. △ Less

Submitted 20 October, 2020; originally announced October 2020.

Comments: 5 pages,4 figures

arXiv:2010.06248

Exploring Universal Speech Attributes for Speaker Verification with an Improved Cross-stitch Network

Authors: Jiajun Qi, Wu Guo, **g**g Shi, Yafeng Chen, Tan Liu

Abstract: The universal speech attributes for x-vector based speaker verification (SV) are addressed in this paper. The manner and place of articulation form the fundamental speech attribute unit (SAU), and then new speech attribute (NSA) units for acoustic modeling are generated by tied tri-SAU states. An improved cross-stitch network is adopted as a multitask learning (MTL) framework for integrating these… ▽ More The universal speech attributes for x-vector based speaker verification (SV) are addressed in this paper. The manner and place of articulation form the fundamental speech attribute unit (SAU), and then new speech attribute (NSA) units for acoustic modeling are generated by tied tri-SAU states. An improved cross-stitch network is adopted as a multitask learning (MTL) framework for integrating these universal speech attributes into the x-vector network training process. Experiments are conducted on common condition 5 (CC5) of the core-core and the 10 s-10 s tests of the NIST SRE10 evaluation set, and the proposed algorithm can achieve consistent improvements over the baseline x-vector on both these tasks. △ Less

Submitted 31 May, 2023; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: Not a particularly high-quality work, so we request withdrawal

arXiv:2007.13290 [pdf, ps, other]

doi 10.1016/j.sigpro.2020.107729

Deep Learning Methods for Solving Linear Inverse Problems: Research Directions and Paradigms

Authors: Yanna Bai, Wei Chen, Jie Chen, Weisi Guo

Abstract: The linear inverse problem is fundamental to the development of various scientific areas. Innumerable attempts have been carried out to solve different variants of the linear inverse problem in different applications. Nowadays, the rapid development of deep learning provides a fresh perspective for solving the linear inverse problem, which has various well-designed network architectures results in… ▽ More The linear inverse problem is fundamental to the development of various scientific areas. Innumerable attempts have been carried out to solve different variants of the linear inverse problem in different applications. Nowadays, the rapid development of deep learning provides a fresh perspective for solving the linear inverse problem, which has various well-designed network architectures results in state-of-the-art performance in many applications. In this paper, we present a comprehensive survey of the recent progress in the development of deep learning for solving various linear inverse problems. We review how deep learning methods are used in solving different linear inverse problems, and explore the structured neural network architectures that incorporate knowledge used in traditional methods. Furthermore, we identify open challenges and potential future directions along this research line. △ Less

Submitted 11 August, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

Comments: 60 pages; Publication in (Elsevier) Signal Processing, 2020

Journal ref: Signal Processing, 2020

arXiv:2007.01628 [pdf, other]

doi 10.1109/TIP.2021.3064433

HDR-GAN: HDR Image Reconstruction from Multi-Exposed LDR Images with Large Motions

Authors: Yuzhen Niu, Jianbin Wu, Wenxi Liu, Wenzhong Guo, Rynson W. H. Lau

Abstract: Synthesizing high dynamic range (HDR) images from multiple low-dynamic range (LDR) exposures in dynamic scenes is challenging. There are two major problems caused by the large motions of foreground objects. One is the severe misalignment among the LDR images. The other is the missing content due to the over-/under-saturated regions caused by the moving objects, which may not be easily compensated… ▽ More Synthesizing high dynamic range (HDR) images from multiple low-dynamic range (LDR) exposures in dynamic scenes is challenging. There are two major problems caused by the large motions of foreground objects. One is the severe misalignment among the LDR images. The other is the missing content due to the over-/under-saturated regions caused by the moving objects, which may not be easily compensated for by the multiple LDR exposures. Thus, it requires the HDR generation model to be able to properly fuse the LDR images and restore the missing details without introducing artifacts. To address these two problems, we propose in this paper a novel GAN-based model, HDR-GAN, for synthesizing HDR images from multi-exposed LDR images. To our best knowledge, this work is the first GAN-based approach for fusing multi-exposed LDR images for HDR reconstruction. By incorporating adversarial learning, our method is able to produce faithful information in the regions with missing content. In addition, we also propose a novel generator network, with a reference-based residual merging block for aligning large object motions in the feature domain, and a deep HDR supervision scheme for eliminating artifacts of the reconstructed HDR images. Experimental results demonstrate that our model achieves state-of-the-art reconstruction performance over the prior HDR methods on diverse scenes. △ Less

Submitted 3 July, 2020; originally announced July 2020.

arXiv:2006.03568 [pdf, other]

Graph Layer Security: Encrypting Information via Common Networked Physics

Authors: Zhuangkun Wei, Liang Wang, Schyler Chengyao Sun, Bin Li, Weisi Guo

Abstract: The proliferation of low-cost Internet of Things (IoT) devices has led to a race between wireless security and channel attacks. Traditional cryptography requires high-computational power and is not suitable for low-power IoT scenarios. Whist, recently developed physical layer security (PLS) can exploit common wireless channel state information (CSI), its sensitivity to channel estimation makes the… ▽ More The proliferation of low-cost Internet of Things (IoT) devices has led to a race between wireless security and channel attacks. Traditional cryptography requires high-computational power and is not suitable for low-power IoT scenarios. Whist, recently developed physical layer security (PLS) can exploit common wireless channel state information (CSI), its sensitivity to channel estimation makes them vulnerable from attacks. In this work, we exploit an alternative common physics shared between IoT transceivers: the monitored channel-irrelevant physical networked dynamics (e.g., water/oil/gas/electrical signal-flows). Leveraging this, we propose for the first time, graph layer security (GLS), by exploiting the dependency in physical dynamics among network nodes for information encryption and decryption. A graph Fourier transform (GFT) operator is used to characterize such dependency into a graph-bandlimted subspace, which allows the generations of channel-irrelevant cipher keys by maximizing the secrecy rate. We evaluate our GLS against designed active and passive attackers, using IEEE 39-Bus system. Results demonstrate that, GLS is not reliant on wireless CSI, and can combat attackers that have partial networked dynamic knowledge (realistic access to full dynamic and critical nodes remains challenging). We believe this novel GLS has widespread applicability in secure health monitoring and for Digital Twins in adversarial radio environments. △ Less

Submitted 23 May, 2022; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:2004.13198 [pdf, ps, other]

doi 10.1109/JSYST.2020.3036129

Uncertainty of Resilience in Complex Networks with Nonlinear Dynamics

Authors: Giannis Moutsinas, Mengbang Zou, Weisi Guo

Abstract: Resilience is a system's ability to maintain its function when perturbations and errors occur. Whilst we understand low-dimensional networked systems' behavior well, our understanding of systems consisting of a large number of components is limited. Recent research in predicting the network level resilience pattern has advanced our understanding of the coupling relationship between global network… ▽ More Resilience is a system's ability to maintain its function when perturbations and errors occur. Whilst we understand low-dimensional networked systems' behavior well, our understanding of systems consisting of a large number of components is limited. Recent research in predicting the network level resilience pattern has advanced our understanding of the coupling relationship between global network topology and local nonlinear component dynamics. However, when there is uncertainty in the model parameters, our understanding of how this translates to uncertainty in resilience is unclear for a large-scale networked system. Here we develop a polynomial chaos expansion method to estimate the resilience for a wide range of uncertainty distributions. By applying this method to case studies, we not only reveal the general resilience distribution with respect to the topology and dynamics sub-models, but also identify critical aspects to inform better monitoring to reduce uncertainty. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: 8pages, 7figures

arXiv:2004.09615 [pdf, other]

doi 10.1109/TSP.2020.3032408

Sampling and Inference of Networked Dynamics using Log-Koopman Nonlinear Graph Fourier Transform

Authors: Zhuangkun Wei, Bin Li, Chengyao Sun, Weisi Guo

Abstract: Networked nonlinear dynamics underpin the complex functionality of many engineering, social, biological, and ecological systems. Monitoring the networked dynamics via the minimum subset of nodes is essential for a variety of scientific and operational purposes. When there is a lack of a explicit model and networked signal space, traditional evolution analysis and non-convex methods are insufficien… ▽ More Networked nonlinear dynamics underpin the complex functionality of many engineering, social, biological, and ecological systems. Monitoring the networked dynamics via the minimum subset of nodes is essential for a variety of scientific and operational purposes. When there is a lack of a explicit model and networked signal space, traditional evolution analysis and non-convex methods are insufficient. An important data-driven state-of-the-art method use the Koopman operator to generate a linear evolution model for a vector-valued observable of original state-space. As a result, one can derive a sampling strategy via the linear evolution property of observable. However, current polynomial Koopman operators result in a large sampling space due to: (i) the large size of polynomial based observables ($O(N^2)$, $N$ number of nodes in network), and (ii) not factoring in the nonlinear dependency between observables. In this work, to achieve linear scaling ($O(N)$) and a small set of sampling nodes, we propose to combine a novel Log-Koopman operator and nonlinear Graph Fourier Transform (NL-GFT) scheme. First, the Log-Koopman operator is able to reduce the size of observables by transforming multiplicative poly-observable to logarithm summation. Second, a nonlinear GFT concept and sampling theory are provided to exploit the nonlinear dependence of observables for Koopman linearized evolution analysis. Combined, the sampling and reconstruction algorithms are designed and demonstrated on two established application areas. The results demonstrate that the proposed Log-Koopman NL-GFT scheme can (i) linearize unknown nonlinear dynamics using $O(N)$ observables, and (ii) achieve lower number of sampling nodes, compared with the state-of-the art polynomial Koopman linear evolution analysis. △ Less

Submitted 16 October, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2002.06049 [pdf]

An Adaptive X-vector Model for Text-independent Speaker Verification

Authors: Bin Gu, Wu Guo, Lirong Dai, Jun Du

Abstract: In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in cap… ▽ More In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in capturing speaker information. Moreover, we replace conventional batch normalization (BN) with adaptive batch normalization (ABN). By dynamically generating the scaling and shifting parameters in BN, ABN adapts models to the acoustic variability arising from various factors such as channel and environmental noises. Finally, we incorporate these two methods to further improve performance. Experiments are carried out on the speaker in the wild (SITW) and VOiCES databases. The results demonstrate that the proposed methods significantly outperform the original x-vector approach. △ Less

Submitted 14 February, 2020; originally announced February 2020.

Comments: 6 pages, 3 figures

arXiv:2002.05508 [pdf, other]

Neural Network Approximation of Graph Fourier Transforms for Sparse Sampling of Networked Flow Dynamics

Authors: Alessio Pagani, Zhuangkun Wei, Ricardo Silva, Weisi Guo

Abstract: Infrastructure monitoring is critical for safe operations and sustainability. Water distribution networks (WDNs) are large-scale networked critical systems with complex cascade dynamics which are difficult to predict. Ubiquitous monitoring is expensive and a key challenge is to infer the contaminant dynamics from partial sparse monitoring data. Existing approaches use multi-objective optimisation… ▽ More Infrastructure monitoring is critical for safe operations and sustainability. Water distribution networks (WDNs) are large-scale networked critical systems with complex cascade dynamics which are difficult to predict. Ubiquitous monitoring is expensive and a key challenge is to infer the contaminant dynamics from partial sparse monitoring data. Existing approaches use multi-objective optimisation to find the minimum set of essential monitoring points, but lack performance guarantees and a theoretical framework. Here, we first develop Graph Fourier Transform (GFT) operators to compress networked contamination spreading dynamics to identify the essential principle data collection points with inference performance guarantees. We then build autoencoder (AE) inspired neural networks (NN) to generalize the GFT sampling process and under-sample further from the initial sampling set, allowing a very small set of data points to largely reconstruct the contamination dynamics over real and artificial WDNs. Various sources of the contamination are tested and we obtain high accuracy reconstruction using around 5-10% of the sample set. This general approach of compression and under-sampled recovery via neural networks can be applied to a wide range of networked infrastructures to enable digital twins. △ Less

Submitted 11 February, 2020; originally announced February 2020.

arXiv:2001.06790 [pdf]

High-speed and high-efficiency three-dimensional shape measurement based on Gray-coded light

Authors: Zhoujie Wu, Wenbo Guo, Yueyang Li, Yihang Liu, Qican Zhang

Abstract: Fringe projection profilometry has been increasingly sought and applied in dynamic three-dimensional (3D) shape measurement. In this work, a robust and high-efficiency 3D measurement based on Gray-code light is proposed. Unlike the traditional method, a novel tripartite phase unwrap** method is proposed to avoid the jump errors on the boundary of code words, which are mainly caused by the defocu… ▽ More Fringe projection profilometry has been increasingly sought and applied in dynamic three-dimensional (3D) shape measurement. In this work, a robust and high-efficiency 3D measurement based on Gray-code light is proposed. Unlike the traditional method, a novel tripartite phase unwrap** method is proposed to avoid the jump errors on the boundary of code words, which are mainly caused by the defocusing of the projector and the motion of the tested object. Subsequently, the time-overlap** coding strategy is presented to greatly increase the coding efficiency, decreasing the projected number in each group, e.g. from 7 (3 + 4) to 4 (3 + 1) for one restored 3D frame. Combination of two proposed techniques allows to reconstruct a pixel-wise and unambiguous 3D geometry of dynamic scenes with strong noise using every 4 projected patterns. The presented techniques preserve the high anti-noise ability of Gray-coded-based method while overcoming the drawbacks of jump errors and low coding efficiency. Experiments have demonstrated that the proposed method can achieve the robust and high-efficiency 3D shape measurement of high-speed dynamic scenes even polluted by strong noise. △ Less

Submitted 19 January, 2020; originally announced January 2020.

arXiv:2001.04585 [pdf]

Gaussian speaker embedding learning for text-independent speaker verification

Authors: Bin Gu, Wu Guo

Abstract: The x-vector maps segments of arbitrary duration to vectors of fixed dimension using deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, the x-vector/PLDA has become the dominant framework in text-independent speaker verification. Nevertheless, how to extract the x-vector appropriate for the PLDA backend is a key problem. In this paper, we propose a Ga… ▽ More The x-vector maps segments of arbitrary duration to vectors of fixed dimension using deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, the x-vector/PLDA has become the dominant framework in text-independent speaker verification. Nevertheless, how to extract the x-vector appropriate for the PLDA backend is a key problem. In this paper, we propose a Gaussian noise constrained network (GNCN) to extract xvector, which adopts a multi-task learning strategy with the primary task classifying the speakers and the auxiliary task just fitting the Gaussian noises. Experiments are carried out using the SITW database. The results demonstrate the effectiveness of our proposed method △ Less

Submitted 13 January, 2020; originally announced January 2020.

Comments: 5 pages, 3 figures

arXiv:2001.04584 [pdf]

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

Authors: Bin Gu, Wu Guo

Abstract: This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution (MSCNN) is adopted in frame-level layers to capture complementary speaker information in different receptive fields. (2) A Baum-Welch statistics attention (BWS… ▽ More This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution (MSCNN) is adopted in frame-level layers to capture complementary speaker information in different receptive fields. (2) A Baum-Welch statistics attention (BWSA) mechanism is applied in pooling-layer, which can integrate more useful long-term speaker characteristics in the temporal pooling layer. Experiments are carried out on the NIST SRE16 evaluation set. The results demonstrate the effectiveness of MSCNN and show the proposed BWSA can further improve the performance of the DNN embedding system △ Less

Submitted 13 January, 2020; originally announced January 2020.

Comments: 5 pages,2 figures

arXiv:2001.00129 [pdf]

Attentive batch normalization for lstm-based acoustic modeling of speech recognition

Authors: Fenglin Ding, Wu Guo, Lirong Dai, Jun Du

Abstract: Batch normalization (BN) is an effective method to accelerate model training and improve the generalization performance of neural networks. In this paper, we propose an improved batch normalization technique called attentive batch normalization (ABN) in Long Short Term Memory (LSTM) based acoustic modeling for automatic speech recognition (ASR). In the proposed method, an auxiliary network is used… ▽ More Batch normalization (BN) is an effective method to accelerate model training and improve the generalization performance of neural networks. In this paper, we propose an improved batch normalization technique called attentive batch normalization (ABN) in Long Short Term Memory (LSTM) based acoustic modeling for automatic speech recognition (ASR). In the proposed method, an auxiliary network is used to dynamically generate the scaling and shifting parameters in batch normalization, and attention mechanisms are introduced to improve their regularized performance. Furthermore, two schemes, frame-level and utterance-level ABN, are investigated. We evaluate our proposed methods on Mandarin and Uyghur ASR tasks, respectively. The experimental results show that the proposed ABN greatly improves the performance of batch normalization in terms of transcription accuracy for both languages. △ Less

Submitted 31 December, 2019; originally announced January 2020.

Comments: 5 pages,1 figure, submitted to ICASSP 2020

arXiv:1912.13307 [pdf]

Attention-based gated scaling adaptative acoustic model for ctc-based speech recognition

Authors: Fenglin Ding, Wu Guo, Lirong Dai, Jun Du

Abstract: In this paper, we propose a novel adaptive technique that uses an attention-based gated scaling (AGS) scheme to improve deep feature learning for connectionist temporal classification (CTC) acoustic modeling. In AGS, the outputs of each hidden layer of the main network are scaled by an auxiliary gate matrix extracted from the lower layer by using attention mechanisms. Furthermore, the auxiliary AG… ▽ More In this paper, we propose a novel adaptive technique that uses an attention-based gated scaling (AGS) scheme to improve deep feature learning for connectionist temporal classification (CTC) acoustic modeling. In AGS, the outputs of each hidden layer of the main network are scaled by an auxiliary gate matrix extracted from the lower layer by using attention mechanisms. Furthermore, the auxiliary AGS layer and the main network are jointly trained without requiring second-pass model training or additional speaker information, such as speaker code. On the Mandarin AISHELL-1 datasets, the proposed AGS yields a 7.94% character error rate (CER). To the best of our knowledge, this result is the best recognition accuracy achieved on this dataset by using an end-to-end framework. △ Less

Submitted 31 December, 2019; originally announced December 2019.

Comments: 5 pages,2 figures, submitted to ICASSP 2020

arXiv:1911.08021 [pdf, other]

doi 10.1109/TSP.2020.3009875

Parametric Sparse Bayesian Dictionary Learning for Multiple Sources Localization with Propagation Parameters Uncertainty and Nonuniform Noise

Authors: Kangyong You, Wenbin Guo, Tao Peng, Yueliang Liu, Peiliang Zuo, Wenbo Wang

Abstract: Received signal strength (RSS) based source localization method is popular due to its simplicity and low cost. However, this method is highly dependent on the propagation model which is not easy to be captured in practice. Moreover, most existing works only consider the single source and the identical measurement noise scenario, while in practice multiple co-channel sources may transmit simultaneo… ▽ More Received signal strength (RSS) based source localization method is popular due to its simplicity and low cost. However, this method is highly dependent on the propagation model which is not easy to be captured in practice. Moreover, most existing works only consider the single source and the identical measurement noise scenario, while in practice multiple co-channel sources may transmit simultaneously, and the measurement noise tends to be nonuniform. In this paper, we study the multiple co-channel sources localization (MSL) problem under unknown nonuniform noise, while jointly estimating the parametric propagation model. Specifically, we model the MSL problem as being parameterized by the unknown source locations and propagation parameters, and then reformulate it as a joint parametric sparsifying dictionary learning (PSDL) and sparse signal recovery (SSR) problem which is solved under the framework of sparse Bayesian learning with iterative parametric dictionary approximation. Furthermore, multiple snapshot measurements are utilized to improve the localization accuracy, and the Cramer-Rao lower bound (CRLB) is derived to analyze the theoretical estimation error bound. Comparing with the state-of-the-art sparsity-based MSL algorithms as well as CRLB, extensive simulations show the importance of jointly inferring the propagation parameters,and highlight the effectiveness and superiority of the proposed method. △ Less

Submitted 22 December, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: 12 pages, 9 figures

Showing 1–50 of 62 results for author: Guo, W