Search | arXiv e-print repository

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, which yields high inference speed. However, NAR models still lag behind autoregressive (AR) models in accuracy. To further narrow the gap between NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with a cross-entropy loss combined with an alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test sets, outperforming the SOTA AR Conformer with about a 30x inference speedup.

Submitted 13 June, 2024; originally announced June 2024.
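The EfficientASR results are reported as character error rate (CER). As a minimal sketch, assuming the standard definition (character-level Levenshtein distance summed over utterances, divided by the total number of reference characters), CER can be computed as below. The helper names and example strings are illustrative and are not taken from the paper or its scoring tools.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, deletions, insertions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hc in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rc != hc))  # substitution (0 if characters match)
        prev = cur
    return prev[-1]


def corpus_cer(refs, hyps):
    """Total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    refs = ["今天天气很好", "语音识别"]
    hyps = ["今天天汽很好", "语音识"]  # one substitution, one deletion
    print(f"CER = {corpus_cer(refs, hyps):.2%}")
```

Summing edits over the corpus before dividing weights long utterances more heavily than averaging per-utterance CERs would.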

arXiv:2405.20279

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

Abstract: Spatio-temporal compression of videos, using networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latents extracted by 2D VAEs without quantization. Temporal compression is simply realized by uniform frame sampling, which results in unsmooth motion between consecutive frames. Currently, the research community lacks a commonly used continuous video (3D) VAE for latent diffusion-based video models. Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering compatibility with existing T2I models results in a latent space gap between them, and bridging this gap requires huge computational resources even when the T2I models are used for initialization. To address this issue, we propose a method for training a video VAE for latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). The compatibility is achieved by a novel latent space regularization, which formulates a regularization loss using the image VAE.
Benefiting from this latent space compatibility, video models can be trained seamlessly from pre-trained T2I or video models in a truly spatio-temporally compressed latent space, rather than by simply sampling video frames at equal intervals. With our CV-VAE, existing video models can generate four times more frames with minimal finetuning. Extensive experiments demonstrate the effectiveness of the proposed video VAE.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

arXiv:2405.18435

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability arising from differences in interpretations and annotations by various experts, presents a significant challenge to achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on uncertainty quantification for medical image segmentation, given the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted solutions, combining various baseline models, Bayesian neural networks, and ensemble model techniques.
The obtained results indicate the importance of ensemble models, as well as the need for further research on efficient methods for uncertainty quantification in 3D segmentation tasks.

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report
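Multi-rater annotations are often fused into a soft target, and ensemble predictions averaged into a probability map. The sketch below assumes that convention and scores the two maps with Dice averaged over several binarization thresholds. The threshold grid, function names, and the handling of empty masks are assumptions for illustration, not the challenge's official evaluation code.

```python
import numpy as np


def soft_label(rater_masks):
    """Fuse binary masks from several raters into a per-pixel agreement map in [0, 1]."""
    return np.mean(np.stack(rater_masks).astype(float), axis=0)


def ensemble_probability(member_probs):
    """Average the foreground probabilities predicted by several ensemble members."""
    return np.mean(np.stack(member_probs), axis=0)


def thresholded_dice(pred, target, thresholds=np.linspace(0.1, 0.9, 9)):
    """Mean Dice between two probability maps binarized at each threshold."""
    scores = []
    for t in thresholds:
        p, g = pred > t, target > t
        denom = p.sum() + g.sum()
        # Both maps empty at this level: count as perfect agreement (assumed convention).
        scores.append(1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom)
    return float(np.mean(scores))


if __name__ == "__main__":
    raters = [np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]), np.array([1, 1, 1, 0])]
    target = soft_label(raters)
    pred = ensemble_probability([np.array([0.9, 0.6, 0.2, 0.0]),
                                 np.array([1.0, 0.8, 0.4, 0.1])])
    print(target, pred, thresholded_dice(pred, target))
```

Scoring at several thresholds rewards a model that reproduces the graded disagreement between raters, not just the majority vote.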

arXiv:2404.01949

Heuristic Optimization of Amplifier Reconfiguration Process for Autonomous Driving Optical Networks

Authors: Qizhi Qiu, Xiaomin Liu, Yihao Zhang, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We propose a heuristic-based optimization scheme for a reliable optical amplifier reconfiguration process in autonomous driving optical networks (ADON). In an experiment on a commercial testbed, the scheme prevents a 0.48-dB Q-factor degradation and outperforms 97.3% of random solutions.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.10094

RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Authors: Qianjiang Hu, Zhimin Zhang, Wei Hu

Abstract: Autonomous driving demands high-quality LiDAR data, yet the cost of physical LiDAR sensors presents a significant scaling-up challenge. While recent efforts have explored deep generative models to address this issue, they often consume substantial computational resources, generate slowly, and lack realism. To address these limitations, we introduce RangeLDM, a novel approach for rapidly generating high-quality range-view LiDAR point clouds via latent diffusion models. We achieve this by correcting the range-view data distribution for accurate projection from point clouds to range images via Hough voting, which has a critical impact on generative learning. We then compress the range images into a latent space with a variational autoencoder, and leverage a diffusion model to enhance expressivity. Additionally, we instruct the model to preserve 3D structural fidelity by devising a range-guided discriminator. Experimental results on the KITTI-360 and nuScenes datasets demonstrate both the robust expressiveness and the fast speed of our LiDAR point cloud generation.

Submitted 15 March, 2024; originally announced March 2024.
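Range-view generation starts by projecting each LiDAR point to a pixel of a range image. The sketch below shows only the conventional spherical projection; it does not implement RangeLDM's Hough-voting correction. The image size and the vertical field of view (+3° to -25°, typical of a 64-beam sensor) are assumed values. When several points fall into one pixel, the nearest range is kept.

```python
import numpy as np


def points_to_range_image(points, height=64, width=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of (N, 3+) LiDAR points to a (height, width) range image.

    Empty pixels are set to -1. FOV and resolution are assumptions, not RangeLDM settings.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                                            # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))    # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    u = 0.5 * (1.0 - yaw / np.pi) * width                             # column coordinate
    v = (1.0 - (pitch - fov_down) / fov) * height                     # row coordinate
    cols = np.clip(np.floor(u), 0, width - 1).astype(int)
    rows = np.clip(np.floor(v), 0, height - 1).astype(int)
    image = np.full((height, width), np.inf)
    # ufunc.at handles repeated pixel indices correctly, keeping the minimum range.
    np.minimum.at(image, (rows, cols), r)
    image[np.isinf(image)] = -1.0
    return image


if __name__ == "__main__":
    pts = np.array([[10.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # two points along the same ray
    img = points_to_range_image(pts)
    print(img[6, 512])
```

Floor-and-clip quantization is exactly where points can collide or leave holes, which is the kind of distribution artifact the abstract says RangeLDM corrects before generative training.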

arXiv:2402.01860

Outlier Accommodation for GNSS Precise Point Positioning using Risk-Averse State Estimation

Authors: Wang Hu, Jean-Bernard Uwineza, Jay A. Farrell

Abstract: Reliable and precise absolute positioning is necessary for Connected Automated Vehicles (CAV). Global Navigation Satellite Systems (GNSS) provide the foundation for absolute positioning. Recently enhanced Precise Point Positioning (PPP) technology now offers GNSS corrections on a global scale, with the potential to achieve accuracy suitable for real-time CAV applications. However, in obstructed sky conditions, GNSS signals are often affected by outliers; therefore, addressing outliers is crucial. In GNSS applications, many more measurements are available than are required to meet the specification, so selecting measurements to avoid outliers is of interest. The recently developed Risk-Averse Performance-Specified (RAPS) state estimation optimally selects measurements to minimize outlier risk while meeting a positive semi-definite constraint on performance; at present, the existing solution methods are not suitable for real-time computation and have not been demonstrated on challenging real-world data or in Real-time PPP (RT-PPP) applications. This article makes three contributions. First, it uses a diagonal performance specification, which reduces computational costs relative to the positive semi-definite constraint. Second, it considers GNSS RT-PPP applications. Third, its experiments use real-world GNSS data collected in challenging environments.
The RT-PPP experimental results show that all compared methods achieve comparable performance in open-sky conditions and all exceed the Society of Automotive Engineers (SAE) specification; however, in challenging environments, the diagonal RAPS approach improves on traditional methods by 6-19%. Throughout, RAPS achieves the lowest estimation risk.

Submitted 13 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 7 pages, 2 figures, accepted by the 2024 American Control Conference
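To illustrate what a diagonal performance specification checks, the sketch below uses a simple greedy stand-in, not the RAPS optimization itself. Measurements are added in order of an assumed per-measurement outlier-risk score until every diagonal entry of the covariance P = (H_S^T R_S^-1 H_S)^-1 of the selected subset S is within its bound. The inputs H, r, risk, and spec are hypothetical, and a diagonal noise covariance R is assumed.

```python
import numpy as np


def greedy_diagonal_selection(H, r, risk, spec):
    """Greedy stand-in (not RAPS): add low-risk measurements until diag(P) <= spec.

    H    : (N, n) measurement matrix, one row per candidate measurement
    r    : (N,) measurement noise variances (diagonal R assumed)
    risk : (N,) hypothetical outlier-risk score per measurement (lower is safer)
    spec : (n,) maximum allowed variance for each state
    Returns (sorted selected indices, covariance P), or (None, None) if unreachable.
    """
    H, r, spec = np.asarray(H, float), np.asarray(r, float), np.asarray(spec, float)
    chosen = []
    for i in np.argsort(risk):
        chosen.append(int(i))
        Hs = H[chosen]
        info = Hs.T @ (Hs / r[chosen][:, None])   # H_S^T R_S^-1 H_S
        if np.linalg.matrix_rank(info) < H.shape[1]:
            continue                               # state not yet observable
        P = np.linalg.inv(info)
        if np.all(np.diag(P) <= spec):             # diagonal performance specification
            return sorted(chosen), P
    return None, None


if __name__ == "__main__":
    H = [[1, 0], [0, 1], [1, 0], [0, 1]]
    sel, P = greedy_diagonal_selection(H, [1, 1, 1, 1], [0.1, 0.2, 0.3, 0.4], [0.6, 1.0])
    print(sel, np.diag(P))
```

Checking only the diagonal replaces a matrix inequality with n scalar comparisons, which is the kind of computational saving the abstract attributes to the diagonal specification; the greedy ordering, however, carries no optimality guarantee.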

arXiv:2401.12173

Waveform-Domain Complementary Signal Sets for Interrupted Sampling Repeater Jamming Suppression

Authors: Hanning Su, Qinglong Bao, Jiameng Pan, Fucheng Guo, Weidong Hu

Abstract: Interrupted-sampling repeater jamming (ISRJ) is coherent and has both suppression and deception characteristics that degrade radar detection capabilities. This study focuses on anti-ISRJ techniques in the waveform domain, primarily capitalizing on waveform design and anti-jamming signal processing methods in the waveform domain. By exploring the relationship between the waveform-domain adaptive matched filtering (WD-AMF) output and waveform-domain signals, we demonstrate that ISRJ can be effectively suppressed when the transmitted waveform exhibits waveform-domain complementarity. We introduce a phase-coded (PC) waveform set with waveform-domain complementarity and propose a method for generating such waveform sets with arbitrary code lengths. The designed waveforms further improve the performance of WD-AMF, and simulations confirm their superior adaptive anti-jamming capability compared with traditional waveforms. Remarkably, this improved performance is achieved without prior knowledge of ISRJ interference parameters at either the transmitter or the receiver.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2310.04677

AG-CRC: Anatomy-Guided Colorectal Cancer Segmentation in CT with Imperfect Anatomical Knowledge

Authors: Rongzhao Zhang, Zhian Bai, Ruoying Yu, Wenrao Pang, Lingyun Wang, Lifeng Zhu, Xiaofan Zhang, Huan Zhang, Weiguo Hu

Abstract: When delineating lesions in medical images, a human expert always keeps in mind the anatomical structure behind the voxels. However, although high-quality (though not perfect) anatomical information can be retrieved from computed tomography (CT) scans with modern deep learning algorithms, it remains an open problem how these automatically generated organ masks can assist challenging lesion segmentation tasks, such as the segmentation of colorectal cancer (CRC). In this paper, we develop a novel Anatomy-Guided segmentation framework, namely AG-CRC, that exploits auto-generated organ masks to aid CRC segmentation from CT. First, we obtain multi-organ segmentation (MOS) masks with existing MOS models (e.g., TotalSegmentator) and further derive a more robust organ of interest (OOI) mask that may cover most of the colon-rectum and CRC voxels. Second, we propose an anatomy-guided training patch sampling strategy that optimizes a heuristic gain function considering both the proximity of important regions (e.g., the tumor or organs of interest) and sample diversity. Third, we design a novel self-supervised learning scheme inspired by the topology of tubular organs like the colon to further boost model performance. Finally, we employ a masked loss scheme to guide the model to focus solely on the essential learning region.
We extensively evaluate the proposed method on two CRC segmentation datasets, where it achieves substantial performance improvements (5% to 9% in Dice) over current state-of-the-art medical image segmentation models, and ablation studies further demonstrate the efficacy of every proposed component.

Submitted 30 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: under review
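A masked loss restricts the training signal to a chosen region. Below is a minimal sketch assuming a voxel-wise binary cross-entropy and a binary region mask (for example, something like the OOI mask described in the AG-CRC abstract). The abstract does not specify AG-CRC's actual loss form or mask, so both are assumptions here.

```python
import numpy as np


def masked_bce(logits, target, mask, eps=1e-7):
    """Binary cross-entropy averaged only over voxels where mask == 1.

    logits, target, mask: arrays of the same shape; voxels outside the mask
    contribute nothing to the loss (and hence nothing to its gradient).
    """
    prob = 1.0 / (1.0 + np.exp(-logits))          # sigmoid
    prob = np.clip(prob, eps, 1.0 - eps)          # avoid log(0)
    per_voxel = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    m = mask.astype(float)
    return float((per_voxel * m).sum() / max(m.sum(), 1.0))


if __name__ == "__main__":
    target = np.array([1.0, 0.0, 1.0, 0.0])
    mask = np.array([1, 1, 0, 0])                 # only the first two voxels count
    print(masked_bce(np.array([0.0, 0.0, 50.0, -50.0]), target, mask))
```

Normalizing by the mask size, rather than by the total voxel count, keeps the loss scale comparable between patches whose region of interest is large or small.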

arXiv:2309.12552

Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model

Authors: Hanjie Jiang, Ye Zhou, Hann Woei Ho, Wenjie Hu

Abstract: Ducted fan lift systems (DFLSs) powered by two-stroke aviation piston engines present a challenging control problem due to their complex multivariable dynamics. Current controllers for these systems typically rely on proportional-integral algorithms combined with data tables, which depend on accurate models and cannot adapt to time-varying dynamics or system uncertainties. This paper proposes a novel adaptive model predictive control (AMPC) strategy with an associated linear parameter varying (LPV) model for controlling the engine-driven DFLS. This LPV model is derived from a global network model, which is trained off-line with data obtained from a general mean value engine model for two-stroke aviation engines. Different network models, including multi-layer perceptron, Elman, and radial basis function (RBF) networks, are evaluated and compared in this study. The results demonstrate that the RBF model exhibits higher prediction accuracy and robustness in the DFLS application. Based on the trained RBF model, the proposed AMPC approach constructs an associated network that directly outputs the LPV model parameters as an adaptive, robust, and efficient prediction model. The efficiency of the proposed approach is demonstrated through numerical simulations of a vertical take-off thrust preparation process for the DFLS. The simulation results indicate that the proposed AMPC method can effectively control the DFLS thrust with a relative error below 3.5%.

Submitted 21 September, 2023; originally announced September 2023.
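The AMPC abstract reports that a radial basis function (RBF) network gave the best prediction accuracy. As a generic illustration only, the sketch below is a Gaussian RBF network whose output weights are fitted by ridge-regularized least squares. The centers, width, and regularization are assumed hyperparameters, and a toy sine function stands in for the mean value engine model data. The sketch does not construct the LPV parameters or the AMPC controller.

```python
import numpy as np


class RBFNetwork:
    """Gaussian radial-basis-function network: y = phi(x) @ W + b."""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)   # (K, d) fixed centers
        self.width = float(width)                         # shared Gaussian width
        self.W = None
        self.b = None

    def features(self, X):
        """Gaussian activations of every sample (rows) at every center (columns)."""
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, Y, ridge=1e-6):
        """Solve for output weights and bias by ridge-regularized least squares."""
        Phi = np.hstack([self.features(X), np.ones((len(X), 1))])
        A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
        coef = np.linalg.solve(A, Phi.T @ Y)
        self.W, self.b = coef[:-1], coef[-1]
        return self

    def predict(self, X):
        return self.features(X) @ self.W + self.b


if __name__ == "__main__":
    X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
    Y = np.sin(X)                                  # toy stand-in for engine data
    net = RBFNetwork(np.linspace(0.0, 2.0 * np.pi, 15)[:, None], width=0.5).fit(X, Y)
    print(np.max(np.abs(net.predict(X) - Y)))
```

Because the centers are fixed, training reduces to a linear solve, which is one reason RBF models are attractive when the network must be evaluated or re-linearized inside a predictive controller.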

arXiv:2307.06862

DOI: 10.1364/JOCN.499530

Building a digital twin of EDFA: a grey-box modeling approach

Authors: Yichen Liu, Xiaomin Liu, Yihao Zhang, Meng Cai, Mengfan Fu, Xueying Zhong, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: To enable intelligent and self-driving optical networks, high-accuracy physical layer models are required. The dynamic wavelength-dependent gain effects of non-constant-pump erbium-doped fiber amplifiers (EDFAs) remain a crucial modeling problem, as they determine the optical signal-to-noise ratio as well as the magnitude of fiber nonlinearities. Black-box data-driven models have been widely studied, but they require large amounts of training data and suffer from poor generalizability. In this paper, we derive the gain spectra of EDFAs as a simple univariable linear function, and based on it we propose a grey-box EDFA gain modeling scheme. Experimental results show that for both automatic gain control (AGC) and automatic power control (APC) EDFAs, our model built with 8 data samples achieves better performance than a neural network (NN) based model built with 900 data samples, which means the data size required for modeling can be reduced by at least two orders of magnitude. Moreover, in the experiment the proposed model demonstrates superior generalizability to unseen scenarios, since it is based on the underlying physics of EDFAs. The results indicate that building a customized digital twin of each EDFA in optical networks becomes feasible, which is especially essential for next-generation multi-band network operations.

Submitted 13 July, 2023; originally announced July 2023.
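The EDFA abstract states that the gain spectrum reduces to a simple univariable linear function, which is why a handful of samples suffices. As a hedged sketch, assume the gain of each channel k is affine in one scalar x that changes between operating points, G_k(x) = a_k x + b_k, and that x is known for each measurement; the per-channel coefficients then follow from ordinary least squares. What x physically represents in the paper is not stated in the abstract, and the 40-channel synthetic data below are made up.

```python
import numpy as np


def fit_affine_gain(x, gains_db):
    """Fit G_k(x) = a_k * x + b_k for every channel k by least squares.

    x        : (S,) scalar variable for each of S measured samples (assumed known)
    gains_db : (S, K) measured gain per sample and channel, in dB
    Returns per-channel slopes a (K,) and offsets b (K,).
    """
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, gains_db, rcond=None)
    return coef[0], coef[1]


def predict_gain(a, b, x_new):
    """Predicted gain spectra, shape (len(x_new), K)."""
    return np.outer(np.atleast_1d(x_new), a) + b


if __name__ == "__main__":
    a_true = np.linspace(-0.5, 0.5, 40)          # synthetic wavelength-dependent slope
    b_true = np.linspace(18.0, 20.0, 40)         # synthetic wavelength-dependent offset
    x = np.linspace(0.0, 1.0, 8)                 # 8 samples, as in the abstract
    G = np.outer(x, a_true) + b_true
    a, b = fit_affine_gain(x, G)
    print(np.max(np.abs(a - a_true)), predict_gain(a, b, 0.5).shape)
```

With only two unknowns per channel, two distinct operating points already determine the model in the noiseless case; extra samples average out measurement noise.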

arXiv:2307.03368

Waveform-Domain Adaptive Matched Filtering for Suppressing Interrupted-Sampling Repeater Jamming

Authors: Hanning Su, Qinglong Bao, Jiameng Pan, Fucheng Guo, Weidong Hu

Abstract: Inadequate adaptability to flexible interference scenarios remains an unresolved challenge for most techniques used to mitigate interrupted-sampling repeater jamming (ISRJ). For methods based on matched filtering systems, it is desirable to incorporate anti-ISRJ measures based on prior ISRJ modeling, either before or after the matched filtering. Due to the partial matching nature of ISRJ, its characteristics are revealed during the matched filtering process. Therefore, this paper introduces an extended domain, called the waveform domain, within the matched filtering process. In this domain, an adaptive matched filtering model, known as waveform-domain adaptive matched filtering (WD-AMF), is established to tackle ISRJ suppression without relying on a pre-existing ISRJ model. The output of the WD-AMF consists of an adaptive filtering term and a compensation term. The adaptive filtering term contains the adaptive integration results in the waveform domain, which are determined by an adaptive weighting function. This function, akin to a bank of bandpass filters, decomposes the integrated function into multiple components, some of which contain interference while others do not. The compensation term follows an integrated guideline for discerning whether signal components or noise are present within the integrated function. The integration results are then concatenated to reconstruct a compensated matched-filter signal output.
Simulations demonstrate the capability of the proposed method to suppress ISRJ in diverse interference scenarios, even in the absence of a pre-existing ISRJ model.

Submitted 13 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.01665

Multicarrier Modulation-Based Digital Radio-over-Fibre System Achieving Unequal Bit Protection with Over 10 dB SNR Gain

Authors: Yicheng Xu, Yixiao Zhu, Xiaobo Zeng, Mengfan Fu, Hexun Jiang, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We propose a multicarrier modulation-based digital radio-over-fibre system that achieves unequal bit protection through bit and power allocation across subcarriers. A theoretical SNR gain of 16.1 dB is obtained in the AWGN channel, and simulation results show a 13.5 dB gain in the bandwidth-limited case.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2303.15124

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

Authors: Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

Abstract: Medical images often contain artificial markers added by doctors, which can negatively affect the accuracy of AI-based diagnosis. To address this issue and recover the missing visual contents, inpainting techniques are needed. However, existing inpainting methods require a manually specified mask as input, limiting their application scenarios. In this paper, we introduce a novel blind inpainting method that automatically completes visual contents without specifying masks for target areas in an image. Our proposed model includes a mask-free reconstruction network and an object-aware discriminator. The reconstruction network consists of two branches that predict the corrupted regions with artificial markers and simultaneously recover the missing visual contents. The object-aware discriminator relies on the powerful recognition capabilities of a dense object detector to ensure that markers in reconstructed images cannot be detected in any local region. As a result, the reconstructed image is as close to the clean one as possible. Our proposed method is evaluated on different medical image datasets covering multiple imaging modalities, such as ultrasound (US), magnetic resonance imaging (MRI), and electron microscopy (EM), demonstrating that it is effective and robust against various unknown missing-region patterns.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2212.00532

EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks

Authors: Liyu Shi, Xiaoyan Li, Weiming Hu, Haoyuan Chen, **g Chen, Zizhen Fan, Minghe Gao, Yujie **g, Guotao Lu, Deguo Ma, Zhiyu Ma, Qingtao Meng, Dechao Tang, Hongzan Sun, Marcin Grzegorzek, Shouliang Qi, Yueyang Teng, Chen Li

Abstract: Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers assessment accuracy when computer technology is used to aid diagnosis. Methods: This study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, we evaluate it using classical machine learning methods and deep learning methods. Results: The experimental results show that deep learning methods achieve better image segmentation performance on EBHI-Seg. The maximum Dice score of the classical machine learning methods is 0.948, compared with 0.965 for the deep learning methods. Conclusion: This publicly available dataset contains 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.

Submitted 6 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.
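The EBHI-Seg results are Dice scores, which measure the overlap between a predicted mask A and the ground truth B as 2|A∩B| / (|A| + |B|). A minimal sketch for binary masks follows. Treating two empty masks as a perfect match is a convention assumed here; implementations differ on it.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(pred, target).sum() / denom


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(dice_coefficient(pred, target))  # 2 overlapping pixels, 3 + 3 foreground pixels
```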

arXiv:2210.10349

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

Authors: Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu

Abstract: Symbolic music generation aims to generate music scores automatically. A recent trend is to use Transformer or its variants in music generation, which is, however, suboptimal, because full attention cannot efficiently model the typically long music sequences (e.g., over 10,000 tokens), and existing models have shortcomings in generating musical repetition structures. In this paper, we propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation. Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures (e.g., the previous 1st, 2nd, 4th and 8th bars, selected via similarity statistics); with the coarse-grained attention, a token only attends to a summarization of each of the other bars rather than to each of their tokens, so as to reduce the computational cost. The advantages are two-fold. First, it can capture both music structure-related correlations via the fine-grained attention and other contextual information via the coarse-grained attention. Second, it is efficient and can model music sequences over 3x longer than its full-attention counterpart. Both objective and subjective experimental results demonstrate its ability to generate long music sequences with high quality and better structures.

Submitted 30 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted by the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
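Museformer's fine/coarse split can be sketched as a per-bar attention plan. This sketch assumes causal attention, assumes that a token also attends fully to its own bar, and fixes the fine-grained offsets to the example values from the abstract (1, 2, 4, 8); in Museformer they are selected via similarity statistics, and the bar-summary mechanism itself is not shown. Function names are hypothetical.

```python
from typing import List, Tuple

# Example offsets quoted in the abstract; the paper selects them via similarity statistics.
FINE_OFFSETS = (1, 2, 4, 8)


def bar_attention_plan(bar: int, offsets=FINE_OFFSETS) -> Tuple[List[int], List[int]]:
    """Split the bars visible to a token in `bar` into fine- and coarse-grained sets."""
    fine = sorted({bar} | {bar - k for k in offsets if bar - k >= 0})  # attend to every token
    coarse = [b for b in range(bar) if b not in fine]                  # one summary per bar
    return fine, coarse


def attended_positions(bar: int, tokens_per_bar: int) -> int:
    """Keys a token attends to: all tokens of fine bars plus one summary per coarse bar."""
    fine, coarse = bar_attention_plan(bar)
    return len(fine) * tokens_per_bar + len(coarse)


if __name__ == "__main__":
    print(bar_attention_plan(9))
    print(attended_positions(9, tokens_per_bar=100))
```

For bar 9 with 100 tokens per bar, this gives 505 keys instead of 1,000 under full causal attention, counting whole bars in both cases; the gap widens as the piece gets longer, because coarse bars cost one key each.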

arXiv:2210.02448

TgDLF2.0: Theory-guided deep-learning for electrical load forecasting via Transformer and transfer learning

Authors: Jiaxin Gao, Wenbo Hu, Dongxiao Zhang, Yuntian Chen

Abstract: Electrical energy is essential in today's society. Accurate electrical load forecasting is beneficial for better scheduling of electricity generation and saving electrical energy. In this paper, we propose theory-guided deep-learning load forecasting 2.0 (TgDLF2.0) to solve this issue; it is an improved version of the theory-guided deep-learning framework for load forecasting via ensemble long short-term memory (TgDLF). TgDLF2.0 introduces the Transformer deep-learning model and transfer learning on top of dividing the electrical load into dimensionless trends and local fluctuations, which makes use of domain knowledge, captures the long-term dependency of the load series, and is more appropriate for realistic scenarios with scarce samples. Cross-validation experiments on different districts show that TgDLF2.0 is approximately 16% more accurate than TgDLF and saves more than half of the training time. TgDLF2.0 with 50% weather noise has the same accuracy as TgDLF without noise, which demonstrates its robustness. We also preliminarily explore the interpretability of the Transformer in TgDLF2.0, which may provide better theory guidance in the future. Furthermore, experiments demonstrate that transfer learning allows the model to converge in half the number of training epochs and achieve better performance.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2207.13326 [pdf, other]

Point Cloud Attacks in Graph Spectral Domain: When 3D Geometry Meets Graph Signal Processing

Authors: Daizong Liu, Wei Hu, Xin Li

Abstract: With increasing attention to 3D safety-critical applications, point cloud learning models have been shown to be vulnerable to adversarial attacks. Although existing 3D attack methods achieve high success rates, they operate in the data space with point-wise perturbation, which may neglect the geometric characteristics of point clouds. Instead, we propose point cloud attacks from a new perspective -- the graph spectral domain attack, which perturbs graph transform coefficients in the spectral domain, corresponding to variations in certain geometric structures. Specifically, leveraging graph signal processing, we first adaptively transform the coordinates of points onto the spectral domain via the graph Fourier transform (GFT) for compact representation. Then, we analyze the influence of different spectral bands on the geometric structure, based on which we propose to perturb the GFT coefficients via a learnable graph spectral filter. Considering that the low-frequency components mainly contribute to the rough shape of the 3D object, we further introduce a low-frequency constraint that limits perturbations to imperceptible high-frequency components. Finally, the adversarial point cloud is generated by transforming the perturbed spectral representation back to the data domain via the inverse GFT. Experimental results demonstrate the effectiveness of the proposed attack in terms of both imperceptibility and attack success rate.
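The spectral attack pipeline described above can be illustrated with a minimal NumPy sketch. The pipeline has three parts: the GFT, a perturbation of the high-frequency coefficients under a low-frequency constraint, and the inverse GFT.

The sketch rests on several assumptions:
- It uses a symmetrized k-nearest-neighbor graph with Gaussian edge weights and the combinatorial Laplacian.
- The helper names `knn_laplacian` and `spectral_perturb` are hypothetical.
- The perturbation is a fixed random array, not the paper's learnable graph spectral filter optimized against a victim model.

```python
import numpy as np

def knn_laplacian(points, k=6, sigma=1.0):
    """Combinatorial Laplacian L = D - W of a symmetrized k-NN graph with
    Gaussian edge weights (the graph construction is an assumption)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-dist[i, nbrs] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def spectral_perturb(points, delta_high, n_low, k=6):
    """Add delta_high to the high-frequency GFT coefficients of the xyz
    coordinates, leaving the n_low lowest-frequency coefficients untouched."""
    _, U = np.linalg.eigh(knn_laplacian(points, k))  # eigenvectors sorted by ascending frequency
    coeffs = U.T @ points                            # forward GFT, one column per coordinate
    perturbed = coeffs.copy()
    perturbed[n_low:] += delta_high                  # low-frequency constraint
    adversarial = U @ perturbed                      # inverse GFT back to the data domain
    return adversarial, coeffs, U

rng = np.random.default_rng(0)
cloud = rng.normal(size=(40, 3))                     # stand-in point cloud
delta = 0.05 * rng.normal(size=(30, 3))              # random, not optimized against a classifier
adv_cloud, gft_coeffs, basis = spectral_perturb(cloud, delta, n_low=10)
```

Because the eigenvector basis `U` is orthonormal, the inverse GFT is simply `U @ coeffs`. As a result, the first `n_low` coefficients of the adversarial cloud match the originals exactly, which is what the low-frequency constraint enforces.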

Submitted 7 December, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). arXiv admin note: substantial text overlap with arXiv:2202.07261

arXiv:2207.05706 [pdf]

doi 10.1109/JLT.2022.3211869

Optical Field Recovery in Jones Space

Authors: Qi Wu, Yixiao Zhu, Hexun Jiang, Qunbi Zhuge, Weisheng Hu

Abstract: Optical full-field recovery makes it possible to compensate for fiber impairments such as chromatic dispersion and polarization mode dispersion (PMD) in digital signal processing. For cost-sensitive short-reach optical networks, several advanced single-polarization (SP) optical field recovery schemes have recently been proposed to avoid the chromatic dispersion-induced power fading effect and to improve spectral efficiency for larger potential capacity. Polarization division multiplexing (PDM) can further double both the spectral efficiency and the system capacity of these SP carrier-assisted direct detection (DD) schemes. However, the so-called polarization fading phenomenon, induced by random polarization rotation, is a fundamental obstacle that prevents SP carrier-assisted DD systems from achieving polarization diversity. In this paper, we propose a Jones-space field recovery (JSFR) receiver to realize polarization diversity with SP carrier-assisted DD schemes in Jones space. Different receiver structures and simplified recovery procedures for JSFR are explored theoretically. The proposed JSFR pushes SP DD schemes towards PDM without an extra optical signal-to-noise ratio (OSNR) penalty. In addition, JSFR shows good tolerance to PMD, since the optical field recovery is conducted before polarization recovery. In the proof-of-concept experiment, we demonstrate 448-Gb/s reception over 80-km single-mode fiber using the proposed JSFR based on 2×2 couplers.
Furthermore, we qualitatively compare the optical field recovery in Jones space and Stokes space from the perspective of the modulation dimension.
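The polarization fading problem can be made concrete with Jones vectors. The sketch below illustrates the phenomenon only, not the proposed JSFR receiver.

It makes three assumptions:
- A single x-polarized field is launched.
- The fiber's random polarization rotation is modeled as a real rotation matrix.
- The names `rotation_jones` and `received_powers` are hypothetical.

```python
import numpy as np

def rotation_jones(theta):
    """Jones matrix of a lossless polarization rotation by angle theta.
    A real rotation is a simplified stand-in for a general random unitary."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def received_powers(theta, field=(1.0, 0.0)):
    """Return (total power over both polarizations, power on the x axis only)
    for a launched Jones vector `field` after rotation by theta."""
    E_out = rotation_jones(theta) @ np.asarray(field, dtype=complex)
    total = float(np.sum(np.abs(E_out) ** 2))
    x_only = float(np.abs(E_out[0]) ** 2)
    return total, x_only

for theta in (0.0, np.pi / 4, np.pi / 2):
    total, x_only = received_powers(theta)
    print(f"theta={theta:.2f} rad  total={total:.3f}  x-only={x_only:.3f}")
```

The total power over both polarizations stays at 1 for every angle. The power seen by a receiver that observes only one fixed polarization, however, falls from 1 to 0 as the rotation approaches π/2. Polarization diversity is meant to remove exactly this dependence.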

Submitted 13 July, 2022; v1 submitted 22 June, 2022; originally announced July 2022.

Comments: 8 pages and 9 figures

arXiv:2206.13774 [pdf, other]

Assessment of U.S. Department of Transportation Lane-Level Map for Connected Vehicle Applications

Authors: Wang Hu, David Oswald, Guoyuan Wu, Jay A. Farrell

Abstract: High-definition (Hi-Def) digital maps are an indispensable automated driving technology that is developing rapidly. Various commercial and governmental map products are available on the market. Notably, the U.S. Department of Transportation (USDOT) map tool allows users to create MAP and Signal Phase and Timing (SPaT) messages with free access. However, an analysis of the accuracy of this map tool is currently lacking in the literature. This paper provides such an analysis. The analysis manually selects 39 feature points within about 200 meters of the verified point and 55 feature points at longer distances from the verified point. All feature locations are surveyed using GNSS and mapped using the USDOT tool. Different error sources are evaluated to allow assessment of the USDOT map accuracy. In this investigation, the USDOT map tool is demonstrated to achieve 17-centimeter horizontal accuracy, which meets the lane-level map requirement. The maximum horizontal map error is less than 30 centimeters.
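The abstract reports a horizontal accuracy and a maximum horizontal error but does not define the statistics used. The sketch below shows one way to compute them, assuming both the GNSS survey and the USDOT map points are expressed as local east/north coordinates in meters. It derives per-point horizontal errors and their mean, RMS and maximum. The function names are hypothetical, and the coordinates are made up for illustration; they are not the paper's 94 feature points.

```python
import math

def horizontal_errors(surveyed, mapped):
    """Per-point horizontal error in meters between GNSS-surveyed and
    map-tool coordinates, both given as local (east, north) pairs in meters."""
    return [math.hypot(me - se, mn - sn)
            for (se, sn), (me, mn) in zip(surveyed, mapped)]

def summarize(errors):
    """Mean, RMS and maximum of the per-point horizontal errors."""
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "rms": math.sqrt(sum(e * e for e in errors) / n),
        "max": max(errors),
    }

# Illustrative coordinates only; these are not data from the paper.
surveyed = [(0.0, 0.0), (10.0, 5.0), (20.0, -3.0)]
mapped = [(0.1, 0.0), (10.0, 5.2), (20.06, -3.08)]
print(summarize(horizontal_errors(surveyed, mapped)))
```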

Submitted 28 June, 2022; originally announced June 2022.

Comments: 6 pages, 6 figures

arXiv:2206.06077 [pdf]

Physics-informed EDFA Gain Model Based on Active Learning

Authors: Xiaomin Liu, Yuli Chen, Yihao Zhang, Yichen Liu, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We propose a physics-informed EDFA gain model based on the active learning method. Experimental results show that the proposed modelling method can reach a higher optimal accuracy and reduce the required training data by ~90% while achieving the same performance as the conventional method.

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.12843 [pdf, other]

A Comparative Study of Gastric Histopathology Sub-size Image Classification: from Linear Regression to Visual Transformer

Authors: Weiming Hu, Haoyuan Chen, Wanli Liu, Xiaoyan Li, Hongzan Sun, Xinyu Huang, Marcin Grzegorzek, Chen Li

Abstract: Gastric cancer is the fifth most common cancer in the world and the fourth most deadly. Early detection guides the treatment of gastric cancer. Nowadays, computer technology has advanced rapidly to assist physicians in diagnosing pathological images of gastric cancer. Ensemble learning is a way to improve the accuracy of algorithms, and finding multiple complementary learning models is the basis of ensemble learning. This experimental platform explores the complementarity of sub-size pathology image classifiers when machine performance is insufficient. We choose seven classical machine learning classifiers and four deep learning classifiers for classification experiments on the GasHisSDB database. The classical machine learning algorithms extract five different image virtual features to match multiple classifier algorithms. For deep learning, we choose three convolutional neural network classifiers, together with a novel Transformer-based classifier. The experimental platform, on which a large number of classical machine learning and deep learning methods are evaluated, demonstrates that different classifiers perform differently on GasHisSDB. Among the classical machine learning models, some classifiers classify the Abnormal category very well, while others excel at classifying the Normal category. Among the deep learning models, there are likewise multiple models that complement each other.
Suitable classifiers are selected for ensemble learning when machine performance is insufficient. This experimental platform demonstrates that multiple classifiers are indeed complementary and can improve the efficiency of ensemble learning. This can better assist doctors in diagnosis, improve the detection of gastric cancer, and increase the cure rate.
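The ensemble-learning argument rests on complementarity: classifiers that err on different samples can correct each other when combined. The minimal hard-voting sketch below shows the effect in plain Python. It uses three hypothetical toy classifiers with disjoint errors on a two-class Normal/Abnormal problem; these are not the GasHisSDB models.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble. `predictions` holds one label list per classifier;
    returns the most common label per sample (ties go to the label seen first)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth = ["Normal", "Abnormal", "Normal", "Abnormal", "Normal", "Abnormal"]
flip = {"Normal": "Abnormal", "Abnormal": "Normal"}

def classifier_wrong_on(indices):
    """Toy classifier that errs exactly on the given sample indices."""
    return [flip[t] if i in indices else t for i, t in enumerate(truth)]

clf_a = classifier_wrong_on({0, 1})
clf_b = classifier_wrong_on({2, 3})
clf_c = classifier_wrong_on({4, 5})
ensemble = majority_vote([clf_a, clf_b, clf_c])
print([accuracy(c, truth) for c in (clf_a, clf_b, clf_c)], accuracy(ensemble, truth))
```

Each toy classifier is right on 4 of 6 samples, yet the majority vote is right on all 6, because no sample is misclassified by more than one of them. Classifiers whose errors overlap would gain much less from voting.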

Submitted 25 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.02473

arXiv:2204.10704 [pdf, other]

SUES-200: A Multi-height Multi-scene Cross-view Image Benchmark Across Drone and Satellite

Authors: Runzhe Zhu, Ling Yin, Mingze Yang, Fei Wu, Yuncheng Yang, Wenbo Hu

Abstract: Cross-view image matching aims to match images of the same target scene acquired from different platforms. With the rapid development of drone technology, cross-view matching by neural network models has become a widely accepted choice for drone positioning or navigation. However, existing public datasets do not include images obtained by drones at different heights, and the types of scenes are relatively homogeneous, which makes it difficult to assess a model's capability to adapt to complex and changing scenes. To this end, we present a new cross-view dataset called SUES-200 to address these issues. SUES-200 contains 24120 images acquired by the drone at four different heights and corresponding satellite-view images of the same target scenes. To the best of our knowledge, SUES-200 is the first public dataset that considers the differences in aerial photographs captured by drones flying at different heights. In addition, we develop an evaluation framework for efficient training, testing and evaluation of cross-view matching models, under which we comprehensively analyze the performance of nine architectures. Then, we propose a robust baseline model for use with SUES-200. Experimental results show that SUES-200 can help the model learn highly discriminative features for different drone heights.

Submitted 21 January, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

arXiv:2204.04462 [pdf, ps, other]

doi 10.1109/TNNLS.2020.3028945

A3CLNN: Spatial, Spectral and Multiscale Attention ConvLSTM Neural Network for Multisource Remote Sensing Data Classification

Authors: Heng-Chao Li, Wen-Shuai Hu, Wei Li, Jun Li, Qian Du, Antonio Plaza

Abstract: The problem of effectively exploiting the information from multiple data sources has become a relevant but challenging research topic in remote sensing. In this paper, we propose a new approach to exploit the complementarity of two data sources: hyperspectral images (HSIs) and light detection and ranging (LiDAR) data. Specifically, we develop a new dual-channel spatial, spectral and multiscale attention convolutional long short-term memory neural network (called dual-channel A3CLNN) for feature extraction and classification of multisource remote sensing data. Spatial, spectral and multiscale attention mechanisms are first designed for HSI and LiDAR data in order to learn spectral- and spatial-enhanced feature representations, and to represent multiscale information for different classes. In the designed fusion network, a novel composite attention learning mechanism (combined with a three-level fusion strategy) is used to fully integrate the features from these two data sources. Finally, inspired by the idea of transfer learning, a novel stepwise training strategy is designed to yield a final classification result. Our experimental results, conducted on several multisource remote sensing data sets, demonstrate that the newly proposed dual-channel A3CLNN exhibits better feature representation ability (leading to more competitive classification performance) than other state-of-the-art methods.

Submitted 9 April, 2022; originally announced April 2022.

Comments: 16 pages, 10 figures

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 747-761, Feb. 2022

arXiv:2202.13526 [pdf, other]

Sparse Graph Learning with Spectrum Prior for Deep Graph Convolutional Networks

Authors: ** Zeng, Yang Liu, Gene Cheung, Wei Hu

Abstract: A graph convolutional network (GCN) employs a graph filtering kernel tailored for data with irregular structures. However, simply stacking more GCN layers does not improve performance; instead, the output converges to an uninformative low-dimensional subspace, where the convergence rate is characterized by the graph spectrum -- this is the well-known over-smoothing problem in GCN. In this paper, we propose a sparse graph learning algorithm incorporating a new spectrum prior to compute a graph topology that circumvents over-smoothing while preserving pairwise correlations inherent in data. Specifically, based on a spectral analysis of multilayer GCN output, we derive a spectrum prior for the graph Laplacian matrix $\mathbf{L}$ to robustify the model expressiveness against over-smoothing. Then, we formulate a sparse graph learning problem with the spectrum prior, solved efficiently via block coordinate descent (BCD). Moreover, we optimize the weight parameter trading off the fidelity term with the spectrum prior, based on data smoothness on the original graph learned without spectrum manipulation. The output $\mathbf{L}$ is then normalized for supervised GCN training. Experiments show that our proposal produces deeper GCNs and higher prediction accuracy for regression and classification tasks compared to competing schemes.
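The over-smoothing behavior described above can be checked numerically for the simplest case: a linear GCN that repeatedly applies the renormalized propagation matrix P = D^{-1/2}(A + I)D^{-1/2}. Everything outside the span of P's dominant eigenvector shrinks at least as fast as |lambda_2|^k, where |lambda_2| is the second-largest eigenvalue magnitude of P. The sketch below assumes no learned weights or nonlinearities and uses a toy path graph. It does not implement the paper's spectrum prior or BCD solver, and the function names are hypothetical.

```python
import numpy as np

def propagation_matrix(A):
    """Renormalized GCN propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def oversmoothing_residual(A, X, n_layers):
    """Norm of the part of P^k X outside the span of P's dominant eigenvector,
    for a linear GCN (no weights, no nonlinearity; a simplifying assumption).
    Also returns the second-largest eigenvalue magnitude of P."""
    P = propagation_matrix(A)
    eigvals, U = np.linalg.eigh(P)          # ascending; the largest equals 1
    u1 = U[:, -1:]                          # dominant eigenvector as a column
    Xk = np.linalg.matrix_power(P, n_layers) @ X
    residual = Xk - u1 @ (u1.T @ Xk)        # part of the output that still varies
    lam2 = float(np.max(np.abs(eigvals[:-1])))
    return float(np.linalg.norm(residual)), lam2

# Path graph on 6 nodes with random 2-dimensional node features.
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
X = np.random.default_rng(0).normal(size=(6, 2))
r0, lam2 = oversmoothing_residual(A, X, 0)
for k in (5, 30):
    rk, _ = oversmoothing_residual(A, X, k)
    print(f"layers={k:2d}  residual={rk:.2e}  bound={lam2 ** k * r0:.2e}")
```

The residual is the only part of the node features that still distinguishes nodes. Its geometric decay is the "convergence rate characterized by the graph spectrum" that the spectrum prior is designed to control.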

Submitted 2 November, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

arXiv:2202.12939 [pdf]

doi 10.1016/j.apenergy.2022.119876

Automated Extraction of Energy Systems Information from Remotely Sensed Data: A Review and Analysis

Authors: Simiao Ren, Wei Hu, Kyle Bradbury, Dylan Harrison-Atlas, Laura Malaguzzi Valeri, Brian Murray, Jordan M. Malof

Abstract: High-quality energy systems information is a crucial input to energy systems research, modeling, and decision-making. Unfortunately, actionable information about energy systems is often of limited availability, incomplete, or only accessible for a substantial fee or through a non-disclosure agreement. Recently, remotely sensed data (e.g., satellite imagery, aerial photography) have emerged as a potentially rich source of energy systems information. However, the use of these data is frequently challenged by their sheer volume and complexity, precluding manual analysis. Recent breakthroughs in machine learning have enabled automated and rapid extraction of useful information from remotely sensed data, facilitating large-scale acquisition of critical energy system variables. Here we present a systematic review of the literature on this emerging topic, providing an in-depth survey and review of papers published within the past two decades. We first taxonomize the existing literature into ten major areas, spanning the energy value chain. Within each research area, we distill and critically discuss major features that are relevant to energy researchers, including, for example, key challenges regarding the accessibility and reliability of the methods. We then synthesize our findings to identify limitations and trends in the literature as a whole, and discuss opportunities for innovation.
These include the opportunity to extend the methods beyond electricity to broader energy systems and wider geographic areas; and the ability to expand the use of these methods in research and decision making as satellite data become cheaper and easier to access. We also find that there are persistent challenges: limited standardization and rigor of performance assessments; limited sharing of code, which would improve replicability; and a limited consideration of the ethics and privacy of data.

Submitted 2 October, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: This is only an Arxived version. For actual publication please refer to https://doi.org/10.1016/j.apenergy.2022.119876

Journal ref: Applied Energy, 326, 119876 (2022)

arXiv:2202.08552 [pdf, other]

EBHI: A New Enteroscope Biopsy Histopathological H&E Image Dataset for Image Classification Evaluation

Authors: Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, Haoyuan Chen, Wanli Liu, Yudong Yao, Hongzan Sun, Ning Xu, Xinyu Huang, Marcin Grzegorze

Abstract: Background and purpose: Colorectal cancer has become the third most common cancer worldwide, accounting for approximately 10% of cancer patients. Early detection of the disease is important for the treatment of colorectal cancer patients. Histopathological examination is the gold standard for screening colorectal cancer. However, the current lack of histopathological image datasets of colorectal cancer, especially enteroscope biopsies, hinders the accurate evaluation of computer-aided diagnosis techniques. Methods: A new publicly available Enteroscope Biopsy Histopathological H&E Image Dataset (EBHI) is published in this paper. To demonstrate the effectiveness of the EBHI dataset, we have utilized several machine learning, convolutional neural network and novel transformer-based classifiers for experimentation and evaluation, using images at a magnification of 200x. Results: Experimental results show that deep learning methods perform well on the EBHI dataset. Traditional machine learning methods achieve a maximum accuracy of 76.02%, and deep learning methods achieve a maximum accuracy of 95.37%. Conclusion: To the best of our knowledge, EBHI is the first publicly available colorectal histopathology enteroscope biopsy dataset with four magnifications and five types of images of tumor differentiation stages, totaling 5532 images. We believe that EBHI could attract researchers to explore new classification algorithms for the automated diagnosis of colorectal cancer, which could help physicians and patients in clinical settings.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.07261 [pdf, other]

Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks

Authors: Qianjiang Hu, Daizong Liu, Wei Hu

Abstract: With the maturity of depth sensors, point clouds have received increasing attention in various applications such as autonomous driving, robotics and surveillance, while deep point cloud learning models have been shown to be vulnerable to adversarial attacks. Existing attack methods generally add/delete points or perform point-wise perturbation over point clouds to generate adversarial examples in the data space, which may neglect the geometric characteristics of point clouds. Instead, we propose point cloud attacks from a new perspective -- Graph Spectral Domain Attack (GSDA), which perturbs transform coefficients in the graph spectral domain, corresponding to variations in certain geometric structures. In particular, we naturally represent a point cloud over a graph, and adaptively transform the coordinates of points into the graph spectral domain via the graph Fourier transform (GFT) for compact representation. We then analyze the influence of different spectral bands on the geometric structure of the point cloud, based on which we propose to perturb the GFT coefficients in a learnable manner guided by an energy constraint loss function. Finally, the adversarial point cloud is generated by transforming the perturbed spectral representation back to the data domain via the inverse GFT (IGFT). Experimental results demonstrate the effectiveness of the proposed GSDA in terms of both imperceptibility and attack success rates under a variety of defense strategies. The code is available at https://github.com/WoodwindHu/GSDA.

Submitted 26 December, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

arXiv:2201.12576 [pdf, other]

Scale-arbitrary Invertible Image Downscaling

Authors: **bo Xing, Wenbo Hu, Tien-Tsin Wong

Abstract: Conventional social media platforms usually downscale high-resolution (HR) images to restrict their resolution to a specific size and save transmission/storage costs, which makes the subsequent super-resolution (SR) problem highly ill-posed. Recent invertible image downscaling methods jointly model the downscaling/upscaling problems and achieve significant improvements. However, they only consider fixed integer scale factors, which cannot downscale HR images of various resolutions to meet the resolution restrictions of social media platforms. In this paper, we propose a scale-Arbitrary Invertible image Downscaling Network (AIDN) to natively downscale HR images with arbitrary scale factors. Meanwhile, the HR information is embedded in the downscaled low-resolution (LR) counterparts in a nearly imperceptible form, such that our AIDN can also restore the original HR images solely from the LR images. The key to supporting arbitrary scale factors is our proposed Conditional Resampling Module (CRM), which conditions the downscaling/upscaling kernels and sampling locations on both scale factors and image content. Extensive experimental results demonstrate that our AIDN achieves top performance for invertible downscaling with both arbitrary integer and non-integer scale factors. Code will be released upon publication.

Submitted 9 March, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.05502 [pdf]

doi 10.1109/JLT.2022.3168698

Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme

Authors: Hang Yang, Zekun Niu, Haochen Zhao, Shilin Xiao, Weisheng Hu, Lilin Yi

Abstract: Modeling optical wave propagation in optical fiber amounts to solving the nonlinear Schrödinger equation (NLSE) quickly and accurately, which enables optical system design, digital signal processing verification and fast waveform calculation. The traditional method for waveform modeling with full time and frequency information is the split-step Fourier method (SSFM). SSFM has long been regarded as challenging for long-haul wavelength division multiplexing (WDM) optical fiber communication systems because it is extremely time-consuming. Here we propose a linear-nonlinear feature decoupling distributed (FDD) waveform modeling scheme to model the long-haul WDM fiber channel. In this scheme, the channel's linear effects are modelled by NLSE-derived model-driven methods and the nonlinear effects are modelled by data-driven deep learning methods. Meanwhile, the proposed scheme only fits a single fiber span and then recursively applies the model to achieve the required transmission distance. The proposed modeling scheme is demonstrated to have high accuracy, high computing speed, and robust generalization to different optical launch powers, modulation formats, channel numbers and transmission distances. The total running time of the FDD waveform modeling scheme for 41-channel 1040-km fiber transmission is only 3 minutes, versus more than 2 hours using SSFM for each input condition, a 98% reduction in computing time. Considering multi-round optimization by adjusting system parameters, the complexity reduction is significant.
The results represent a remarkable improvement in nonlinear fiber modeling and open up novel perspectives for solution of NLSE-like partial differential equations and optical fiber physics problems.
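For reference, the baseline the FDD scheme is compared against, the split-step Fourier method, alternates a linear step applied in the frequency domain with a nonlinear phase step applied in the time domain. The sketch below is a minimal symmetric SSFM for the scalar, single-channel NLSE with loss α, group-velocity dispersion β₂ and nonlinearity γ.

Assumptions and scope:
- It assumes a single polarization, no higher-order dispersion and a fixed step size.
- It is not the paper's hybrid model-data driven scheme.
- Units are whatever the caller chooses consistently (for example ps, km, W).
- The pulse and fiber parameters in the example are illustrative.

```python
import numpy as np

def ssfm(A0, dt, length, n_steps, beta2, gamma, alpha=0.0):
    """Symmetric split-step Fourier solution of the scalar NLSE
        dA/dz = -(alpha/2) A - i (beta2/2) d^2A/dT^2 + i gamma |A|^2 A
    (single polarization, no higher-order dispersion: simplifying assumptions)."""
    h = length / n_steps
    omega = 2 * np.pi * np.fft.fftfreq(A0.size, d=dt)     # angular frequency grid
    # d^2/dT^2 -> -omega^2 in the frequency domain, so the linear operator is
    # -alpha/2 + i*beta2*omega^2/2; apply it for half a step on each side.
    half_linear = np.exp((-alpha / 2 + 0.5j * beta2 * omega ** 2) * h / 2)
    A = A0.astype(complex)
    for _ in range(n_steps):
        A = np.fft.ifft(half_linear * np.fft.fft(A))       # half linear step
        A = A * np.exp(1j * gamma * np.abs(A) ** 2 * h)     # full nonlinear step
        A = np.fft.ifft(half_linear * np.fft.fft(A))       # half linear step
    return A

# Example: 10 ps Gaussian pulse, anomalous dispersion, 20 km (illustrative values).
N, dt = 1024, 0.5                                          # samples, ps
t = (np.arange(N) - N // 2) * dt
A0 = np.sqrt(0.01) * np.exp(-t ** 2 / (2 * 10.0 ** 2))    # 10 mW peak power
A_out = ssfm(A0, dt, length=20.0, n_steps=200, beta2=-21.7, gamma=1.3)
```

Both sub-steps only multiply by complex exponentials, so with α = 0 the pulse energy is conserved up to rounding error; with loss it decays as exp(-αL). The cost grows with the number of steps and the FFT size, which is why SSFM becomes slow for many WDM channels over long distances.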

Submitted 16 May, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: 8 pages, 5 figures, 1 table, 30 references

arXiv:2111.15395 [pdf, other]

Two-dimensional flow field measurement of sediment-laden flow based on ultrasound image velocimetry

Authors: Weiliang Tao, Yan Liu, Zhimin Ma, Wenbin Hu

Abstract: This paper proposes a novel particle image velocimetry (PIV) technique that generates an instantaneous two-dimensional velocity field for sediment-laden fluid, based on an optical flow algorithm applied to ultrasound imaging. An ultrasonic PIV (UPIV) system is constructed by integrating a medical ultrasound instrument and an ultrasonic particle image velocimetry algorithm. The medical ultrasound instrument, with a phased sensor array, is used to acquire acoustic echo signals and generate two-dimensional underwater ultrasound images. Based on the optical flow field, the instantaneous velocity of the particles in water corresponding to the pixels in the ultrasonic particle images is derived from the grayscale change between adjacent images under the Lucas-Kanade (L-K) local constraint, and finally the two-dimensional flow field is obtained. The proposed algorithm is verified through multiple sets of experiments, and the results are compared with those of conventional cross-correlation algorithms. The results show that the L-K optical flow method not only obtains the underwater velocity field accurately but also offers better smoothness and wider applicability than conventional algorithms, especially for flow field measurement in sediment-laden fluid.
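The Lucas-Kanade constraint mentioned above assumes brightness constancy and a displacement that is uniform within a small window. Under those assumptions, each window yields a 2×2 least-squares problem in the displacement (u, v).

The sketch below estimates one displacement for a whole synthetic frame pair:
- A dense flow field like the paper's would repeat this over a sliding window around every pixel.
- Converting pixels per frame to m/s would additionally need the ultrasound pixel size and the frame interval, both omitted here.
- The Gaussian-blob test images and the name `lucas_kanade` are assumptions for illustration.

```python
import numpy as np

def lucas_kanade(I1, I2, window=None):
    """Estimate a single displacement (u, v), in pixels, between two frames by
    solving Ix*u + Iy*v = -It in the least-squares sense over `window`
    (a tuple of slices; the whole image if None)."""
    Iy, Ix = np.gradient(I1)          # derivatives along axis 0 (y) and axis 1 (x)
    It = I2 - I1                      # temporal derivative for a unit frame interval
    if window is not None:
        Ix, Iy, It = Ix[window], Iy[window], It[window]
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic frames: a Gaussian blob shifted by (+0.3, -0.2) pixels.
y, x = np.mgrid[0:41, 0:41].astype(float)

def blob(cx, cy, sigma=5.0):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

u, v = lucas_kanade(blob(20.0, 20.0), blob(20.3, 19.8))
print(f"u={u:.3f}, v={v:.3f}")
```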

Submitted 29 November, 2021; originally announced November 2021.

Comments: 18 pages, 11 figures, 2 tables, technology manuscript

arXiv:2111.10990 [pdf, other]

Imperceptible Transfer Attack and Defense on 3D Point Cloud Classification

Authors: Daizong Liu, Wei Hu

Abstract: Although much effort has been devoted to attack and defense in the 2D image domain in recent years, few methods explore the vulnerability of 3D models. Existing 3D attackers generally perform point-wise perturbation over point clouds, resulting in deformed structures or outliers that are easily perceived by humans. Moreover, their adversarial examples are generated under the white-box setting, and they frequently suffer from low success rates when transferred to attack remote black-box models. In this paper, we study 3D point cloud attacks from two new and challenging perspectives by proposing a novel Imperceptible Transfer Attack (ITA): 1) Imperceptibility: we constrain the perturbation direction of each point along its normal vector of the neighborhood surface, so that the generated examples have similar geometric properties, thus enhancing imperceptibility. 2) Transferability: we develop an adversarial transformation model to generate the most harmful distortions and enforce the adversarial examples to resist them, improving their transferability to unknown black-box models. Further, we propose to train more robust black-box 3D models to defend against such ITA attacks by learning more discriminative point cloud representations. Extensive evaluations demonstrate that our ITA attack is more imperceptible and transferable than state-of-the-art methods and validate the superiority of our defense strategy.
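The imperceptibility constraint restricts each point's perturbation to the direction of its local surface normal. The minimal sketch below makes two assumptions. First, normals are estimated by PCA over the k nearest neighbors, a common choice that is not necessarily the paper's. Second, the per-point step sizes are given as input rather than optimized against the paper's adversarial transformation model. The function names are hypothetical.

```python
import numpy as np

def estimate_normals(points, k=8):
    """Per-point unit normal via PCA of the k nearest neighbors: the
    eigenvector of the local covariance with the smallest eigenvalue."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    normals = np.empty_like(points)
    for i in range(len(points)):
        nbrs = points[np.argsort(dist[i])[:k + 1]]   # includes the point itself
        cov = np.cov(nbrs.T)                         # 3x3 local covariance
        _, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
        normals[i] = eigvecs[:, 0]                   # direction of least variance
    return normals

def normal_constrained_perturbation(points, step_sizes, k=8):
    """Move each point only along its estimated normal: p_i + t_i * n_i."""
    normals = estimate_normals(points, k)
    return points + step_sizes[:, None] * normals

# Flat patch in the z = 0 plane with small random signed step sizes.
rng = np.random.default_rng(1)
plane = np.column_stack([rng.uniform(0, 1, 60), rng.uniform(0, 1, 60), np.zeros(60)])
steps = 0.01 * rng.normal(size=60)
moved = normal_constrained_perturbation(plane, steps)
```

On a flat patch the estimated normals are ±z, so the perturbed points move only vertically and their x/y coordinates are unchanged. PCA determines the normal orientation only up to sign. This does not matter when the signed step sizes are themselves optimized.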

Submitted 24 March, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

arXiv:2110.14763 [pdf, other]

doi 10.1109/TVT.2022.3187416

Using PPP Information to Implement a Global Real-Time Virtual Network DGNSS Approach

Authors: Wang Hu, Ashim Neupane, Jay A. Farrell

Abstract: Differential GNSS (DGNSS) has been demonstrated to provide reliable, high-quality range correction information enabling real-time navigation with centimeter to sub-meter accuracy, which is required for applications such as connected and autonomous vehicles. However, DGNSS requires a local reference station near each user. For a continental or global scale implementation, this information dissemination approach would require a dense network of reference stations whose construction and maintenance would be prohibitively expensive. Precise Point Positioning (PPP) affords more flexibility as a public service for GNSS receivers, but its State Space Representation (SSR) format is not supported by most receivers in the field or on the market. This article proposes a novel Virtual Network DGNSS (VN-DGNSS) approach and an optimization algorithm that is key to its implementation. The approach capitalizes on the existing PPP infrastructure without the need for new physical reference stations. By connecting to public GNSS SSR data services, a VN-DGNSS server maintains current information about common-mode errors. Constructing RTCM Observation Space Representation (OSR) messages from this SSR information requires both the signal time-of-transmission and the satellite position at that time, each consistent with the time-of-reception for each client. This article presents an algorithm to determine these quantities. The results of real-time stationary and moving platform evaluations are included, using u-blox M8P and ZED-F9P receivers. The performance surpasses the SAE specification (68% of horizontal errors <= 1.5 m and vertical errors <= 3 m) and shows significantly better horizontal performance than the GNSS Open Service. The moving tests also show better horizontal performance than the ZED-F9P receiver with SBAS enabled and achieve lane-level accuracy (95% of horizontal errors less than 1 meter).
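The time-of-transmission requirement above is, in its simplest form, a light-time fixed-point problem: the transmit time depends on the satellite position, which in turn depends on the transmit time. The sketch below is our own simplified illustration, not the article's algorithm. `sat_pos_fn` is a hypothetical stand-in for an orbit model (for example one built from ephemeris plus SSR corrections), and Earth rotation (Sagnac) and clock terms are deliberately ignored.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s

def transmit_time_and_position(sat_pos_fn, rx_pos, t_rx, tol=1e-11, max_iter=20):
    """Solve t_tx = t_rx - |p_sat(t_tx) - p_rx| / c by fixed-point iteration.

    sat_pos_fn: hypothetical callable mapping time (s) to satellite ECEF position (m).
    rx_pos:     receiver (client) ECEF position (m).
    Returns the transmit time and the satellite position at that time.
    """
    t_tx = t_rx - 0.075  # initial guess: a typical GNSS travel time of ~75 ms
    for _ in range(max_iter):
        new_t_tx = t_rx - np.linalg.norm(sat_pos_fn(t_tx) - rx_pos) / C
        converged = abs(new_t_tx - t_tx) < tol
        t_tx = new_t_tx
        if converged:
            break
    return t_tx, sat_pos_fn(t_tx)
```

Because a satellite moves only a few km/s, each iteration shrinks the error by roughly v/c, so two or three iterations usually suffice.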

Submitted 28 June, 2022; v1 submitted 22 September, 2021; originally announced October 2021.

Comments: 14 pages, 8 tables, 4 figures, Code and data are available at https://github.com/Azurehappen/Virtual-Network-DGNSS-Project

Journal ref: in IEEE Transactions on Vehicular Technology, vol. 71, no. 10, pp. 10337-10349, Oct. 2022

arXiv:2108.06884 [pdf, other]

Seirios: Leveraging Multiple Channels for LoRaWAN Indoor and Outdoor Localization

Authors: Jun Liu, Jiayao Gao, Sanjay Jha, Wen Hu

Abstract: Localization is important for a large number of Internet of Things (IoT) endpoint devices connected by LoRaWAN. Due to the bandwidth limitations of LoRaWAN, existing localization methods without specialized hardware (e.g., GPS) perform poorly. To increase localization accuracy, we propose a super-resolution localization method, called Seirios, which features a novel algorithm to synchronize multiple non-overlapped communication channels by exploiting the unique features of the radio physical layer to increase the overall bandwidth. By exploiting both the original and the conjugate of the physical layer, Seirios can resolve the direct path from multiple reflectors in both indoor and outdoor environments. We design a Seirios prototype and evaluate its performance in an outdoor area of 100 m × 60 m and an indoor area of 25 m × 15 m, which shows that Seirios can achieve a median error of 4.4 m outdoors (80% samples < 6.4 m) and 2.4 m indoors (80% samples < 6.1 m), respectively. The results show that Seirios produces 42% less localization error than the baseline approach. Our evaluation also shows that, in contrast to previous findings for Wi-Fi localization systems, which have wider bandwidth, time-of-flight (ToF) estimation is less effective for LoRaWAN localization systems with narrowband radio signals.
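A back-of-the-envelope calculation shows why narrowband ToF estimation is weak: one-way ToF range resolution scales roughly as c/B. The sketch assumes a 125 kHz LoRa channel and, as a simplifying assumption of ours, that n adjacent channels combined coherently give an effective bandwidth of n × 125 kHz. Seirios's actual channel synchronization is more involved than this.

```python
C = 299_792_458.0  # speed of light, m/s

def tof_range_resolution(bandwidth_hz: float, n_channels: int = 1) -> float:
    """Approximate one-way ToF range resolution c / B (meters).

    Assumes n_channels adjacent, equally wide channels are stitched together,
    so the effective bandwidth is n_channels * bandwidth_hz (an idealization).
    """
    return C / (n_channels * bandwidth_hz)

single = tof_range_resolution(125e3)           # about 2.4 km for one channel
stitched = tof_range_resolution(125e3, 8)      # about 300 m for eight channels
```

Even with several channels combined, the resolution stays in the hundreds of meters, far coarser than the meter-level errors reported above, which is consistent with the observation that ToF alone is less effective for LoRaWAN.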

Submitted 15 August, 2021; originally announced August 2021.

Comments: MOBICOM 2021

arXiv:2107.11113 [pdf, ps, other]

OLR 2021 Challenge: Datasets, Rules and Baselines

Authors: Binling Wang, Wenxuan Hu, **g Li, Yiming Zhi, Zheng Li, Qingyang Hong, Lin Li, Dong Wang, Liming Song, Cheng Yang

Abstract: This paper introduces the sixth Oriental Language Recognition (OLR) Challenge, OLR 2021, which aims to improve the performance of language recognition and speech recognition systems in multilingual scenarios. The data profile, four tasks, two baselines, and the evaluation principles are introduced in this paper. In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to the OLR 2021 Challenge for the first time. This year's challenge focuses on more practical and challenging problems, with four tasks: (1) constrained LID, (2) unconstrained LID, (3) constrained multilingual ASR, and (4) unconstrained multilingual ASR. Baselines are provided for both the LID tasks and the multilingual ASR tasks. The LID baseline system is an extended TDNN x-vector model constructed with PyTorch. A transformer-based end-to-end model is provided as the multilingual ASR baseline system. These recipes will be published online and made available for participants to construct their own LID or ASR systems. The baseline results demonstrate that these tasks are rather challenging and deserve more effort to achieve better performance.

Submitted 23 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2006.03473, arXiv:1907.07626, arXiv:1806.00616, arXiv:1706.09742

arXiv:2107.06374 [pdf, ps, other]

Bilinear Control of Convection-Cooling: From Open-Loop to Closed-Loop

Authors: Weiwei Hu, Jun Liu, Zhu Wang

Abstract: This paper is concerned with a bilinear control problem for enhancing convection-cooling via an incompressible velocity field. Both optimal open-loop control and closed-loop feedback control designs are addressed. First- and second-order optimality conditions for characterizing the optimal solution are discussed. In particular, the method of instantaneous control is applied to establish the feedback laws. Moreover, the construction of feedback laws is also investigated by directly utilizing the optimality system with appropriate numerical discretization schemes. Computationally, the closed-loop feedback control is much easier to implement than the optimal open-loop control, as the latter requires solving the state equations forward in time, coupled with the adjoint equations backward in time, together with a nonlinear optimality condition. Rigorous analysis and numerical experiments are presented to demonstrate our ideas and validate the efficacy of the control designs.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 27 pages, 7 figures, 3 tables

MSC Class: 49M41; 35Q93

arXiv:2105.11689 [pdf, other]

Self-Supervised Graph Representation Learning via Topology Transformations

Authors: Xiang Gao, Wei Hu, Guo-Jun Qi

Abstract: We present Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data that enables the wide applicability of Graph Convolutional Neural Networks (GCNNs). We formalize the proposed model from an information-theoretic perspective, by maximizing the mutual information between topology transformations and node representations before and after the transformations. We derive that maximizing such mutual information can be relaxed to minimizing the cross entropy between the applied topology transformation and its estimation from node representations. In particular, we seek to sample a subset of node pairs from the original graph and flip the edge connectivity between each pair to transform the graph topology. Then, we self-train a representation encoder to learn node representations by reconstructing the topology transformations from the feature representations of the original and transformed graphs. In experiments, we apply the proposed model to the downstream node classification, graph classification and link prediction tasks, and the results show that the proposed method outperforms state-of-the-art unsupervised approaches.
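The topology transformation described above (sample node pairs, flip their edge connectivity) can be sketched on a dense adjacency matrix. This is a minimal illustration under our own assumptions: an undirected, unweighted graph, a hypothetical `ratio` parameter for the fraction of pairs flipped, and a simple two-class label per pair (edge added vs. removed) as the kind of target a decoder could be trained to predict with cross entropy. It is not the authors' implementation.

```python
import numpy as np

def flip_edges(adj: np.ndarray, ratio: float, rng: np.random.Generator):
    """Flip the connectivity of a random subset of node pairs (undirected graph).

    Returns the transformed adjacency matrix, the sampled pairs (m, 2), and
    per-pair labels: 1 if an edge was added, 0 if an edge was removed.
    """
    n = adj.shape[0]
    iu, ju = np.triu_indices(n, k=1)                 # all unordered pairs i < j
    m = max(1, int(round(ratio * iu.size)))          # number of pairs to flip
    pick = rng.choice(iu.size, size=m, replace=False)
    i, j = iu[pick], ju[pick]
    new_adj = adj.copy()
    new_adj[i, j] = 1 - new_adj[i, j]                # 0 -> 1 (add), 1 -> 0 (remove)
    new_adj[j, i] = new_adj[i, j]                    # keep the matrix symmetric
    labels = new_adj[i, j].copy()
    return new_adj, np.stack([i, j], axis=1), labels
```

In the self-supervised setting, an encoder would see both `adj` and `new_adj` and a small head would be trained to recover `labels` for the sampled pairs.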

Submitted 2 December, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Accepted to IEEE Transactions on Knowledge and Data Engineering (TKDE)

arXiv:2105.08350 [pdf, other]

doi 10.1109/TIP.2021.3134466

Generic Reversible Visible Watermarking Via Regularized Graph Fourier Transform Coding

Authors: Wenfa Qi, Sirui Guo, Wei Hu

Abstract: Reversible visible watermarking (RVW) is an active copyright protection mechanism. It not only transparently superimposes copyright patterns on specific positions of digital images or video frames to declare copyright ownership, but also completely erases the visible watermark image and thus enables the original host image to be restored without any distortion. However, existing RVW algorithms mostly construct the reversible mapping mechanism for a specific visible watermarking scheme, which limits their versatility. Hence, we propose a generic RVW framework to accommodate various visible watermarking schemes. In particular, we obtain a reconstruction data packet (the compressed difference image between the watermarked image and the original host image), which is embedded into the watermarked image via any conventional reversible data hiding method to facilitate blind recovery of the host image. The key is to achieve compact compression of the difference image for efficient embedding of the reconstruction data packet. To this end, we propose regularized Graph Fourier Transform (GFT) coding, where the difference image is smoothed via the graph Laplacian regularizer for more efficient compression and then encoded by multi-resolution GFTs in an approximately optimal manner. Experimental results show that the proposed framework has much better versatility than state-of-the-art methods. Due to the small amount of auxiliary information to be embedded, the visual quality of the watermarked image is also higher.
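The graph Laplacian regularizer mentioned above has a simple closed form in its most basic use: minimizing ‖x − y‖² + μ xᵀLx over x gives the linear system (I + μL)x = y. The sketch below applies this to a 1-D signal on a path graph. The path graph and the weight μ are our illustrative assumptions; the paper works on 2-D difference images and follows the smoothing with multi-resolution GFT coding, which is not shown here.

```python
import numpy as np

def path_graph_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - W of an unweighted path graph (one pixel row)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0   # connect neighbouring samples
    return np.diag(W.sum(axis=1)) - W

def glr_smooth(y: np.ndarray, L: np.ndarray, mu: float) -> np.ndarray:
    """Solve argmin_x ||x - y||^2 + mu * x^T L x, i.e. (I + mu L) x = y."""
    return np.linalg.solve(np.eye(L.shape[0]) + mu * L, y)
```

Since 1ᵀL = 0, the smoothed signal keeps the same sum (DC component) as the input while its graph total variation xᵀLx decreases, which is what makes it cheaper to encode in the GFT domain.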

Submitted 26 November, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: This manuscript was accepted to IEEE Transactions on Image Processing on November 21, 2021. It has 15 pages, 12 figures and 4 tables

arXiv:2104.06243 [pdf, other]

A State-of-the-art Survey of Artificial Neural Networks for Whole-slide Image Analysis: from Popular Convolutional Neural Networks to Potential Visual Transformers

Authors: Xintong Li, Weiming Hu, Chen Li, Tao Jiang, Hongzan Sun, Xiaoyan Li, Xinyu Huang, Marcin Grzegorzek

Abstract: To increase the objectivity and accuracy of pathologists' work, artificial neural network (ANN) methods are widely needed for the segmentation, classification, and detection of histopathological whole-slide images (WSIs). In this paper, WSI analysis methods based on ANNs are reviewed. First, the development status of WSI and ANN methods is introduced. Second, we summarize the common ANN methods. Next, we discuss publicly available WSI datasets and evaluation metrics. The ANN architectures for WSI processing are then divided into classical neural networks and deep neural networks (DNNs) and analyzed. Finally, the application prospects of these analytical methods in this field are discussed, with Visual Transformers highlighted as an important potential method.

Submitted 26 February, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 22 pages, 38 figures. arXiv admin note: substantial text overlap with arXiv:2102.10553

arXiv:2104.02805 [pdf, other]

doi 10.1190/segam2019-3214404.1

First arrival picking using U-net with Lovasz loss and nearest point picking method

Authors: Pengyu Yuan, Wenyi Hu, Xuqing Wu, Jiefu Chen, Hien Van Nguyen

Abstract: We propose a robust segmentation and picking workflow to solve the first arrival picking problem in seismic signal processing. Unlike traditional classification algorithms, an image segmentation method can utilize location information by outputting a prediction map of the same size as the input image. A parameter-free nearest point picking algorithm is proposed to further improve the accuracy of first arrival picking. The algorithm is tested on synthetic clean data, synthetic noisy data, synthetic picking-disconnected data and field data. It performs well on all of them, and the picking deviation reaches as low as 4.8 ms per receiver. The first arrival picking problem is formulated as a contour detection problem. Similar to \cite{wu2019semi}, we use U-net to perform the segmentation, as it has proven to be state-of-the-art in many image segmentation tasks. In particular, a Lovasz loss instead of the traditional cross-entropy loss is used to train the network for better segmentation performance. The Lovasz loss is a surrogate loss for the Jaccard index, or the so-called intersection-over-union (IoU) score, which is one of the most commonly used metrics for segmentation tasks. In the picking part, we use a novel nearest point picking (NPP) method to take advantage of the coherence of first arrival picks among adjacent receivers. Our model is tested and validated on both synthetic and field data with harmonic noise. The main contributions of this paper are as follows: 1. Used the Lovasz loss to directly optimize the IoU for the segmentation task; the improvement over the cross-entropy loss with regard to segmentation accuracy is verified by the test results. 2. Proposed a nearest point picking post-processing method to overcome any defects left by the segmentation output. 3. Conducted noise analysis and verified the model with both noisy synthetic and field datasets.
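To make the Lovasz loss concrete: for binary segmentation, the widely used Lovász hinge sorts per-pixel hinge errors in decreasing order and weights them by the discrete gradient of the Jaccard loss. The NumPy version below follows that published formulation as we understand it; it computes only the loss value (no autograd, so it is not a training implementation), and the `iou` helper shows the metric it is a surrogate for. All names are ours, not the authors'.

```python
import numpy as np

def lovasz_grad(gt_sorted: np.ndarray) -> np.ndarray:
    """Gradient of the Lovász extension of the Jaccard loss w.r.t. sorted errors."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]   # discrete differences
    return jaccard

def lovasz_hinge(logits: np.ndarray, labels: np.ndarray) -> float:
    """Binary Lovász hinge loss over all pixels (labels in {0, 1})."""
    logits = logits.ravel().astype(float)
    labels = labels.ravel().astype(float)
    signs = 2.0 * labels - 1.0
    errors = 1.0 - logits * signs                # per-pixel hinge errors
    order = np.argsort(-errors, kind="stable")   # decreasing error
    grad = lovasz_grad(labels[order])
    return float(np.dot(np.maximum(errors[order], 0.0), grad))

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Jaccard index (IoU) of two binary masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union else 1.0
```

A confident, correct prediction (all hinge errors negative) gives zero loss, while an uninformative all-zero logit map gives a loss of 1, matching a Jaccard loss of 1 for a prediction with no useful margin.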

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.12353 [pdf]

doi 10.1109/JLT.2021.3086301

An Interpretable Mapping from a Communication System to a Neural Network for Optimal Transceiver-Joint Equalization

Authors: Zhiqun Zhai, Hexun Jiang, Mengfan Fu, Lei Liu, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: In this paper, we propose a scheme that utilizes the optimization ability of artificial intelligence (AI) for optimal transceiver-joint equalization to compensate for the optical filtering impairments caused by wavelength selective switches (WSS). Instead of adding or replacing a certain module of existing digital signal processing (DSP), we exploit the similarity between a communication system and a neural network (NN). By mapping a communication system to an NN, in which the equalization modules correspond to convolutional layers and the other modules can be regarded as static layers, the optimal transceiver-joint equalization coefficients can be obtained. In particular, the DSP structure of the communication system is not changed. Extensive numerical simulations are performed to validate the performance of the proposed method. For a 65 GBaud 16QAM signal, it achieves a 0.76 dB gain when the number of WSSs is 16 with a -6 dB bandwidth of 73 GHz.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.09455 [pdf, other]

Prediction-assistant Frame Super-Resolution for Video Streaming

Authors: Wang Shen, Wenbo Bao, Guangtao Zhai, Charlie L Wang, Jerry W Hu, Zhiyong Gao

Abstract: Video frame transmission delay is critical in real-time applications such as online video gaming and live shows. A new frame must be received before its rendering time; otherwise, the system will buffer for a while and the user will encounter a frozen screen, resulting in an unsatisfactory user experience. An effective approach is to transmit frames at lower quality under poor bandwidth conditions, for example by using scalable video coding. In this paper, we propose to enhance video quality using lossy frames in two situations. First, when current frames arrive too late to be received before the rendering deadline (i.e., lost), we propose to use previously received high-resolution images to predict the future frames. Second, when the quality of the currently received frames is low (i.e., lossy), we propose to use previously received high-resolution frames to enhance the low-quality current ones. For the first case, we propose a small yet effective video frame prediction network. For the second case, we extend the video prediction network into a video enhancement network that associates current frames with previous frames to restore high-quality images. Extensive experimental results demonstrate that our method performs favorably against state-of-the-art algorithms in the lossy video streaming environment.

Submitted 17 March, 2021; originally announced March 2021.

arXiv:2103.04530 [pdf, other]

Weather Analogs with a Machine Learning Similarity Metric for Renewable Resource Forecasting

Authors: Weiming Hu, Guido Cervone, George Young, Luca Delle Monache

Abstract: The Analog Ensemble (AnEn) technique has been shown to be effective on several weather problems. Unlike previous weather analogs that are sought within a large spatial domain and an extended temporal window, AnEn strictly confines space and time, and independently generates results at each grid point within a short time window. AnEn can find similar forecasts that lead to accurate and calibrated ensemble forecasts. The core of the AnEn technique is a similarity metric that sorts historical forecasts with respect to a new target prediction. A commonly used metric is Euclidean distance. However, a significant difficulty in using this metric is defining the weights for all the parameters. Generally, feature selection and an extensive weight search are needed. This paper proposes a novel definition of weather analogs through a Machine Learning (ML) based similarity metric. The similarity metric uses neural networks that are trained and instantiated to search for weather analogs. This new metric allows all variables to be incorporated without requiring prior feature selection and weight optimization. Experiments are presented on the application of this new metric to forecasting wind speed and solar irradiance. Results show that the ML metric generally outperforms the original metric. The ML metric has a better capability to correct for larger errors and to take advantage of a larger search repository. Spatial predictions using a learned metric also show the ability to define effective latent features that are transferable to other locations.
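The weight-dependent Euclidean-style metric that the ML metric replaces can be sketched as follows: for each variable, take the Euclidean distance over a short time window, normalize by that variable's standard deviation, apply a weight, and sum. The exact formula here is our assumption of a typical AnEn-style metric rather than the paper's definition, and the function names are hypothetical. The `weights` argument is precisely what requires tuning and what the learned metric aims to avoid.

```python
import numpy as np

def anen_distance(target, history, weights, sigma):
    """Distance between a target forecast and each historical forecast.

    target:  (V, W) values of V forecast variables over a window of W time steps
    history: (N, V, W) the same window for N historical forecasts
    weights: (V,) per-variable weights (normally found by search)
    sigma:   (V,) per-variable standard deviations used for normalization
    """
    per_var = np.sqrt(((history - target) ** 2).sum(axis=2))  # (N, V)
    return (per_var * (weights / sigma)).sum(axis=1)           # (N,)

def analog_ensemble(target, history, observations, weights, sigma, n_members):
    """Return the observations paired with the n_members closest historical forecasts."""
    d = anen_distance(target, history, weights, sigma)
    best = np.argsort(d, kind="stable")[:n_members]
    return observations[best]
```

The returned observations form the ensemble members; their spread gives the probabilistic forecast at that grid point and lead time.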

Submitted 8 March, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

Comments: 32 pages, 9 figures

arXiv:2102.13400 [pdf, other]

doi 10.1364/AO.424280

Panoramic annular SLAM with loop closure and global optimization

Authors: Hao Chen, Weijian Hu, Kailun Yang, Jian Bai, Kaiwei Wang

Abstract: In this paper, we propose panoramic annular simultaneous localization and mapping (PA-SLAM), a visual SLAM system based on a panoramic annular lens. A hybrid point selection strategy is put forward in the tracking front-end, which ensures repeatability of keypoints and enables loop closure detection based on the bag-of-words approach. Every detected loop candidate is verified geometrically, and the Sim(3) relative pose constraint is estimated to perform pose graph optimization and global bundle adjustment in the back-end. A comprehensive set of experiments on real-world datasets demonstrates that the hybrid point selection strategy allows reliable loop closure detection, and that the accumulated error and scale drift are significantly reduced via global optimization, enabling PA-SLAM to reach state-of-the-art accuracy while maintaining high robustness and efficiency.
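The Sim(3) relative pose constraint mentioned above is a similarity transform x ↦ sRx + t: a rotation and translation plus a scale factor, which is what allows loop closures to correct monocular scale drift. Below is a minimal sketch of the group operations a pose graph would chain together; the class name and layout are our own assumptions, not PA-SLAM's code.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Sim3:
    s: float        # scale
    R: np.ndarray   # 3x3 rotation matrix
    t: np.ndarray   # translation, shape (3,)

    def apply(self, p: np.ndarray) -> np.ndarray:
        """Transform a single 3-D point: s * R p + t."""
        return self.s * (self.R @ p) + self.t

    def compose(self, other: "Sim3") -> "Sim3":
        """Return self ∘ other, i.e. apply `other` first, then `self`."""
        return Sim3(self.s * other.s, self.R @ other.R,
                    self.s * (self.R @ other.t) + self.t)

    def inverse(self) -> "Sim3":
        """Inverse transform: p = (1/s) R^T (x - t)."""
        Rt = self.R.T
        return Sim3(1.0 / self.s, Rt, -(Rt @ self.t) / self.s)
```

In a pose graph, each edge stores such a relative transform, and optimization adjusts the absolute poses so that composed relative transforms around each loop are close to identity.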

Submitted 3 June, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: Accepted to Applied Optics. 12 pages, 11 figures, 3 tables

arXiv:2101.11442 [pdf]

Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data

Authors: Dicheng Chen, Wanqi Hu, Huiting Liu, Yirong Zhou, Tianyu Qiu, Yihui Huang, Zi Wang, Jiazheng Wang, Liangjie Lin, Zhigang Wu, Hao Chen, Xi Chen, Gen Yan, Di Guo, Jianzhong Lin, Xiaobo Qu

Abstract: Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool for revealing metabolic information. One challenge of 1H-MRS is the low Signal-to-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, increases M-fold accordingly, and a complete clinical MRS scan takes approximately 10 minutes at the common setting M=128. Recently, deep learning has been introduced to improve the SNR, but most such methods use simulated data as the training set. This may hinder MRS applications, since differences such as acquisition system imperfections and physiological and psychological conditions may exist between the simulated and in vivo data. Here, we propose a new scheme that uses only repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from low-SNR time-domain data (24 SA) to high-SNR data (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that, using only 20% of the repeated samples, the spectra denoised by ReLSTM provided estimated metabolite concentrations comparable to those from 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieved lower relative errors and Cramér-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of spectra under fast acquisition (24 SA), which would be valuable to clinical MRS studies.
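The trade-off between 24 SA and 128 SA can be quantified with the standard signal-averaging argument, assuming zero-mean noise that is uncorrelated across repetitions (an idealization; real in vivo noise and artifacts are not perfectly white). Averaging M acquisitions keeps the signal amplitude S and reduces the noise standard deviation from σ to σ/√M, and the acquisition time is assumed to scale linearly with M, as the abstract states:

```latex
\mathrm{SNR}_M = \frac{S}{\sigma/\sqrt{M}} = \sqrt{M}\,\mathrm{SNR}_1,
\qquad
\frac{\mathrm{SNR}_{128}}{\mathrm{SNR}_{24}} = \sqrt{\frac{128}{24}} \approx 2.31,
\qquad
\frac{24}{128} = 18.75\% \approx 20\%,
\qquad
T_{24} \approx 10~\text{min} \times \frac{24}{128} \approx 1.9~\text{min}.
```

So the 24-SA input has roughly 2.3 times lower SNR than the 128-SA reference, and this is the gap the denoising network is meant to close while cutting the scan time to about a fifth.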

Submitted 25 October, 2022; v1 submitted 26 January, 2021; originally announced January 2021.

arXiv:2012.12727 [pdf]

Low Complexity Component Nonlinear Distortions Mitigation Scheme for Probabilistically Shaped 64-QAM Signals

Authors: Yiwen Wu, Mengfan Fu, Huazhi Lun, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We propose a degenerated hierarchical look-up table (DH-LUT) scheme to compensate for component nonlinearities. For probabilistically shaped 64-QAM signals, it achieves up to 2-dB SNR improvement, while the table size is only 8.59% of that of the conventional LUT method.
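For context on why table size matters: a generic pattern-dependent LUT with a memory of three symbols for 64-QAM has 64³ = 262,144 possible patterns. The sketch below shows such a conventional LUT (not the DH-LUT, whose hierarchical structure is not described here). It assumes training with known transmitted symbols; in a deployed receiver the pattern key would come from symbol decisions, or the table would be applied as transmitter predistortion. Function names and the `memory` parameter are hypothetical.

```python
from collections import defaultdict

import numpy as np

def build_lut(tx_idx, tx_sym, rx_sym, memory=3):
    """Map each pattern of `memory` symbol indices (centred on the current symbol)
    to the mean distortion rx - tx observed for that centre symbol."""
    half = memory // 2
    sums = defaultdict(complex)
    counts = defaultdict(int)
    for n in range(half, len(tx_idx) - half):
        key = tuple(tx_idx[n - half:n + half + 1])
        sums[key] += rx_sym[n] - tx_sym[n]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

def apply_lut(lut, tx_idx, rx_sym, memory=3):
    """Subtract the stored mean distortion for each symbol's pattern."""
    half = memory // 2
    out = rx_sym.copy()
    for n in range(half, len(tx_idx) - half):
        key = tuple(tx_idx[n - half:n + half + 1])
        out[n] -= lut.get(key, 0.0)     # unseen patterns are left uncorrected
    return out
```

The table grows as (constellation size)^memory, which is why reducing its size, as the DH-LUT does, is the main lever for lowering complexity.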

Submitted 20 December, 2020; originally announced December 2020.

arXiv:2011.11896 [pdf]

doi 10.1109/JLT.2021.3067146

A Data-Fusion-Assisted Telemetry Layer for Autonomous Optical Networks

Authors: Xiaomin Liu, Huazhi Lun, Ruoxuan Gao, Meng Cai, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: To further improve the capacity and reliability of optical networks, a closed-loop autonomous architecture is preferred. Given the large number of optical components in an optical network and the many digital signal processing modules in each optical transceiver, massive real-time data can be collected. However, for a traditional monitoring structure, collecting, storing and processing such large volumes of data are challenging tasks. Moreover, strong correlations and similarities between data from different sources and regions are not properly considered, which may limit function extension and accuracy improvement. To address the above-mentioned issues, a data-fusion-assisted telemetry layer between the physical layer and the control layer is proposed in this paper. The data fusion methodologies are elaborated on three different levels: Source Level, Space Level and Model Level. For each level, various data fusion algorithms are introduced and relevant works are reviewed. In addition, proof-of-concept use cases for each level are provided through simulations, where the benefits of the data-fusion-assisted telemetry layer are shown.

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2010.12717 [pdf, other]

doi 10.1109/COMST.2021.3058333

Deep Learning for Radio-based Human Sensing: Recent Advances and Future Directions

Authors: Isura Nirmal, Abdelwahed Khamis, Mahbub Hassan, Wen Hu, Xiaoqing Zhu

Abstract: While decade-long research has clearly demonstrated the vast potential of radio frequency (RF) for many human sensing tasks, scaling this technology to large scenarios remained problematic with conventional approaches. Recently, researchers have successfully applied deep learning to take radio-based sensing to a new level. Many different types of deep learning models have been proposed to achieve high sensing accuracy over a large population and activity set, as well as in unseen environments. Deep learning has also enabled detection of novel human sensing phenomena that were previously not possible. In this survey, we provide a comprehensive review and taxonomy of recent research efforts on deep learning based RF sensing. We also identify and compare several publicly released labeled RF sensing datasets that can facilitate such deep learning research. Finally, we summarize the lessons learned and discuss the current limitations and future directions of deep learning based RF sensing.

Submitted 7 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Journal ref: 23, 2021, 995-1019

arXiv:2009.02752 [pdf, other]

Simultaneous Energy Harvesting and Gait Recognition using Piezoelectric Energy Harvester

Authors: Dong Ma, Guohao Lan, Weitao Xu, Mahbub Hassan, Wen Hu

Abstract: A piezoelectric energy harvester (PEH), which generates electricity from stress or vibrations, is gaining increasing attention as a viable solution to extend battery life in wearables. Recent research further reveals that, besides generating energy, a PEH can also serve as a passive sensor that detects human gait power-efficiently, because its stress or vibration patterns are significantly influenced by the gait. However, as PEHs are not designed for precise measurement of motion, achievable gait recognition accuracy remains low with conventional classification algorithms. The accuracy deteriorates further when the generated electricity is stored simultaneously. To classify gait reliably while simultaneously storing generated energy, we make two distinct contributions. First, we propose a preprocessing algorithm to filter out the effect of energy storage on the PEH electricity signal. Second, we propose a long short-term memory (LSTM) network-based classifier to accurately capture temporal information in gait-induced electricity generation. We prototype the proposed gait recognition architecture in the form factor of an insole and evaluate both its gait recognition and its energy harvesting performance with 20 subjects. Our results show that, compared to the state of the art, the proposed architecture detects human gait with 12% higher recall and harvests up to 127% more energy while consuming 38% less power.

Submitted 6 September, 2020; originally announced September 2020.

Comments: 13 pages, 17 figures, and 2 tables

arXiv:2009.01424 [pdf, other]

doi 10.1145/3414685.3417764

Mononizing Binocular Videos

Authors: Wenbo Hu, Menghan Xia, Chi-Wing Fu, Tien-Tsin Wong

Abstract: This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it. Mono-nize means we purposely convert a binocular video into a regular monocular video with the stereo information implicitly encoded in a visual but nearly-imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike ordinary monocular videos, we can restore from it the original binocular video and show it on a stereoscopic display. To start, we formulate an encoding-and-decoding framework with the pyramidal deformable fusion module to exploit long-range correspondences between the left and right views, a quantization layer to suppress the restoring artifacts, and the compression noise simulation module to resist the compression noise introduced by modern video codecs. Our framework is self-supervised, as we articulate our objective function with loss terms defined on the input: a monocular term for creating the mononized video, an invertibility term for restoring the original video, and a temporal term for frame-to-frame coherence. Further, we conducted extensive experiments to evaluate our generated mononized videos and restored binocular videos for diverse types of images and 3D movies. Quantitative results on both standard metrics and user perception studies show the effectiveness of our method.

Submitted 2 September, 2020; originally announced September 2020.

Comments: 16 pages, 17 figures. Accepted at SIGGRAPH Asia 2020

Journal ref: ACM Transactions on Graphics (SIGGRAPH Asia 2020 issue)

arXiv:2008.05750 [pdf, other]

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Authors: Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

Abstract: The Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with its encoder-decoder architecture, is only suitable for offline ASR: it relies on an attention mechanism to learn alignments and encodes input audio bidirectionally. The high computation cost of Transformer decoding also limits its use in production streaming systems. To make the Transformer suitable for streaming ASR, we explore the Transducer framework as a streamable way to learn alignments. For audio encoding, we apply a unidirectional Transformer with interleaved convolution layers, which model future context that is important for performance. To reduce computation cost, we gradually downsample the acoustic input, also with the interleaved convolution layers. Moreover, we limit the length of history context in self-attention to maintain a constant computation cost for each decoding step. We show that this architecture, named Conv-Transformer Transducer, achieves competitive performance on the LibriSpeech dataset (3.6% WER on test-clean) without external language models. The performance is comparable to previously published streamable Transformer Transducer and strong hybrid streaming ASR systems, and is achieved with a smaller look-ahead window (140 ms), fewer parameters, and a lower frame rate.

Submitted 13 August, 2020; originally announced August 2020.

Comments: Accepted by INTERSPEECH 2020
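The Conv-Transformer Transducer abstract describes limiting the history context of unidirectional self-attention so that each decoding step has a constant computation cost. Below is a minimal sketch of that idea. It assumes a simple banded causal mask applied to plain attention scores. The helper names `limited_history_mask` and `masked_attention` and the `history` parameter are hypothetical illustrations, not taken from the paper, and this is not the authors' implementation.

```python
import math
from typing import List


def limited_history_mask(seq_len: int, history: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when frame i may attend to frame j.

    Frame i sees itself and at most `history` previous frames, and never any
    future frame (unidirectional encoding). The number of visible frames is
    therefore bounded by history + 1, regardless of sequence length.
    """
    return [[i - history <= j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_attention(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Hypothetical helper: row-wise softmax where masked-out positions get zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        # The diagonal (j == i) is always allowed, so each row has at least one entry.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

For example, `limited_history_mask(6, 2)` lets each frame attend to at most three frames (itself plus two previous ones), so per-frame attention cost stays fixed as the utterance grows. Future frames are invisible to this mask. In the architecture described above, future context comes from the interleaved convolution layers instead.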

Showing 1–50 of 76 results for author: Hu, W