-
Towards Unsupervised Speech Recognition Without Pronunciation Models
Authors:
Junrui Ni,
Liming Wang,
Yang Zhang,
Kaizhi Qian,
Heting Gao,
Mark Hasegawa-Johnson,
Chang D. Yoo
Abstract:
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of develo** ASR systems without paired speech and text corpora by pro…
▽ More
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of develo** ASR systems without paired speech and text corpora by proposing the removal of reliance on a phoneme lexicon. We explore a new research direction: word-level unsupervised ASR. Using a curated speech corpus containing only high-frequency English words, our system achieves a word error rate of nearly 20% without parallel transcripts or oracle word boundaries. Furthermore, we experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling. This innovative model surpasses the performance of previous unsupervised ASR models trained with direct distribution matching.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Authors:
Jianyuan Ni,
Hao Tang,
Syed Tousiful Haque,
Yan Yan,
Anne H. H. Ngu
Abstract:
The combination of increased life expectancy and falling birth rates is resulting in an aging population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a promising assistive technology to support the daily lives of older individuals, unlocking vast potential for human-centric applications. However, recent surveys in WSHAR have been limited, focusing either solely on deep lear…
▽ More
The combination of increased life expectancy and falling birth rates is resulting in an aging population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a promising assistive technology to support the daily lives of older individuals, unlocking vast potential for human-centric applications. However, recent surveys in WSHAR have been limited, focusing either solely on deep learning approaches or on a single sensor modality. In real life, our human interact with the world in a multi-sensory way, where diverse information sources are intricately processed and interpreted to accomplish a complex and unified sensing system. To give machines similar intelligence, multimodal machine learning, which merges data from various sources, has become a popular research area with recent advancements. In this study, we present a comprehensive survey from a novel perspective on how to leverage multimodal learning to WSHAR domain for newcomers and researchers. We begin by presenting the recent sensor modalities as well as deep learning approaches in HAR. Subsequently, we explore the techniques used in present multimodal systems for WSHAR. This includes inter-multimodal systems which utilize sensor modalities from both visual and non-visual systems and intra-multimodal systems that simply take modalities from non-visual systems. After that, we focus on current multimodal learning approaches that have applied to solve some of the challenges existing in WSHAR. Specifically, we make extra efforts by connecting the existing multimodal literature from other domains, such as computer vision and natural language processing, with current WSHAR area. Finally, we identify the corresponding challenges and potential research direction in current WSHAR area for further improvement.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
A Method for Target Detection Based on Mmw Radar and Vision Fusion
Authors:
Ming Zong,
Jiaying Wu,
Zhanyu Zhu,
**gen Ni
Abstract:
An efficient and accurate traffic monitoring system often takes advantages of multi-sensor detection to ensure the safety of urban traffic, promoting the accuracy and robustness of target detection and tracking. A method for target detection using Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS) is proposed in this paper, which is extended from Fully Convoluti…
▽ More
An efficient and accurate traffic monitoring system often takes advantages of multi-sensor detection to ensure the safety of urban traffic, promoting the accuracy and robustness of target detection and tracking. A method for target detection using Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS) is proposed in this paper, which is extended from Fully Convolutional One-Stage Network (FCOS) by introducing the modules of radar image processing branches, radar-vision fusion and path aggregation. The radar image processing branch mainly focuses on the image modeling based on the spatiotemporal calibration of millimeter-wave (mmw) radar and cameras, taking the conversion of radar point clouds to radar images. The fusion module extracts features of radar and optical images based on the principle of spatial attention stitching criterion. The path aggregation module enhances the reuse of feature layers, combining the positional information of shallow feature maps with deep semantic information, to obtain better detection performance for both large and small targets. Through the experimental analysis, the method proposed in this paper can effectively fuse the mmw radar and vision perceptions, showing good performance in traffic target detection.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Enhanced Q-Learning Approach to Finite-Time Reachability with Maximum Probability for Probabilistic Boolean Control Networks
Authors:
Hongyue Fan,
**gjie Ni,
Fangfei Li
Abstract:
In this paper, we investigate the problem of controlling probabilistic Boolean control networks (PBCNs) to achieve reachability with maximum probability in the finite time horizon. We address three questions: 1) finding control policies that achieve reachability with maximum probability under fixed, and particularly, varied finite time horizon, 2) leveraging prior knowledge to solve question 1) wi…
▽ More
In this paper, we investigate the problem of controlling probabilistic Boolean control networks (PBCNs) to achieve reachability with maximum probability in the finite time horizon. We address three questions: 1) finding control policies that achieve reachability with maximum probability under fixed, and particularly, varied finite time horizon, 2) leveraging prior knowledge to solve question 1) with faster convergence speed in scenarios where time is a variable framework, and 3) proposing an enhanced Q-learning (QL) method to efficiently address the aforementioned questions for large-scale PBCNs. For question 1), we demonstrate the applicability of QL method on the finite-time reachability problem. For question 2), considering the possibility of varied time frames, we incorporate transfer learning (TL) technique to leverage prior knowledge and enhance convergence speed. For question 3), an enhanced model-free QL approach that improves upon the traditional QL algorithm by introducing memory-efficient modifications to address these issues in large-scale PBCNs effectively. Finally, we apply the proposed method to two examples: a small-scale PBCN and a large-scale PBCN, demonstrating the effectiveness of our approach.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Detection of Small Targets in Sea Clutter Based on RepVGG and Continuous Wavelet Transform
Authors:
**gchen Ni,
Haoru Li,
Lilin Xu,
**g Liang
Abstract:
Constructing a high-performance target detector under the background of sea clutter is always necessary and important. In this work, we propose a RepVGGA0-CWT detector, where RepVGG is a residual network that gains a high detection accuracy. Different from traditional residual networks, RepVGG keeps an acceptable calculation speed. Giving consideration to both accuracy and speed, the RepVGGA0 is s…
▽ More
Constructing a high-performance target detector under the background of sea clutter is always necessary and important. In this work, we propose a RepVGGA0-CWT detector, where RepVGG is a residual network that gains a high detection accuracy. Different from traditional residual networks, RepVGG keeps an acceptable calculation speed. Giving consideration to both accuracy and speed, the RepVGGA0 is selected among all the variants of RepVGG. Also, continuous wavelet transform (CWT) is employed to extract the radar echoes' time-frequency feature effectively. In the tests, other networks (ResNet50, ResNet18 and AlexNet) and feature extraction methods (short-time Fourier transform (STFT), CWT) are combined to build detectors for comparison. The result of different datasets shows that the RepVGGA0-CWT detector performs better than those detectors in terms of low controllable false alarm rate, high training speed, high inference speed and low memory usage. This RepVGGA0-CWT detector is hardware-friendly and can be applied in real-time scenes for its high inference speed in detection.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Stochastic Deep Koopman Model for Quality Propagation Analysis in Multistage Manufacturing Systems
Authors:
Zhiyi Chen,
Harshal Maske,
Huanyi Shui,
Devesh Upadhyay,
Michael Hopka,
Joseph Cohen,
Xingjian Lai,
Xun Huan,
Jun Ni
Abstract:
The modeling of multistage manufacturing systems (MMSs) has attracted increased attention from both academia and industry. Recent advancements in deep learning methods provide an opportunity to accomplish this task with reduced cost and expertise. This study introduces a stochastic deep Koopman (SDK) framework to model the complex behavior of MMSs. Specifically, we present a novel application of K…
▽ More
The modeling of multistage manufacturing systems (MMSs) has attracted increased attention from both academia and industry. Recent advancements in deep learning methods provide an opportunity to accomplish this task with reduced cost and expertise. This study introduces a stochastic deep Koopman (SDK) framework to model the complex behavior of MMSs. Specifically, we present a novel application of Koopman operators to propagate critical quality information extracted by variational autoencoders. Through this framework, we can effectively capture the general nonlinear evolution of product quality using a transferred linear representation, thus enhancing the interpretability of the data-driven model. To evaluate the performance of the SDK framework, we carried out a comparative study on an open-source dataset. The main findings of this paper are as follows. Our results indicate that SDK surpasses other popular data-driven models in accuracy when predicting stagewise product quality within the MMS. Furthermore, the unique linear propagation property in the stochastic latent space of SDK enables traceability for quality evolution throughout the process, thereby facilitating the design of root cause analysis schemes. Notably, the proposed framework requires minimal knowledge of the underlying physics of production lines. It serves as a virtual metrology tool that can be applied to various MMSs, contributing to the ultimate goal of Zero Defect Manufacturing.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
Reinforcement Learning Based Minimum State-flipped Control for the Reachability of Boolean Control Networks
Authors:
**gjie Ni,
Fangfei Li
Abstract:
To realize reachability as well as reduce control costs of Boolean Control Networks (BCNs) with state-flipped control, a reinforcement learning based method is proposed to obtain flip kernels and the optimal policy with minimal flip** actions to realize reachability. The method proposed is model-free and of low computational complexity. In particular, Q-learning (QL), fast QL, and small memory Q…
▽ More
To realize reachability as well as reduce control costs of Boolean Control Networks (BCNs) with state-flipped control, a reinforcement learning based method is proposed to obtain flip kernels and the optimal policy with minimal flip** actions to realize reachability. The method proposed is model-free and of low computational complexity. In particular, Q-learning (QL), fast QL, and small memory QL are proposed to find flip kernels. Fast QL and small memory QL are two novel algorithms. Specifically, fast QL, namely, QL combined with transfer-learning and special initial states, is of higher efficiency, and small memory QL is applicable to large-scale systems. Meanwhile, we present a novel reward setting, under which the optimal policy with minimal flip** actions to realize reachability is the one of the highest returns. Then, to obtain the optimal policy, we propose QL, and fast small memory QL for large-scale systems. Specifically, on the basis of the small memory QL mentioned before, the fast small memory QL uses a changeable reward setting to speed up the learning efficiency while ensuring the optimality of the policy. For parameter settings, we give some system properties for reference. Finally, two examples, which are a small-scale system and a large-scale one, are considered to verify the proposed method.
△ Less
Submitted 10 April, 2023;
originally announced April 2023.
-
Deep Reinforcement Learning Based Optimal Infinite-Horizon Control of Probabilistic Boolean Control Networks
Authors:
**gjie Ni,
Fangfei Li,
Zheng-Guang Wu
Abstract:
In this paper, a deep reinforcement learning based method is proposed to obtain optimal policies for optimal infinite-horizon control of probabilistic Boolean control networks (PBCNs). Compared with the existing literatures, the proposed method is model-free, namely, the system model and the initial states needn't to be known. Meanwhile, it is suitable for large-scale PBCNs. First, we establish th…
▽ More
In this paper, a deep reinforcement learning based method is proposed to obtain optimal policies for optimal infinite-horizon control of probabilistic Boolean control networks (PBCNs). Compared with the existing literatures, the proposed method is model-free, namely, the system model and the initial states needn't to be known. Meanwhile, it is suitable for large-scale PBCNs. First, we establish the connection between deep reinforcement learning and optimal infinite-horizon control, and structure the problem into the framework of the Markov decision process. Then, PBCNs are defined as large-scale or small-scale, depending on whether the memory of the action-values exceeds the RAM of the computer. Based on the newly introduced definition, Q-learning (QL) and double deep Q-network (DDQN) are applied to the optimal infinite-horizon control of small-scale and large-scale PBCNs, respectively. Meanwhile, the optimal state feedback controllers are designed. Finally, two examples are presented, which are a small-scale PBCN with 3 nodes, and a large-scale one with 28 nodes. To verify the convergence of QL and DDQN, the optimal control policy and the optimal action-values, which are obtained from both the algorithms, are compared with the ones based on a model-based method named policy iteration. Meanwhile, the performance of QL is compared with DDQN in the small-scale PBCN.
△ Less
Submitted 7 April, 2023;
originally announced April 2023.
-
Shapley-based Explainable AI for Clustering Applications in Fault Diagnosis and Prognosis
Authors:
Joseph Cohen,
Xun Huan,
Jun Ni
Abstract:
Data-driven artificial intelligence models require explainability in intelligent manufacturing to streamline adoption and trust in modern industry. However, recently developed explainable artificial intelligence (XAI) techniques that estimate feature contributions on a model-agnostic level such as SHapley Additive exPlanations (SHAP) have not yet been evaluated for semi-supervised fault diagnosis…
▽ More
Data-driven artificial intelligence models require explainability in intelligent manufacturing to streamline adoption and trust in modern industry. However, recently developed explainable artificial intelligence (XAI) techniques that estimate feature contributions on a model-agnostic level such as SHapley Additive exPlanations (SHAP) have not yet been evaluated for semi-supervised fault diagnosis and prognosis problems characterized by class imbalance and weakly labeled datasets. This paper explores the potential of utilizing Shapley values for a new clustering framework compatible with semi-supervised learning problems, loosening the strict supervision requirement of current XAI techniques. This broad methodology is validated on two case studies: a heatmap image dataset obtained from a semiconductor manufacturing process featuring class imbalance, and a benchmark dataset utilized in the 2021 Prognostics and Health Management (PHM) Data Challenge. Semi-supervised clustering based on Shapley values significantly improves upon clustering quality compared to the fully unsupervised case, deriving information-dense and meaningful clusters that relate to underlying fault diagnosis model predictions. These clusters can also be characterized by high-precision decision rules in terms of original feature values, as demonstrated in the second case study. The rules, limited to 1-2 terms utilizing original feature scales, describe 12 out of the 16 derived equipment failure clusters with precision exceeding 0.85, showcasing the promising utility of the explainable clustering framework for intelligent manufacturing applications.
△ Less
Submitted 25 March, 2023;
originally announced March 2023.
-
Fault Prognosis of Turbofan Engines: Eventual Failure Prediction and Remaining Useful Life Estimation
Authors:
Joseph Cohen,
Xun Huan,
Jun Ni
Abstract:
In the era of industrial big data, prognostics and health management is essential to improve the prediction of future failures to minimize inventory, maintenance, and human costs. Used for the 2021 PHM Data Challenge, the new Commercial Modular Aero-Propulsion System Simulation dataset from NASA is an open-source benchmark containing simulated turbofan engine units flown under realistic flight con…
▽ More
In the era of industrial big data, prognostics and health management is essential to improve the prediction of future failures to minimize inventory, maintenance, and human costs. Used for the 2021 PHM Data Challenge, the new Commercial Modular Aero-Propulsion System Simulation dataset from NASA is an open-source benchmark containing simulated turbofan engine units flown under realistic flight conditions. Deep learning approaches implemented previously for this application attempt to predict the remaining useful life of the engine units, but have not utilized labeled failure mode information, impeding practical usage and explainability. To address these limitations, a new prognostics approach is formulated with a customized loss function to simultaneously predict the current health state, the eventual failing component(s), and the remaining useful life. The proposed method incorporates principal component analysis to orthogonalize statistical time-domain features, which are inputs into supervised regressors such as random forests, extreme random forests, XGBoost, and artificial neural networks. The highest performing algorithm, ANN-Flux, achieves AUROC and AUPR scores exceeding 0.95 for each classification. In addition, ANN-Flux reduces the remaining useful life RMSE by 38% for the same test split of the dataset compared to past work, with significantly less computational cost.
△ Less
Submitted 22 March, 2023;
originally announced March 2023.
-
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
Authors:
**jie Ni,
Yukun Ma,
Wen Wang,
Qian Chen,
Dianwen Ng,
Han Lei,
Trung Hieu Nguyen,
Chong Zhang,
Bin Ma,
Erik Cambria
Abstract:
Learning on a massive amount of speech corpus leads to the recent success of many self-supervised speech models. With knowledge distillation, these models may also benefit from the knowledge encoded by language models that are pre-trained on rich sources of texts. The distillation process, however, is challenging due to the modal disparity between textual and speech embedding spaces. This paper st…
▽ More
Learning on a massive amount of speech corpus leads to the recent success of many self-supervised speech models. With knowledge distillation, these models may also benefit from the knowledge encoded by language models that are pre-trained on rich sources of texts. The distillation process, however, is challenging due to the modal disparity between textual and speech embedding spaces. This paper studies metric-based distillation to align the embedding space of text and speech with only a small amount of data without modifying the model structure. Since the semantic and granularity gap between text and speech has been omitted in literature, which impairs the distillation, we propose the Prior-informed Adaptive knowledge Distillation (PAD) that adaptively leverages text/speech units of variable granularity and prior distributions to achieve better global and local alignments between text and speech pre-trained models. We evaluate on three spoken language understanding benchmarks to show that PAD is more effective in transferring linguistic knowledge than other metric-based distillation approaches.
△ Less
Submitted 6 March, 2023;
originally announced March 2023.
-
deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition
Authors:
Dianwen Ng,
Ruixi Zhang,
Jia Qi Yip,
Zhao Yang,
**jie Ni,
Chong Zhang,
Yukun Ma,
Chongjia Ni,
Eng Siong Chng,
Bin Ma
Abstract:
Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source, which tends to do poorly when noise is present during testing. Nonetheless, it is crucial to overcome the adverse influence of noise for real-world…
▽ More
Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source, which tends to do poorly when noise is present during testing. Nonetheless, it is crucial to overcome the adverse influence of noise for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow's redundancy-reduction principle. The new framework improves the HuBERT training algorithm by introducing auxiliary losses that drive the self- and cross-correlation matrix between pairwise noise-distorted embeddings towards identity matrix. This encourages the model to produce noise-agnostic speech representations. With this method, we report improved robustness in noisy environments, including unseen noises, without impairing the performance on the clean set.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Authors:
Kaizhi Qian,
Yang Zhang,
Heting Gao,
Junrui Ni,
Cheng-I Lai,
David Cox,
Mark Hasegawa-Johnson,
Shiyu Chang
Abstract:
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted va…
▽ More
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework, and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks, and observe a consistent and notable performance advantage of our speaker-disentangled representations.
△ Less
Submitted 23 June, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Authors:
Heting Gao,
Junrui Ni,
Kaizhi Qian,
Yang Zhang,
Shiyu Chang,
Mark Hasegawa-Johnson
Abstract:
Large-scale auto-regressive language models pretrained on massive text have demonstrated their impressive ability to perform new natural language tasks with only a few text examples, without the need for fine-tuning. Recent studies further show that such a few-shot learning ability can be extended to the text-image setting by training an encoder to encode the images into embeddings functioning lik…
▽ More
Large-scale auto-regressive language models pretrained on massive text have demonstrated their impressive ability to perform new natural language tasks with only a few text examples, without the need for fine-tuning. Recent studies further show that such a few-shot learning ability can be extended to the text-image setting by training an encoder to encode the images into embeddings functioning like the text embeddings of the language model. Interested in exploring the possibility of transferring the few-shot learning ability to the audio-text setting, we propose a novel speech understanding framework, WavPrompt, where we finetune a wav2vec model to generate a sequence of audio embeddings understood by the language model. We show that WavPrompt is a few-shot learner that can perform speech understanding tasks better than a naive text baseline. We conduct detailed ablation studies on different components and hyperparameters to empirically identify the best model configuration. In addition, we conduct a non-speech understanding experiment to show WavPrompt can extract more information than just the transcriptions. Code is available at https://github.com/Hertin/WavPrompt
△ Less
Submitted 13 April, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Authors:
Junrui Ni,
Liming Wang,
Heting Gao,
Kaizhi Qian,
Yang Zhang,
Shiyu Chang,
Mark Hasegawa-Johnson
Abstract:
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Develo** such a system can significantly improve the availability of speech techno…
▽ More
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Develo** such a system can significantly improve the availability of speech technology to languages without a large amount of parallel speech and text data. This paper proposes an unsupervised TTS system based on an alignment module that outputs pseudo-text and another synthesis module that uses pseudo-text for training and real text for inference. Our unsupervised system can achieve comparable performance to the supervised system in seven languages with about 10-20 hours of speech each. A careful study on the effect of text units and vocoders has also been conducted to better understand what factors may affect unsupervised TTS performance. The samples generated by our models can be found at https://cactuswiththoughts.github.io/UnsupTTS-Demo, and our code can be found at https://github.com/lwang114/UnsupTTS.
△ Less
Submitted 15 August, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
LDP-Net: An Unsupervised Pansharpening Network Based on Learnable Degradation Processes
Authors:
Jiahui Ni,
Zhimin Shao,
Zhongzhou Zhang,
Mingzheng Hou,
Jiliu Zhou,
Leyuan Fang,
Yi Zhang
Abstract:
Pansharpening in remote sensing image aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of LRMS image with the abundant spatial information of PAN image. Recently, many methods based on deep learning have been prop…
▽ More
Pansharpening in remote sensing image aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of LRMS image with the abundant spatial information of PAN image. Recently, many methods based on deep learning have been proposed for the pansharpening task. However, these methods usually has two main drawbacks: 1) requiring HRMS for supervised learning; and 2) simply ignoring the latent relation between the MS and PAN image and fusing them directly. To solve these problems, we propose a novel unsupervised network based on learnable degradation processes, dubbed as LDP-Net. A reblurring block and a graying block are designed to learn the corresponding degradation processes, respectively. In addition, a novel hybrid loss function is proposed to constrain both spatial and spectral consistency between the pansharpened image and the PAN and LRMS images at different resolutions. Experiments on Worldview2 and Worldview3 images demonstrate that our proposed LDP-Net can fuse PAN and LRMS images effectively without the help of HRMS samples, achieving promising performance in terms of both qualitative visual effects and quantitative metrics.
△ Less
Submitted 24 November, 2021;
originally announced November 2021.
-
Autonomous Formula Racecar: Overall System Design and Experimental Validation
Authors:
Hanqing Tian,
Jun Ni,
Zirui Li,
Jibin Hu
Abstract:
This paper develops and summarizes the work of building the autonomous integrated system including perception system and vehicle dynamic controller for a formula student autonomous racecar. We propose a system framework combining X-by-wired modification, perception & motion planning and vehicle dynamic control as a template of FSAC racecar which can be easily replicated. A LIDAR-vision cooperating…
▽ More
This paper develops and summarizes the work of building the autonomous integrated system including perception system and vehicle dynamic controller for a formula student autonomous racecar. We propose a system framework combining X-by-wired modification, perception & motion planning and vehicle dynamic control as a template of FSAC racecar which can be easily replicated. A LIDAR-vision cooperating method of detecting traffic cone which is used as track mark is proposed. Detection algorithm of the racecar also implements a precise and high rate localization method which combines the GPS-INS data and LIDAR odometry. Besides, a track map including the location and color information of the cones is built simultaneously. Finally, the system and vehicle performance on a closed loop track is tested. This paper also briefly introduces the Formula Student Autonomous Competition (FSAC).
△ Less
Submitted 1 September, 2020;
originally announced September 2020.
-
Learning based Predictive Error Estimation and Compensator Design for Autonomous Vehicle Path Tracking
Authors:
Chaoyang Jiang,
Hanqing Tian,
Jibin Hu,
Jiankun Zhai,
Chao Wei,
Jun Ni
Abstract:
Model predictive control (MPC) is widely used for path tracking of autonomous vehicles due to its ability to handle various types of constraints. However, a considerable predictive error exists because of the error of mathematics model or the model linearization. In this paper, we propose a framework combining the MPC with a learning-based error estimator and a feedforward compensator to improve t…
▽ More
Model predictive control (MPC) is widely used for path tracking of autonomous vehicles due to its ability to handle various types of constraints. However, a considerable predictive error exists because of the error of mathematics model or the model linearization. In this paper, we propose a framework combining the MPC with a learning-based error estimator and a feedforward compensator to improve the path tracking accuracy. An extreme learning machine is implemented to estimate the model based predictive error from vehicle state feedback information. Offline training data is collected from a vehicle controlled by a model-defective regular MPC for path tracking in several working conditions, respectively. The data include vehicle state and the spatial error between the current actual position and the corresponding predictive position. According to the estimated predictive error, we then design a PID-based feedforward compensator. Simulation results via Carsim show the estimation accuracy of the predictive error and the effectiveness of the proposed framework for path tracking of an autonomous vehicle.
△ Less
Submitted 18 July, 2020;
originally announced July 2020.
-
Integrating global spatial features in CNN based Hyperspectral/SAR imagery classification
Authors:
Fan Zhang,
MinChao Yan,
Chen Hu,
Jun Ni,
Fei Ma
Abstract:
The land cover classification has played an important role in remote sensing because it can intelligently identify things in one huge remote sensing image to reduce the work of humans. However, a lot of classification methods are designed based on the pixel feature or limited spatial feature of the remote sensing image, which limits the classification accuracy and universality of their methods. Th…
▽ More
The land cover classification has played an important role in remote sensing because it can intelligently identify things in one huge remote sensing image to reduce the work of humans. However, a lot of classification methods are designed based on the pixel feature or limited spatial feature of the remote sensing image, which limits the classification accuracy and universality of their methods. This paper proposed a novel method to take into the information of remote sensing image, i.e., geographic latitude-longitude information. In addition, a dual-branch convolutional neural network (CNN) classification method is designed in combination with the global information to mine the pixel features of the image. Then, the features of the two neural networks are fused with another fully neural network to realize the classification of remote sensing images. Finally, two remote sensing images are used to verify the effectiveness of our method, including hyperspectral imaging (HSI) and polarimetric synthetic aperture radar (PolSAR) imagery. The result of the proposed method is superior to the traditional single-channel convolutional neural network.
△ Less
Submitted 15 June, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
-
Pseudo-Labeling for Small Lesion Detection on Diabetic Retinopathy Images
Authors:
Qilei Chen,
** Liu,
**g Ni,
Yu Cao,
Benyuan Liu,
Honggang Zhang
Abstract:
Diabetic retinopathy (DR) is a primary cause of blindness in working-age people worldwide. About 3 to 4 million people with diabetes become blind because of DR every year. Diagnosis of DR through color fundus images is a common approach to mitigate such problem. However, DR diagnosis is a difficult and time consuming task, which requires experienced clinicians to identify the presence and signific…
▽ More
Diabetic retinopathy (DR) is a primary cause of blindness in working-age people worldwide. About 3 to 4 million people with diabetes become blind because of DR every year. Diagnosis of DR through color fundus images is a common approach to mitigate such problem. However, DR diagnosis is a difficult and time consuming task, which requires experienced clinicians to identify the presence and significance of many small features on high resolution images. Convolutional Neural Network (CNN) has proved to be a promising approach for automatic biomedical image analysis recently. In this work, we investigate lesion detection on DR fundus images with CNN-based object detection methods. Lesion detection on fundus images faces two unique challenges. The first one is that our dataset is not fully labeled, i.e., only a subset of all lesion instances are marked. Not only will these unlabeled lesion instances not contribute to the training of the model, but also they will be mistakenly counted as false negatives, leading the model move to the opposite direction. The second challenge is that the lesion instances are usually very small, making them difficult to be found by normal object detectors. To address the first challenge, we introduce an iterative training algorithm for the semi-supervised method of pseudo-labeling, in which a considerable number of unlabeled lesion instances can be discovered to boost the performance of the lesion detector. For the small size targets problem, we extend both the input size and the depth of feature pyramid network (FPN) to produce a large CNN feature map, which can preserve the detail of small lesions and thus enhance the effectiveness of the lesion detector. The experimental results show that our proposed methods significantly outperform the baselines.
△ Less
Submitted 26 March, 2020;
originally announced March 2020.