-
Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images
Authors:
Yuanyuan Peng,
Aidi Lin,
Meng Wang,
Tian Lin,
Ke Zou,
Yinglin Cheng,
Tingkun Shi,
Xulong Liao,
Lixia Feng,
Zhen Liang,
Xinjian Chen,
Huazhu Fu,
Haoyu Chen
Abstract:
Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved an F1 score of 96.76%, higher than two state-of-the-art algorithms, RETFound and UIOS, and improved further to 98.44% with a thresholding strategy. In the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% and 92.73% before and after thresholding, respectively. Our model is superior to two ophthalmologists, with a higher F1 score (95.17% vs. 61.93% and 71.72%). In addition, our model correctly predicts high uncertainty scores for samples with ambiguous features, from non-target-category diseases, or of low quality, prompting manual checks and preventing misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomaly detection in real-world clinical open-set environments.
Submitted 17 June, 2024;
originally announced June 2024.
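The thresholding strategy mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes each prediction comes with a scalar uncertainty score and that samples above a fixed threshold are referred for manual review. The function names, the threshold value, and the toy data are hypothetical and are not taken from the FMUE paper.

```python
from typing import List, Tuple


def split_by_uncertainty(
    uncertainties: List[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (accepted, referred) sample indices for a given threshold."""
    accepted = [i for i, u in enumerate(uncertainties) if u <= threshold]
    referred = [i for i, u in enumerate(uncertainties) if u > threshold]
    return accepted, referred


def accuracy(preds: List[int], labels: List[int], indices: List[int]) -> float:
    """Accuracy over the chosen indices; 0.0 if the list is empty."""
    if not indices:
        return 0.0
    correct = sum(1 for i in indices if preds[i] == labels[i])
    return correct / len(indices)


# Toy data, made up for illustration only.
preds = [0, 1, 2, 1, 0]
labels = [0, 1, 1, 1, 2]
uncertainties = [0.05, 0.10, 0.80, 0.20, 0.90]

# Hypothetical threshold: samples above it go to manual review.
accepted, referred = split_by_uncertainty(uncertainties, threshold=0.5)
acc_all = accuracy(preds, labels, list(range(len(preds))))
acc_accepted = accuracy(preds, labels, accepted)
```

In this toy case the two misclassified samples are also the two most uncertain ones, so accuracy on the accepted subset is higher than on the full set. That is the effect a thresholding step aims for, at the cost of referring some samples to a human reader.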
-
Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning
Authors:
Xuezhi Niu,
Kaige Tan,
Lei Feng
Abstract:
This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving on our previous studies, which used model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot has better safety, less weight, and a simpler mechanism for fabrication and control. However, the primary challenge lies in developing sophisticated control algorithms to attain optimal gait control for fast and stable locomotion. The research employs a multi-stage methodology, including state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios.
Submitted 11 June, 2024;
originally announced June 2024.
-
Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization
Authors:
Kaige Tan,
Xuezhi Niu,
Qinglei Ji,
Lei Feng,
Martin Törngren
Abstract:
This study focuses on the locomotion capability improvement in a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of modeling discrepancies, we implement a multi-fidelity BO approach, combining data from both simulation and physical experiments throughout training and optimization. This strategy enables the adaptive refinement of the gait pattern and ensures a smooth transition from simulation to real-world deployment for the controller. Moreover, we integrate a computational task off-loading architecture by edge computing, which reduces the onboard computational and memory overhead, to improve real-time control performance and facilitate an effective online learning process. The proposed approach successfully achieves optimal walking gait design for physical deployment with high efficiency, effectively addressing challenges related to the reality gap in soft robotics.
Submitted 11 June, 2024;
originally announced June 2024.
-
Revolutionizing Wireless Networks with Self-Supervised Learning: A Pathway to Intelligent Communications
Authors:
Zhixiang Yang,
Hongyang Du,
Dusit Niyato,
Xudong Wang,
Yu Zhou,
Lei Feng,
Fanqin Zhou,
Wenjing Li,
Xuesong Qiu
Abstract:
With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present an integration of self-supervised learning (SSL) into wireless networks. SSL leverages large volumes of unlabeled data to train models, enhancing scalability, adaptability, and generalization. This paper offers a comprehensive overview of SSL, categorizing its application scenarios in wireless network optimization and presenting a case study on its impact on semantic communication. Our findings highlight the potential of SSL to significantly improve wireless network performance without extensive labeled data, paving the way for more intelligent and efficient communication systems.
Submitted 10 June, 2024;
originally announced June 2024.
-
GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images
Authors:
Xinying Wang,
Zhixiong Huang,
Sifan Zhang,
Jiawen Zhu,
Lin Feng
Abstract:
Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space models (e.g., Mamba) have attracted significant attention due to their near-linear computational efficiency and superior performance, prompting our investigation into their potential for the SR problem. To this end, we propose the Gradient-guided Mamba for Spectral Reconstruction from RGB Images, dubbed GMSR-Net. GMSR-Net is a lightweight model characterized by a global receptive field and linear computational complexity. Its core comprises multiple stacked Gradient Mamba (GM) blocks, each featuring a tri-branch structure. In addition to benefiting from the efficient global feature representation of the Mamba block, we introduce spatial gradient attention and spectral gradient attention to guide the reconstruction of spatial and spectral cues. GMSR-Net demonstrates a favorable accuracy-efficiency trade-off, achieving state-of-the-art performance while markedly reducing the number of parameters and the computational burden. Compared to existing approaches, GMSR-Net reduces parameters and FLOPs by factors of 10 and 20, respectively. Code is available at https://github.com/wxy11-27/GMSR.
Submitted 13 May, 2024;
originally announced May 2024.
-
DeepEMC-T2 Mapping: Deep Learning-Enabled T2 Mapping Based on Echo Modulation Curve Modeling
Authors:
Haoyang Pei,
Timothy M. Shepherd,
Yao Wang,
Fang Liu,
Daniel K Sodickson,
Noam Ben-Eliezer,
Li Feng
Abstract:
Purpose: Echo modulation curve (EMC) modeling can provide accurate and reproducible quantification of T2 relaxation times. The standard EMC-T2 mapping framework, however, requires sufficient echoes and cumbersome pixel-wise dictionary-matching steps. This work proposes a deep learning version of EMC-T2 mapping, called DeepEMC-T2 mapping, to efficiently estimate accurate T2 maps from fewer echoes without a dictionary.
Methods: DeepEMC-T2 mapping was developed using a modified U-Net to estimate both T2 and Proton Density (PD) maps directly from multi-echo spin-echo (MESE) images. The modified U-Net employs several new features to improve the accuracy of T2/PD estimation. MESE datasets from 68 subjects were used for training and evaluation of the DeepEMC-T2 mapping technique. Multiple experiments were conducted to evaluate the impact of the proposed new features on DeepEMC-T2 mapping.
Results: DeepEMC-T2 mapping achieved T2 estimation errors ranging from 3%-12% in different T2 ranges and 0.8%-1.7% for PD estimation with 10/7/5/3 echoes, which yielded more accurate parameter estimation than standard EMC-T2 mapping. The new features proposed in DeepEMC-T2 mapping enabled improved parameter estimation. The use of a larger echo spacing with fewer echoes can maintain the accuracy of T2 and PD estimations while reducing the number of 180-degree refocusing pulses.
Conclusions: DeepEMC-T2 mapping enables simplified, efficient, and accurate T2 quantification directly from MESE images without a time-consuming dictionary-matching step and requires fewer echoes. This allows for increased volumetric coverage and/or decreased SAR by reducing the number of 180-degree refocusing pulses.
Submitted 29 February, 2024;
originally announced February 2024.
-
Eliminating Quantization Errors in Classification-Based Sound Source Localization
Authors:
Linfeng Feng,
Xiao-Lei Zhang,
Xuelong Li
Abstract:
Sound Source Localization (SSL) involves estimating the Direction of Arrival (DOA) of sound sources. Since the DOA estimation output space is continuous, regression might be more suitable for DOA, offering higher precision. However, in practice, classification often outperforms regression, exhibiting greater robustness to interference. Conversely, classification's drawback is inherent quantization error. Within the classification paradigm, the DOA output space is discretized into intervals, each treated as a class. These classes show strong inter-class correlations, being inherently ordered, with higher similarity as intervals grow closer. Nevertheless, this has not been fully exploited. To address this, we propose an Unbiased Label Distribution (ULD) to eliminate quantization error in training targets. Furthermore, we tailor two loss functions for the soft label family: Negative Log Absolute Error (NLAE) and Mean Squared Error without activation (MSE(wo)). Finally, we introduce Weighted Adjacent Decoding (WAD) to overcome quantization error during model prediction decoding. Experimental results demonstrate our approach surpasses classification quantization limits, achieving state-of-the-art performance. Our code and supplementary materials are available at https://github.com/linfeng-feng/ULD.
Submitted 28 January, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
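The quantization error discussed in the abstract above can be made concrete with a small Python sketch. It contrasts hard argmax decoding, which can only return a class centre, with a probability-weighted mean over the peak class and its adjacent classes. This is one plausible illustration of decoding between class centres, not the paper's definition of WAD, which the abstract does not give. The function names and toy values are hypothetical, and angle wrap-around is ignored.

```python
from typing import List


def decode_argmax(probs: List[float], centers_deg: List[float]) -> float:
    """Hard decoding: return the centre of the most probable class."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return centers_deg[k]


def decode_weighted_neighbors(probs: List[float], centers_deg: List[float]) -> float:
    """Probability-weighted mean over the peak class and its adjacent classes."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    lo, hi = max(0, k - 1), min(len(probs) - 1, k + 1)
    window_p = probs[lo : hi + 1]
    window_c = centers_deg[lo : hi + 1]
    mass = sum(window_p)
    return sum(p * c for p, c in zip(window_p, window_c)) / mass


# Toy setup: four 10-degree classes; the source sits between classes 1 and 2.
centers_deg = [0.0, 10.0, 20.0, 30.0]
probs = [0.0, 0.6, 0.4, 0.0]

hard = decode_argmax(probs, centers_deg)
soft = decode_weighted_neighbors(probs, centers_deg)
```

Hard decoding snaps to a class centre, so its error can reach half the class width. The weighted version can return values between centres when the network spreads probability over neighbouring classes.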
-
Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications
Authors:
Pian Yu,
Shuyang Dong,
Shili Sheng,
Lu Feng,
Marta Kwiatkowska
Abstract:
Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic that involve human trust. Since human trust in robots is not observable, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while simultaneously introducing added complexity. This also presents a new challenge as the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach.
Submitted 2 October, 2023;
originally announced October 2023.
-
CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis
Authors:
Di Guo,
Sijin Li,
Jun Liu,
Zhangren Tu,
Tianyu Qiu,
Jingjing Xu,
Liubin Feng,
Donghai Lin,
Qing Hong,
Meijin Lin,
Yanqin Lin,
Xiaobo Qu
Abstract:
Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge of programming and NMR. In particular, emerging deep learning tools are hard to adopt widely in NMR because of their sophisticated computational setup. Thus, NMR processing is not an easy task for chemists and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html
Submitted 12 September, 2023;
originally announced September 2023.
-
Soft Label Coding for End-to-end Sound Source Localization With Ad-hoc Microphone Arrays
Authors:
Linfeng Feng,
Yijun Gong,
Xiao-Lei Zhang
Abstract:
Recently, an end-to-end two-dimensional sound source localization algorithm with ad-hoc microphone arrays formulated the sound source localization problem as a classification problem. The algorithm divides the target indoor space into a set of local areas, and predicts the local area where the speaker is located. However, the local areas are encoded by one-hot code, which may lose the connections between the local areas due to quantization errors. In this paper, we propose a new soft label coding method, named label smoothing, for the classification-based two-dimensional sound source localization with ad-hoc microphone arrays. The core idea is to incorporate the geometric connection between the classes into the label coding process. The first variant, named static soft label coding (SSLC), modifies the one-hot codes into soft codes based on the distances between the local areas. Because SSLC is handcrafted and may not be optimal, the second variant, named dynamic soft label coding (DSLC), further rectifies SSLC by learning the soft codes according to the statistics of the predictions produced by the classification-based localization model in the training stage. Experimental results show that the proposed methods can effectively improve the localization accuracy.
Submitted 15 April, 2023;
originally announced April 2023.
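The idea of turning one-hot codes into distance-based soft codes, as described for SSLC in the abstract above, can be sketched in a few lines of Python. The abstract does not state how distances are mapped to label mass, so this sketch assumes a softmax over negative Euclidean distances with a hypothetical `temperature` parameter. The function name and toy layout are also illustrative only.

```python
import math
from typing import List, Tuple


def static_soft_label(
    centers: List[Tuple[float, float]], true_index: int, temperature: float = 1.0
) -> List[float]:
    """Soft code over local areas: areas closer to the true one get more mass."""
    tx, ty = centers[true_index]
    # Euclidean distance from each area centre to the true area centre.
    dists = [math.hypot(cx - tx, cy - ty) for cx, cy in centers]
    # Assumed weighting (not from the paper): softmax over negative distances.
    weights = [math.exp(-d / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy layout: four area centres on a line, one unit apart.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
soft = static_soft_label(centers, true_index=1, temperature=0.5)
```

The true area keeps the largest share, its two equidistant neighbours receive equal shares, and the farthest area receives the least. A smaller `temperature` moves the code back towards one-hot.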
-
Joint Spectrum and Power Allocation for V2X Communications with Imperfect CSI
Authors:
Peng Wang,
Weihua Wu,
Jiayi Liu,
Guanhua Chai,
Li Feng
Abstract:
In Vehicle-to-Everything (V2X) communication, the high mobility of vehicles generates Doppler shift, which leads to channel uncertainties. Other sources of channel uncertainty include finite channel feedback, channel state information (CSI) loss, and latency. With this concern, we formulate a joint spectrum and power allocation problem for V2X communication with imperfect CSI. Specifically, the sum capacity of cellular user equipments (CUEs) is maximized subject to the minimum Signal-to-Interference-and-Noise Ratio (SINR) requirements of CUEs and the outage probability constraints of vehicular user equipments (VUEs). Then, two different robust resource allocation approaches are designed to solve the problem. The first is a Bernstein Approximation-based Robust Resource Allocation approach. More specifically, Bernstein approximations are employed to convert the chance constraint into a calculable constraint, and a bisection search method is proposed to obtain the optimal allocation solution with low complexity. Then, to further reduce the computational complexity, a Self-learning Robust Resource Allocation approach, which includes a learning method and an analytical mapping method, is proposed as the second approach. The learning method is devised to learn the uncertainty set which transforms the chance constraint into calculable constraints, and the analytical mapping method is proposed to obtain closed-form solutions of the resource allocation problem. Finally, the simulation results show that the proposed approaches can effectively improve the capacity of all CUEs whilst ensuring the reliability of the channel.
Submitted 20 February, 2023;
originally announced February 2023.
-
xURLLC-Aware Service Provisioning in Vehicular Networks: A Semantic Communication Perspective
Authors:
Le Xia,
Yao Sun,
Dusit Niyato,
Daquan Feng,
Lei Feng,
Muhammad Ali Imran
Abstract:
Semantic communication (SemCom), as an emerging paradigm focusing on meaning delivery, has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to wireless vehicular networks, which normally consume a tremendous amount of resources to meet stringent reliability and latency requirements. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to simultaneously realize efficient service provisioning for multiple users in vehicle-to-vehicle networks. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks in alignment with the next-generation ultra-reliable and low-latency communication (xURLLC) requirements. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee and computational complexity. Numerical results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, user knowledge matching degree and knowledge preference satisfaction compared with two benchmarks.
Submitted 23 September, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
In situ Biological Particle Analyzer based on Digital Inline Holography
Authors:
Delaney Sanborn,
Ruichen He,
Lei Feng,
Jiarong Hong
Abstract:
Obtaining in situ measurements of biological microparticles is crucial for both scientific research and numerous industrial applications (e.g., early detection of harmful algal blooms, monitoring yeast during fermentation). However, existing methods are limited in their ability to offer timely diagnostics of these particles with sufficient accuracy and information. Here, we introduce a novel method for real-time, in situ analysis using machine-learning-assisted digital inline holography (DIH). Our machine learning model uses a customized YOLO v5 architecture specialized for the detection and classification of small biological particles. We demonstrate the effectiveness of our method in the analysis of 10 plankton species with equivalently high accuracy and significantly reduced processing time compared to previous methods. We also applied our method to differentiate yeast cells under four metabolic states and from two strains. Our results show that the proposed method can accurately detect and differentiate cellular and subcellular features related to metabolic states and strains. This study demonstrates the potential of the machine-learning-driven DIH approach as a sensitive and versatile diagnostic tool for real-time, in situ analysis of both biotic and abiotic particles. This method can be readily deployed in a distributed manner for scientific research and manufacturing on an industrial scale.
Submitted 14 January, 2023;
originally announced January 2023.
-
Towards Developing Safety Assurance Cases for Learning-Enabled Medical Cyber-Physical Systems
Authors:
Maryam Bagheri,
Josephine Lamp,
Xugui Zhou,
Lu Feng,
Homa Alemzadeh
Abstract:
Machine Learning (ML) technologies have been increasingly adopted in Medical Cyber-Physical Systems (MCPS) to enable smart healthcare. Assuring the safety and effectiveness of learning-enabled MCPS is challenging, as such systems must account for diverse patient profiles and physiological dynamics and handle operational uncertainties. In this paper, we develop a safety assurance case for ML controllers in learning-enabled MCPS, with an emphasis on establishing confidence in the ML-based predictions. We present the safety assurance case in detail for Artificial Pancreas Systems (APS) as a representative application of learning-enabled MCPS, and provide a detailed analysis by implementing a deep neural network for the prediction in APS. We check the sufficiency of the ML data and analyze the correctness of the ML-based prediction using formal verification. Finally, we outline open research problems based on our experience in this paper.
Submitted 19 December, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays
Authors:
Shupei Liu,
Linfeng Feng,
Yijun Gong,
Chengdong Liang,
Chen Zhang,
Xiao-Lei Zhang,
Xuelong Li
Abstract:
While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to get 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods. The proposed node selection further refines performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded and described for the first time in this paper. It is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
Submitted 1 April, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
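The step of combining per-node DOA estimates into a 2D position, mentioned in the abstract above, can be illustrated with a standard least-squares intersection of bearing lines. This Python sketch is a generic triangulation under stated assumptions (known node positions, DOAs given as global angles in radians, no clustering or node selection) and is not the paper's algorithm. The function name and the toy geometry are hypothetical.

```python
import math
from typing import List, Tuple


def triangulate(
    nodes: List[Tuple[float, float]], doas_rad: List[float]
) -> Tuple[float, float]:
    """Least-squares intersection of bearing lines from several nodes."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in zip(nodes, doas_rad):
        dx, dy = math.cos(theta), math.sin(theta)
        # Projector onto the normal of the bearing line: I - d d^T.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        # Accumulate the normal equations (sum of M) x = sum of (M p).
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; position is not determined")
    # Solve the symmetric 2x2 system by Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Toy geometry: three nodes with noise-free bearings towards one speaker.
speaker = (2.0, 1.0)
nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
doas = [math.atan2(speaker[1] - py, speaker[0] - px) for px, py in nodes]
estimate = triangulate(nodes, doas)
```

The sketch treats each bearing as an infinite line rather than a ray and weights all nodes equally. With noisy DOAs, a selection or weighting step over nodes, such as the one the abstract describes, would decide which bearings enter the sum.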
-
4D Real-Time GRASP MRI at Sub-Second Temporal Resolution
Authors:
Li Feng
Abstract:
Intra-frame motion blurring, a major challenge in free-breathing dynamic MRI, can be reduced if high temporal resolution can be achieved. To address this challenge, this work proposes a highly-accelerated 4D (3D+time) real-time MRI framework with sub-second temporal resolution combining standard stack-of-stars golden-angle radial sampling and tailored GRASP-Pro (Golden-angle RAdial Sparse Parallel) reconstruction. Specifically, 4D real-time MRI acquisition is performed continuously without motion gating or sorting. The k-space centers in stack-of-stars radial data are organized to guide estimation of a temporal basis, with which GRASP-Pro reconstruction is employed to enforce joint low-rank subspace and sparsity constraints. This basis estimation strategy is the new feature proposed in this work for subspace-based reconstruction to achieve high temporal resolution (e.g., sub-second/3D volume). It does not require sequence modification to acquire additional navigation data, is compatible with commercially available stack-of-stars sequences, and does not need an intermediate reconstruction step. The proposed 4D real-time MRI approach was tested in an abdominal motion phantom, free-breathing abdominal MRI, and dynamic contrast-enhanced MRI (DCE-MRI). With the ability to acquire each 3D image in less than one second, intra-frame respiratory blurring can be intrinsically reduced for body applications with our approach, which also eliminates the need for motion detection and motion compensation.
Submitted 10 August, 2022;
originally announced August 2022.
-
An Instrumented Wheel-On-Limb System of Planetary Rovers for Wheel-Terrain Interactions: System Conception and Preliminary Design
Authors:
Lihang Feng,
Xu Jiang,
Aiguo Song
Abstract:
Understanding wheel-terrain interaction is of great importance for improving the maneuverability and traversability of rovers. A well-developed sensing device carried by the rover would greatly facilitate complex risk-reducing operations on sandy terrains. In this paper, an instrumented wheel-on-limb (WOL) system for planetary rovers is presented for wheel-terrain interaction characterization. Acting as a passive suspension for the wheel, the WOL system follows the terrain contour and keeps the wheel lowered onto the ground during rover motion, including climbing and descending, and it deploys and places the wheel on the ground before a drive command. The system concept, functional requirements, and pre-design work, as well as the system integration, are presented.
Submitted 6 April, 2022;
originally announced April 2022.
-
Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks
Authors:
Wei Peng,
Li Feng,
Guoying Zhao,
Fang Liu
Abstract:
The inherently slow imaging speed of Magnetic Resonance Imaging (MRI) has spurred the development of various acceleration methods, typically through heuristic undersampling of the MRI measurement domain known as k-space. Recently, deep neural networks have been applied to reconstruct undersampled k-space data and have shown improved reconstruction performance. While most of these methods focus on designing novel reconstruction networks or new training strategies for a given undersampling pattern, e.g., Cartesian undersampling or Non-Cartesian sampling, there is to date limited research aiming to learn and optimize k-space sampling strategies using deep neural networks. This work proposes a novel optimization framework to learn k-space sampling trajectories by treating the task as an Ordinary Differential Equation (ODE) problem that can be solved using neural ODE. In particular, the sampling of k-space data is framed as a dynamic system, in which neural ODE is formulated to approximate the system with additional constraints on MRI physics. In addition, we demonstrate that trajectory optimization and image reconstruction can be learned collaboratively for improved imaging efficiency and reconstruction performance. Experiments were conducted on different in-vivo datasets (e.g., brain and knee images) acquired with different sequences. Initial results show that our proposed method can generate better image quality in accelerated MRI than conventional undersampling schemes in Cartesian and Non-Cartesian acquisitions.
Submitted 12 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding
Authors:
Lingyun Feng,
Jianwei Yu,
Deng Cai,
Songxiang Liu,
Haitao Zheng,
Yan Wang
Abstract:
Language understanding in speech-based systems has attracted much attention in recent years with the growing demand for voice interface applications. However, the robustness of natural language understanding (NLU) systems to errors introduced by automatic speech recognition (ASR) is under-examined. In this paper, we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error across 3 different levels of background noise and 6 speakers with various voice characteristics. Based on the proposed benchmark, we systematically investigate the effect of ASR error on NLU tasks in terms of noise intensity, error type and speaker variants. We further propose two ways to improve the robustness of NLU systems: a correction-based method and a data augmentation-based method. Extensive experimental results and analyses show that the proposed methods are effective to some extent but still far from human performance, demonstrating that NLU under ASR error is still very challenging and requires further research.
Submitted 16 March, 2022; v1 submitted 30 August, 2021;
originally announced August 2021.
-
NTIRE 2021 Challenge on Perceptual Image Quality Assessment
Authors:
Jinjin Gu,
Haoming Cai,
Chao Dong,
Jimmy S. Ren,
Yu Qiao,
Shuhang Gu,
Radu Timofte,
Manri Cheon,
Sungjun Yoon,
Byungyeon Kang,
Junwoo Lee,
Qing Zhang,
Haiyang Guo,
Yi Bin,
Yuqing Hou,
Hengliang Luo,
Jingyu Guo,
Zirui Wang,
Hai Wang,
Wenming Yang,
Qingyan Bai,
Shuwei Shi,
Weihao Xia,
Mingdeng Cao,
Jiahao Wang
, et al. (25 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions and thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge had 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them achieved much better results than existing IQA methods, and the winning method demonstrated state-of-the-art performance.
Submitted 28 June, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
A Sparse Model-inspired Deep Thresholding Network for Exponential Signal Reconstruction -- Application in Fast Biological Spectroscopy
Authors:
Zi Wang,
Di Guo,
Zhangren Tu,
Yihui Huang,
Yirong Zhou,
Jian Wang,
Liubin Feng,
Donghai Lin,
Yongfu You,
Tatiana Agback,
Vladislav Orekhov,
Xiaobo Qu
Abstract:
Non-uniform sampling is a powerful approach to enable fast acquisition but requires sophisticated reconstruction algorithms. Faithful reconstruction from partially sampled exponentials is highly desirable in general signal processing and many applications. Deep learning has shown astonishing potential in this field, but many existing problems, such as lack of robustness and explainability, greatly limit its applications. In this work, by combining the merits of the sparse model-based optimization method and data-driven deep learning, we propose a deep learning architecture for spectra reconstruction from undersampled data, called MoDern. Its network structure follows the iterative reconstruction used to solve a sparse model, and we carefully design a learnable soft-thresholding to adaptively eliminate the spectrum artifacts introduced by undersampling. Extensive results on both synthetic and biological data show that MoDern enables more robust, high-fidelity, and ultra-fast reconstruction than the state-of-the-art methods. Remarkably, MoDern has a small number of network parameters and is trained solely on synthetic data while generalizing well to biological data in various scenarios. Furthermore, we extend it to an open-access and easy-to-use cloud computing platform (XCloud-MoDern), contributing a promising strategy for further development of biological applications.
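As a rough illustration of the soft-thresholding operation mentioned in this abstract, the sketch below applies the standard soft-thresholding rule to a list of real values. It is not the MoDern implementation: the function name `soft_threshold`, the sample values, and the fixed threshold are assumptions made for illustration, whereas the abstract describes a threshold that is learned by the network.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`.

    Values whose magnitude is at most `threshold` become 0.0.
    In a learned variant, `threshold` would be a trainable parameter;
    here it is a fixed, hand-picked number (an assumption).
    """
    shrunk = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)          # small entries are treated as artifacts
        elif v > 0:
            shrunk.append(magnitude)    # positive entries move down toward zero
        else:
            shrunk.append(-magnitude)   # negative entries move up toward zero
    return shrunk


# Hypothetical spectrum samples: two strong peaks plus weak artifacts.
spectrum = [0.25, -1.0, 2.0, -0.125]
print(soft_threshold(spectrum, threshold=0.5))
```

Small entries are zeroed while strong peaks survive with reduced amplitude, which is why the operator is commonly used to promote sparsity inside iterative reconstruction schemes.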
Submitted 17 January, 2022; v1 submitted 29 December, 2020;
originally announced December 2020.
-
Predictive Monitoring with Logic-Calibrated Uncertainty for Cyber-Physical Systems
Authors:
Meiyi Ma,
John Stankovic,
Ezio Bartocci,
Lu Feng
Abstract:
Predictive monitoring -- making predictions about future states and monitoring whether the predicted states satisfy requirements -- offers a promising paradigm for supporting the decision making of Cyber-Physical Systems (CPS). Existing work on predictive monitoring mostly focuses on monitoring individual predictions rather than sequential predictions. We develop a novel approach for monitoring sequential predictions generated from Bayesian Recurrent Neural Networks (RNNs) that can capture the inherent uncertainty in CPS, drawing on insights from our study of real-world CPS datasets. We propose a new logic named Signal Temporal Logic with Uncertainty (STL-U) to monitor a flowpipe containing an infinite set of uncertain sequences predicted by Bayesian RNNs. We define STL-U strong and weak satisfaction semantics based on whether all or some sequences contained in a flowpipe satisfy the requirement. We also develop methods to compute the range of confidence levels under which a flowpipe is guaranteed to strongly (weakly) satisfy an STL-U formula. Furthermore, we develop novel criteria that leverage STL-U monitoring results to calibrate the uncertainty estimation in Bayesian RNNs. Finally, we evaluate the proposed approach via experiments with real-world datasets and a simulated smart city case study, which show very encouraging results, with the STL-U based predictive monitoring approach outperforming baselines.
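The distinction between strong and weak satisfaction can be sketched with a toy example. The code below is not STL-U itself: it assumes the flowpipe at one fixed confidence level is given as a list of per-time-step `(lower, upper)` bounds, that every sequence staying within those bounds belongs to the flowpipe, and that the requirement is the simple property "the signal always stays below `limit`". The function names and the sample numbers are hypothetical.

```python
def strongly_satisfies(flowpipe, limit):
    """All sequences in the flowpipe stay below `limit`.

    This holds exactly when the upper bound is below `limit` at every step.
    """
    return all(upper < limit for _, upper in flowpipe)


def weakly_satisfies(flowpipe, limit):
    """At least one sequence in the flowpipe stays below `limit`.

    Under the assumption that any sequence within the bounds is allowed,
    the sequence that follows the lower bound is the best candidate.
    """
    return all(lower < limit for lower, _ in flowpipe)


# Hypothetical predicted bounds for three future time steps.
flowpipe = [(40, 55), (42, 61), (45, 58)]
print(strongly_satisfies(flowpipe, limit=60))  # False: the upper bound 61 exceeds 60
print(weakly_satisfies(flowpipe, limit=60))    # True: the lower bounds stay below 60
```

A wider flowpipe (higher confidence level) makes strong satisfaction harder and weak satisfaction easier, which gives some intuition for why the abstract speaks of a range of confidence levels under which each kind of satisfaction is guaranteed.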
Submitted 24 July, 2021; v1 submitted 31 October, 2020;
originally announced November 2020.
-
Spatiotemporal Flexible Sparse Reconstruction for Rapid Dynamic Contrast-enhanced MRI
Authors:
Yuhan Hu,
Xinlin Zhang,
Li Feng,
Dicheng Chen,
Zhiping Yan,
Xiaoyong Shen,
Gen Yan,
Lin Ou-yang,
Xiaobo Qu
Abstract:
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a tissue perfusion imaging technique. Some versatile free-breathing DCE-MRI techniques combining compressed sensing (CS) and parallel imaging with golden-angle radial sampling have been developed to improve motion robustness with high spatial and temporal resolution. These methods have demonstrated good diagnostic performance in clinical settings, but the reconstruction quality degrades at high acceleration rates and the overall reconstruction time remains long. In this paper, we propose a new parallel CS reconstruction model for DCE-MRI that enforces a flexible weighted sparse constraint along both spatial and temporal dimensions. Weights are introduced to flexibly adjust the importance of temporal and spatial sparsity, and we derive a fast thresholding algorithm that proves to be simple and efficient for solving the proposed reconstruction model. Results on in vivo liver DCE datasets show that the proposed method outperforms the state-of-the-art methods in terms of visual image quality assessment and reconstruction speed without introducing significant temporal blurring.
Submitted 6 July, 2020;
originally announced July 2020.
-
Automate Obstructive Sleep Apnea Diagnosis Using Convolutional Neural Networks
Authors:
Longlong Feng,
Xu Wang
Abstract:
Identifying sleep problem severity from overnight polysomnography (PSG) recordings plays an important role in diagnosing and treating sleep disorders such as Obstructive Sleep Apnea (OSA). This analysis is traditionally done manually by specialists through visual inspection, which can be tedious, time-consuming, and prone to subjective errors. One solution is to use Convolutional Neural Networks (CNN), where the convolutional and pooling layers behave as feature extractors and some fully-connected (FCN) layers are used to make the final predictions of OSA severity. In this paper, a CNN architecture with 1D convolutional and FCN layers for classification is presented. The PSG data for this project are from the Cleveland Children's Sleep and Health Study database, and the classification results confirm the effectiveness of the proposed CNN method. The proposed 1D CNN model achieves excellent classification results without manual preprocessing of the PSG signals, such as feature extraction and feature reduction.
Submitted 13 June, 2020;
originally announced June 2020.
-
Reinforcement Learning to Optimize the Logistics Distribution Routes of Unmanned Aerial Vehicle
Authors:
Linfei Feng
Abstract:
Path planning methods for the unmanned aerial vehicle (UAV) in goods delivery have drawn great attention from industry and academia because of the UAV's flexibility, which suits many situations in the "Last Kilometer" between customer and delivery nodes. However, complicated situations remain a problem for traditional combinatorial optimization methods. Based on state-of-the-art Reinforcement Learning (RL), this paper proposes an improved method for UAV path planning in complex surroundings with multiple no-fly zones. The improved approach leverages the attention mechanism and includes the embedding mechanism as the encoder and three different widths of beam search (i.e., 1, 5, and 10) as the decoders. Policy gradients are used to train the RL model to obtain the optimal strategies during inference. The results show the feasibility and efficiency of the model when applied to this kind of complicated situation. Compared with the results obtained by the optimization solver OR-tools, the model improves the reliability of the distribution system and offers guidance for the broad application of UAVs.
Submitted 21 April, 2020;
originally announced April 2020.
-
Implicit Higher-Order Moment Matching Technique for Model Reduction of Quadratic-bilinear Systems
Authors:
Mian Muhammad Arsalan Asif,
Mian Ilyas Ahmad,
Peter Benner,
Lihong Feng,
Tatjana Stykel
Abstract:
We propose a projection based multi-moment matching method for model order reduction of quadratic-bilinear systems. The goal is to construct a reduced system that ensures higher-order moment matching for the multivariate transfer functions appearing in the input-output representation of the nonlinear system. An existing technique achieves this for the first two multivariate transfer functions, in what is called the symmetric form of the multivariate transfer functions. We extend this framework to an equivalent and simplified form, the regular form, which allows us to show moment matching for the first three multivariate transfer functions. Numerical results for three benchmark examples of quadratic-bilinear systems show that the proposed framework exhibits better performance with reduced computational cost in comparison to existing techniques.
Submitted 13 November, 2019;
originally announced November 2019.
-
SaSTL: Spatial Aggregation Signal Temporal Logic for Runtime Monitoring in Smart Cities
Authors:
Meiyi Ma,
Ezio Bartocci,
Eli Lifland,
John Stankovic,
Lu Feng
Abstract:
We present SaSTL -- a novel Spatial Aggregation Signal Temporal Logic -- for the efficient runtime monitoring of safety and performance requirements in smart cities. We first describe a study of over 1,000 smart city requirements, some of which cannot be specified using existing logics such as Signal Temporal Logic (STL) and its variants. To tackle this limitation, we develop two new logical operators in SaSTL to augment STL for expressing spatial aggregation and spatial counting characteristics that are commonly found in real city requirements. We also develop efficient monitoring algorithms that can check a SaSTL requirement in parallel over multiple data streams (e.g., generated by multiple sensors distributed spatially in a city). We evaluate our SaSTL monitor by applying it to two case studies with large-scale real city sensing data (e.g., up to 10,000 sensors in one requirement). The results show that SaSTL has much higher coverage expressiveness than other spatial-temporal logics and significantly reduces the computation time for monitoring requirements. We also demonstrate via simulated experiments that the SaSTL monitor can help improve the safety and performance of smart cities.
Submitted 14 December, 2021; v1 submitted 6 August, 2019;
originally announced August 2019.
-
Closed Loop Load Model Identification Using Small Disturbance Data
Authors:
Shangyuan Li,
Li Feng,
Deqiang Gan,
Zhen Wang,
Wei Bao,
Hao Xu
Abstract:
Load model identification using small disturbance data is studied. It is proved that the individual load to be identified and the rest of the system form a closed-loop system. Then, the impacts of disturbances entering the feedforward channel (internal disturbance) and the feedback channel (external disturbance) on the relationship between load inputs and outputs are examined analytically. It is found that the relationship between load inputs and outputs is determined not only by the load itself (feedforward transfer function) but also by the equivalent network matrix (feedback transfer function). Thus, load identification is essentially closed-loop identification, and the impact of the closed loop cannot be neglected when using small disturbance data to identify load parameters. Closed-loop load model identification can be solved by the prediction error method (PEM). Implementation of PEM based on a Kalman filtering formulation is detailed. Identification results using simulated data demonstrate the correctness and significance of the theoretical analysis.
Submitted 15 May, 2019;
originally announced May 2019.
-
Bottom-up Broadcast Neural Network For Music Genre Classification
Authors:
Caifeng Liu,
Lin Feng,
Guochao Liu,
Huibing Wang,
Shenglan Liu
Abstract:
Music genre recognition based on visual representation has been successfully explored in recent years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods employ mature CNN structures proposed for image recognition without any modification, which results in learned features that are not adequate for music genre classification. To address this issue, we fully exploit the low-level information from audio spectrograms and develop a novel CNN architecture in this paper. The proposed CNN architecture takes long contextual information into consideration, which transfers more suitable information to the decision-making layer. Various experiments on several benchmark datasets, including GTZAN, Ballroom, and Extended Ballroom, have verified the excellent performance of the proposed neural network. Code and models will be available at "https://github.com/CaifengLiu/music-genre-classification".
Submitted 23 January, 2019;
originally announced January 2019.
-
SANTIS: Sampling-Augmented Neural neTwork with Incoherent Structure for MR image reconstruction
Authors:
Fang Liu,
Lihua Chen,
Richard Kijowski,
Li Feng
Abstract:
Deep learning holds great promise in the reconstruction of undersampled Magnetic Resonance Imaging (MRI) data, providing new opportunities to improve the performance of rapid MRI. In existing deep learning-based reconstruction methods, supervised training is performed using artifact-free reference images and their corresponding undersampled pairs. The undersampled images are generated by a fixed undersampling pattern during training, and the trained network is then applied to reconstruct new images acquired with the same pattern during inference. While such a training strategy can maintain favorable reconstruction for a pre-selected undersampling pattern, the robustness of the trained network against any discrepancy in undersampling schemes is typically poor. We developed a novel deep learning-based reconstruction framework called SANTIS for efficient MR image reconstruction with improved robustness against sampling pattern discrepancy. SANTIS uses a data cycle-consistent adversarial network combining efficient end-to-end convolutional neural network mapping, data fidelity enforcement and adversarial training to reconstruct accelerated MR images more faithfully. A training strategy employing sampling augmentation with extensive variation of undersampling patterns was further introduced to promote the robustness of the trained network. Compared to conventional reconstruction and standard deep learning methods, SANTIS achieved consistently better reconstruction performance, with lower errors, greater image sharpness and higher similarity with respect to the reference, regardless of the undersampling patterns used during inference. The concept behind SANTIS can be particularly useful for improving the robustness of deep learning-based image reconstruction against discrepancy between training and evaluation, which is currently an important but less studied open question.
Submitted 8 December, 2018;
originally announced December 2018.
-
Music Genre Classification with Paralleling Recurrent Convolutional Neural Network
Authors:
Lin Feng,
Shenlan Liu,
Jianing Yao
Abstract:
Deep learning has demonstrated its effectiveness and efficiency in music genre classification. However, existing approaches still have several shortcomings that impair performance on this classification task. In this paper, we propose a hybrid architecture that consists of paralleling CNN and Bi-RNN blocks, which focus on extracting spatial features and temporal frame order, respectively. The two outputs are then fused into one powerful representation of musical signals and fed into a softmax function for classification. The paralleling network ensures that the extracted features are robust enough to represent music. Moreover, the experiments show that our proposed architecture improves music genre classification performance and that the additional Bi-RNN block complements the CNNs.
Submitted 22 December, 2017;
originally announced December 2017.