-
MusicScore: A Dataset for Music Score Modeling and Generation
Authors:
Yuheng Lin,
Zheqi Dai,
Qiuqiang Kong
Abstract:
Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores contains more semantic information than audio and symbolic representations of music. Previous music score datasets have limited sizes and are mainly designed for optical music recognition (OMR). There is a lack of research on creating a large-scale benchmark dataset for music modeling and generation. In this work, we propose MusicScore, a large-scale music score dataset collected and processed from the International Music Score Library Project (IMSLP). MusicScore consists of image-text pairs, where the image is a page of a music score and the text is the metadata of the music. The metadata of MusicScore is extracted from the general information section of the IMSLP pages. The metadata includes rich information about the composer, instrument, piece style, and genre of the music pieces. MusicScore is curated into small, medium, and large scales of 400, 14k, and 200k image-text pairs with varying diversity, respectively. We build a score generation system based on a UNet diffusion model to generate visually readable music scores conditioned on text descriptions to benchmark the MusicScore dataset for music score generation. MusicScore is released to the public at https://huggingface.co/datasets/ZheqiDAI/MusicScore.
Submitted 17 June, 2024;
originally announced June 2024.
-
Performance Analysis of Hybrid Cellular and Cell-free MIMO Network
Authors:
Zhuoyin Dai,
Jingran Xu,
Xiaoli Xu,
Ruoguang Li,
Yong Zeng
Abstract:
Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a long-term evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communication network and collaborate with the existing cellular base stations (BSs). To further explore the performance limits of hybrid cellular and cell-free networks, this paper develops a hybrid network model based on stochastic geometric toolkits, which reveals the coupling of the signal and interference from both the cellular and cell-free networks. Specifically, conjugate beamforming is applied in hybrid cellular and cell-free networks, which enables user equipment (UE) to benefit from both cellular BSs and cell-free APs. The aggregate signal received from the hybrid network is approximated via moment matching, and coverage probability is characterized by deriving the Laplace transform of the interference. The analysis of signal strength and coverage probability is verified by extensive simulations.
Submitted 3 June, 2024;
originally announced June 2024.
-
Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments
Authors:
Kehan Long,
Yinzhuang Yi,
Zhirui Dai,
Sylvia Herbert,
Jorge Cortés,
Nikolay Atanasov
Abstract:
We introduce a novel method for safe mobile robot navigation in dynamic, unknown environments, utilizing onboard sensing to impose safety constraints without the need for accurate map reconstruction. Traditional methods typically rely on detailed map information to synthesize safe stabilizing controls for mobile robots, which can be computationally demanding and less effective, particularly in dynamic operational conditions. By leveraging recent advances in distributionally robust optimization, we develop a distributionally robust control barrier function (DR-CBF) constraint that directly processes range sensor data to impose safety constraints. Coupling this with a control Lyapunov function (CLF) for path tracking, we demonstrate that our CLF-DR-CBF control synthesis method achieves safe, efficient, and robust navigation in uncertain dynamic environments. We demonstrate the effectiveness of our approach in simulated and real autonomous robot navigation experiments, marking a substantial advancement in real-time safety guarantees for mobile robots.
Submitted 28 May, 2024;
originally announced May 2024.
-
Prototyping and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map
Authors:
Zhuoyin Dai,
Di Wu,
Zhenjun Dong,
Kun Li,
Dingyang Ding,
Sihan Wang,
Yong Zeng
Abstract:
Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environment-aware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, termed beam index map (BIM). To this end, a general CKM construction method is first presented, and an indoor BIM is constructed offline to learn the candidate transmit and receive beam index pairs for each grid in the experimental area. Furthermore, based on the location information of the receiver (or the dynamic obstacles) from the ultra-wideband (UWB) positioning system, the established BIM is used to achieve training-free beam alignment by directly providing the beam indexes for the transmitter and receiver. Three typical scenarios are considered in the experiment: a quasi-static environment with a line-of-sight (LoS) link, a quasi-static environment without a LoS link, and a dynamic environment. In addition, the receiver orientation measured by the gyroscope is used to help the CKM predict more accurate beam indexes. The experimental results show that, compared with the benchmark location-based beam alignment strategy, the CKM-based beam alignment strategy can achieve much higher received power, which is close to that achieved by exhaustive beam search, but with significantly reduced training overhead.
Submitted 12 March, 2024;
originally announced March 2024.
-
SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
Authors:
Yuhang He,
Zhuangzhuang Dai,
Long Chen,
Niki Trigoni,
Andrew Markham
Abstract:
In this paper, we study an underexplored, yet important and challenging problem: counting the number of distinct sounds in raw audio characterized by a high degree of polyphonicity. We do so by systematically proposing a novel end-to-end trainable neural network (which we call DyDecNet, consisting of a dyadic decomposition front-end and backbone network), and quantifying the difficulty level of counting depending on sound polyphonicity. The dyadic decomposition front-end progressively decomposes the raw waveform dyadically along the frequency axis to obtain a time-frequency representation in a multi-stage, coarse-to-fine manner. Each intermediate waveform convolved by a parent filter is further processed by a pair of child filters that evenly split the parent filter's carried frequency response, with the higher-half child filter encoding the detail and the lower-half child filter encoding the approximation. We further introduce an energy gain normalization to normalize sound loudness variance and spectrum overlap, and apply it to each intermediate parent waveform before feeding it to the two child filters. To better quantify the sound counting difficulty level, we further design three polyphony-aware metrics: polyphony ratio, max polyphony and mean polyphony. We test DyDecNet on various datasets to show its superiority, and we further show that the dyadic decomposition network can be used as a general front-end to tackle other acoustic tasks.
Submitted 26 December, 2023;
originally announced December 2023.
-
Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction
Authors:
Linyu Liu,
Zhen Dai,
Shiji Song,
Xiaocheng Li,
Guanting Chen
Abstract:
Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-making. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
Submitted 23 May, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Edge Enhanced Image Style Transfer via Transformers
Authors:
Chiyu Zhang,
Jun Yang,
Zaiyan Dai,
Peng Cao
Abstract:
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to maintain a good trade-off between content details and style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes objects in the image cannot be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves comparable performance to state-of-the-art image style transfer methods while alleviating the content leak problem.
Submitted 2 January, 2023;
originally announced January 2023.
-
MIMO Symbiotic Radio with Massive Passive Devices: Asymptotic Analysis and Precoding Optimization
Authors:
Jingran Xu,
Zhuoyin Dai,
Yong Zeng
Abstract:
Symbiotic radio has emerged as a promising technology for spectrum- and energy-efficient wireless communications, where the passive secondary backscatter devices (BDs) reuse not only the spectrum but also the power of the active primary users to transmit their own information. In return, the primary communication links can be enhanced by the additional multipaths created by the BDs. This is known as the mutualism relationship of symbiotic radio. However, due to the severe double-fading attenuation of the passive backscattering links, the enhancement of the primary link provided by one single BD is extremely limited. To address this issue and enable full mutualism of symbiotic radio, in this paper, we study multiple-input multiple-output (MIMO) symbiotic radio communication systems with massive BDs. We first derive the achievable rates of the primary active communication and secondary passive communication, and then consider the asymptotic regime as the number of BDs grows large, for which closed-form expressions are derived to reveal the relationship between the primary and secondary communication rates. Furthermore, the precoding optimization problem is studied to maximize the primary communication rate while guaranteeing that the secondary communication rate is no smaller than a certain threshold. Simulation results are provided to validate our theoretical studies.
Submitted 27 June, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Authors:
Ziqian Dai,
Jianwei Yu,
Yan Wang,
Nuo Chen,
Yanyao Bian,
Guangzhi Li,
Deng Cai,
Dong Yu
Abstract:
Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability. However, the acquisition of prosodic boundary labels relies on manual annotation, which is costly and time-consuming. In this paper, we propose to automatically extract prosodic boundary labels from text-audio data via a neural text-speech model with pre-trained audio encoders. This model is pre-trained on text and speech data separately and jointly fine-tuned on TTS data in a triplet format: {speech, text, prosody}. The experimental results on both automatic evaluation and human evaluation demonstrate that: 1) the proposed text-speech prosody annotation framework significantly outperforms text-only baselines; 2) the quality of automatic prosodic boundary annotations is comparable to human annotations; 3) TTS systems trained with model-annotated boundaries are slightly better than systems that use manual ones.
Submitted 16 June, 2022;
originally announced June 2022.
-
Rate-Region Characterization and Channel Estimation for Cell-Free Symbiotic Radio Communications
Authors:
Zhuoyin Dai,
Ruoguang Li,
Jingran Xu,
Yong Zeng,
Shi Jin
Abstract:
Cell-free massive MIMO and symbiotic radio communication have been recently proposed as a promising beyond fifth-generation (B5G) networking architecture and transmission technology, respectively. To reap the benefits of both, this paper studies cell-free symbiotic radio communication systems, where a number of cell-free access points (APs) cooperatively send primary information to a receiver, and simultaneously support the passive backscattering communication of the secondary backscatter device (BD). We first derive the achievable communication rates of the active primary user and passive secondary user under the assumption of perfect channel state information (CSI), based on which the transmit beamforming of the cell-free APs is optimized to characterize the achievable rate-region of cell-free symbiotic communication systems. Furthermore, to practically acquire the CSI of the active and passive channels, we propose an efficient channel estimation method based on two-phase uplink-training, and the achievable rate-region taking into account CSI estimation errors is further characterized. Simulation results are provided to show the effectiveness of our proposed beamforming and channel estimation methods.
Submitted 17 May, 2022;
originally announced May 2022.
-
Deep Odometry Systems on Edge with EKF-LoRa Backend for Real-Time Positioning in Adverse Environment
Authors:
Zhuangzhuang Dai,
Muhamad Risqi U. Saputra,
Chris Xiaoxuan Lu,
Andrew Markham,
Niki Trigoni
Abstract:
Ubiquitous positioning for pedestrians in adverse environments has long been a challenge. Despite dramatic progress made by Deep Learning, multi-sensor deep odometry systems still pose a high computational cost and suffer from cumulative drifting errors over time. Thanks to the increasing computational power of edge devices, we propose a novel ubiquitous positioning solution by integrating state-of-the-art deep odometry models on edge with an EKF (Extended Kalman Filter)-LoRa backend. We carefully compare and select three sensor modalities, i.e., an Inertial Measurement Unit (IMU), a millimetre-wave (mmWave) radar, and a thermal infrared camera, and realise their deep odometry inference engines, which run in real time. A pipeline for deploying deep odometry that considers accuracy, complexity, and edge platform is proposed. We design a LoRa link for positional data backhaul and for projecting aggregated positions of deep odometry into the global frame. We find that a simple EKF-based fusion module is sufficient for generic positioning calibration, with over 34% accuracy gains against any standalone deep odometry system. Extensive tests in different environments validate the efficiency and efficacy of our proposed positioning system.
Submitted 10 December, 2021;
originally announced December 2021.
-
DeepAoANet: Learning Angle of Arrival from Software Defined Radios with Deep Neural Networks
Authors:
Zhuangzhuang Dai,
Yuhang He,
Tran Vu,
Niki Trigoni,
Andrew Markham
Abstract:
Direction finding and positioning systems based on RF signals are significantly impacted by multipath propagation, particularly in indoor environments. Existing algorithms (e.g., MUSIC) perform poorly in resolving Angle of Arrival (AoA) in the presence of multipath or when operating in a weak signal regime. We note that digitally sampled RF frontends allow for the easy analysis of signals and their delayed components. Low-cost Software-Defined Radio (SDR) modules enable Channel State Information (CSI) extraction across a wide spectrum, motivating the design of an enhanced AoA solution. We propose a Deep Learning approach to deriving AoA from a single snapshot of the SDR multichannel data. We compare and contrast deep-learning based angle classification and regression models to estimate up to two AoAs accurately. We have implemented the inference engines on different platforms to extract AoAs in real time, demonstrating the computational tractability of our approach. To demonstrate the utility of our approach, we have collected IQ (In-phase and Quadrature components) samples from a four-element Uniform Linear Array (ULA) in various Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) environments, and published the dataset. Our proposed method demonstrates excellent reliability in determining the number of impinging signals and realizes mean absolute AoA errors of less than $2^{\circ}$.
Submitted 9 December, 2021; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Cell-Free Symbiotic Radio: Channel Estimation Method and Achievable Rate Analysis
Authors:
Zhuoyin Dai,
Ruoguang Li,
Jingran Xu,
Yong Zeng,
Shi Jin
Abstract:
Cell-free massive MIMO and symbiotic radio are promising beyond 5G (B5G) networking architecture and transmission technology, respectively. This paper studies cell-free symbiotic radio systems, where a number of distributed access points (APs) cooperatively send primary information to a receiver, and simultaneously support the backscattering communication of the secondary backscatter device (BD). An efficient two-phase uplink-training based channel estimation method is proposed to estimate the direct-link channel and cascaded backscatter channel, and the achievable primary and secondary communication rates taking into account the channel estimation errors are derived. Furthermore, to achieve a flexible trade-off between the primary and secondary communication rates, we propose a low-complexity weighted-maximal-ratio transmission (weighted-MRT) beamforming scheme, which only requires local processing at each AP without having to exchange the estimated channel state information. Simulation results are provided to show the impact of the channel training lengths on the performance of the cell-free symbiotic radio systems.
Submitted 10 June, 2021;
originally announced June 2021.
-
Enabling Full Mutualism for Symbiotic Radio with Massive Backscatter Devices
Authors:
Jingran Xu,
Zhuoyin Dai,
Yong Zeng
Abstract:
Symbiotic radio is a promising technology to achieve spectrum- and energy-efficient wireless communications, where the secondary backscatter device (BD) leverages not only the spectrum but also the power of the primary signals for its own information transmission. In return, the primary communication link can be enhanced by the additional multipaths created by the BD. This is known as the mutualism relationship of symbiotic radio. However, as the backscattering link is much weaker than the direct link due to double attenuations, the improvement of the primary link brought by one single BD is extremely limited. To address this issue and enable full mutualism of symbiotic radio, in this paper, we study symbiotic radio with massive number of BDs. For symbiotic radio multiple access channel (MAC) with successive interference cancellation (SIC), we first derive the achievable rate of both the primary and secondary communications, based on which a receive beamforming optimization problem is formulated and solved. Furthermore, considering the asymptotic regime of massive number of BDs, closed-form expressions are derived for the primary and the secondary communication rates, both of which are shown to be increasing functions of the number of BDs. This thus demonstrates that the mutualism relationship of symbiotic radio can be fully exploited with massive BD access.
Submitted 10 June, 2021;
originally announced June 2021.
-
Deep Attention-based Representation Learning for Heart Sound Classification
Authors:
Zhao Ren,
Kun Qian,
Fengquan Dong,
Zhenyu Dai,
Yoshiharu Yamamoto,
Björn W. Schuller
Abstract:
Cardiovascular diseases are the leading cause of death and severely threaten human health in daily life. On the one hand, there have been dramatically increasing demands from both clinical practice and smart home applications for monitoring the heart status of subjects suffering from chronic cardiovascular diseases. On the other hand, experienced physicians who can perform an efficient auscultation are still too few in number. Automatic heart sound classification leveraging the power of advanced signal processing and machine learning technologies has shown encouraging results. Nevertheless, human hand-crafted features are expensive and time-consuming. To this end, we propose a novel deep representation learning method with an attention mechanism for heart sound classification. In this paradigm, high-level representations are learnt automatically from the recorded heart sound data. In particular, a global attention pooling layer improves the performance of the learnt representations by estimating the contribution of each unit in the feature maps. The Heart Sounds Shenzhen (HSS) corpus (170 subjects involved) is used to validate the proposed method. Experimental results show that our approach can achieve an unweighted average recall of 51.2% for classifying three categories of heart sounds, i.e., normal, mild, and moderate/severe, annotated by cardiologists with the help of echocardiography.
Submitted 13 January, 2021;
originally announced January 2021.
-
Line Outage Identification Based on AC Power Flow and Synchronized Measurements
Authors:
Zhen Dai,
Joseph Euzebe Tate
Abstract:
This paper proposes a method of identifying single line outages in power systems based on phasor measurement unit (PMU) measurements and ac power flow models. In addition to the main identification algorithm, a rejection filter is introduced so that the preliminary identified results can be further processed and categorized into three types: correctly identified, misidentified and inconclusive (including correct-filtered and misidentified-filtered). The methods are systematically tested using test systems of various sizes for various PMU placements, and the results show that the proposed identification algorithm has a high identification accuracy and the proposed rejection filter is able to reduce the misidentified rate without significantly increasing the number of inconclusive cases.
Submitted 18 November, 2020;
originally announced November 2020.
-
Accurate Prostate Cancer Detection and Segmentation on Biparametric MRI using Non-local Mask R-CNN with Histopathological Ground Truth
Authors:
Zhenzhen Dai,
Ivan Jambor,
Pekka Taimen,
Milan Pantelic,
Mohamed Elshaikh,
Craig Rogers,
Otto Ettala,
Peter Boström,
Hannu Aronen,
Harri Merisaari,
Ning Wen
Abstract:
Purpose: We aimed to develop deep machine learning (DL) models to improve the detection and segmentation of intraprostatic lesions (IL) on bp-MRI by using whole mount prostatectomy specimen-based delineations. We also aimed to investigate whether transfer learning and self-training would improve results with a small amount of labelled data.
Methods: 158 patients had suspicious lesions delineated on MRI based on bp-MRI; 64 patients had ILs delineated on MRI based on whole mount prostatectomy specimen sections; 40 patients were unlabelled. A non-local Mask R-CNN was proposed to improve the segmentation accuracy. Transfer learning was investigated by fine-tuning a model trained using MRI-based delineations with prostatectomy-based delineations. Two label selection strategies were investigated in self-training. The performance of models was evaluated by 3D detection rate, dice similarity coefficient (DSC), 95th percentile Hausdorff distance (95 HD, mm) and true positive ratio (TPR).
Results: With prostatectomy-based delineations, the non-local Mask R-CNN with fine-tuning and self-training significantly improved all evaluation metrics. For the model with the highest detection rate and DSC, 80.5% (33/41) of lesions in all Gleason Grade Groups (GGG) were detected with DSC of 0.548[0.165], 95 HD of 5.72[3.17] and TPR of 0.613[0.193]. Among them, 94.7% (18/19) of lesions with GGG > 2 were detected with DSC of 0.604[0.135], 95 HD of 6.26[3.44] and TPR of 0.580[0.190].
Conclusion: DL models can achieve high prostate cancer detection and segmentation accuracy on bp-MRI based on annotations from histologic images. To further improve the performance, more data with annotations of both MRI and whole-mount prostatectomy specimens are required.
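For readers unfamiliar with the overlap metrics quoted in this abstract, the following is a minimal sketch of how a Dice similarity coefficient and a true positive ratio can be computed for two binary masks. It is not the authors' evaluation code: the function names, the flat 0/1 list representation, the empty-mask convention, and the reading of TPR as the fraction of ground-truth voxels covered by the prediction are all assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def true_positive_ratio(pred, truth):
    """Fraction of ground-truth voxels that the prediction also marks (assumed definition)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    positives = sum(1 for t in truth if t)
    if positives == 0:
        return 0.0
    hits = sum(1 for p, t in zip(pred, truth) if p and t)
    return hits / positives


# Hypothetical 4-voxel masks: one voxel overlaps out of two marked in each mask.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
print(dice_coefficient(predicted, reference))     # 0.5
print(true_positive_ratio(predicted, reference))  # 0.5
```

In a 3D setting the same functions apply after flattening the volume; how the paper aggregates per-lesion values is not stated in the abstract.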
Submitted 28 October, 2020;
originally announced October 2020.
-
MeshMVS: Multi-View Stereo Guided Mesh Reconstruction
Authors:
Rakesh Shrestha,
Zhiwen Fan,
Qingkun Su,
Zuozhuo Dai,
Siyu Zhu,
** Tan
Abstract:
Deep learning based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process. These color image semantics only implicitly encode 3D information, potentially limiting the accuracy of the generated shapes. In this paper we propose a multi-view mesh generation method which incorporates geometry information explicitly by using the features from intermediate depth representations of multi-view stereo and regularizing the 3D shapes against these depth images. First, our system predicts a coarse 3D volume from the color images by probabilistically merging voxel occupancy grids from the prediction of individual views. Then the depth images from multi-view stereo along with the rendered depth images of the coarse shape are used as a contrastive input whose features guide the refinement of the coarse shape through a series of graph convolution networks. Notably, we achieve better results than state-of-the-art multi-view shape generation methods, with a 34% decrease in Chamfer distance to ground truth and a 14% increase in F1-score on the ShapeNet dataset. Our source code is available at https://git.io/Jmalg
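As background for the Chamfer distance figure quoted above, here is a minimal sketch of one common form of the metric between two point sets. Definitions vary across papers (squared or unsquared distances, sums or means), and the abstract does not say which variant is used, so the symmetric mean of squared nearest-neighbour distances below is an assumption, as are the function name and the tuple-based point representation.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance in both directions."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def mean_nearest(source, target):
        # For each source point, take the squared distance to its closest target point.
        return sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)

    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)


# Hypothetical point sets sampled from a predicted and a reference surface.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0)]
print(chamfer_distance(predicted, reference))  # 0.5
```

This brute-force version is quadratic in the number of points; practical evaluations typically use a spatial index or a GPU implementation.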
Submitted 11 April, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Weak Radio Frequency Signal Detection Based on Piezo-Opto-Electro-Mechanical System: Architecture Design and Sensitivity Prediction
Authors:
Shanchi Wu,
Chen Gong,
Chengjie Zuo,
Shangbin Li,
Junyu Zhang,
Zhongbin Dai,
Kai Yang,
Ming Zhao,
Rui Ni,
Zhengyuan Xu,
**kang Zhu
Abstract:
We propose a novel radio-frequency (RF) receiving architecture based on a micro-electro-mechanical system (MEMS) and an optical coherent detection module. The architecture converts the received electrical signal into mechanical vibration through the piezoelectric effect and adopts an optical detection module to detect the mechanical vibration. We analyze the response function of the piezoelectric film to an RF signal, the noise-limited sensitivity of the optical detection module, and the system transfer function in the frequency domain. Finally, we adopt simple on-off keying (OOK) modulation with a bandwidth of 1 kHz and a carrier frequency of 1 GHz to numerically evaluate the detection sensitivity. The result shows that, considering the main noise sources in the wireless channel and circuits, the signal detection sensitivity can reach around -160 dBm with a 50 $Ω$ impedance. Such sensitivity significantly outperforms that of the currently deployed Long Term Evolution (LTE) system when the transmission bandwidth is also normalized to 1 kHz.
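To give a sense of scale for the quoted sensitivity, the short sketch below converts a power level in dBm into watts and into the corresponding RMS voltage across a resistive load. It uses only the standard definitions (0 dBm is 1 mW, and P = V²/R); the function names are illustrative and nothing here reproduces the paper's sensitivity analysis.

```python
import math


def dbm_to_watts(power_dbm):
    """Convert a power level in dBm to watts (0 dBm is defined as 1 mW)."""
    return 10 ** (power_dbm / 10) / 1000.0


def rms_voltage(power_dbm, impedance_ohm=50.0):
    """RMS voltage across a purely resistive load dissipating the given power."""
    return math.sqrt(dbm_to_watts(power_dbm) * impedance_ohm)


power_w = dbm_to_watts(-160)      # about 1e-19 W
voltage_v = rms_voltage(-160)     # about 2.24e-9 V across 50 ohms
print(f"{power_w:.3e} W, {voltage_v:.3e} V")
```

At -160 dBm the signal power is roughly 0.1 attowatt, which corresponds to an RMS voltage of about 2.2 nV across a 50 Ω load.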
Submitted 8 October, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
Improvement of Multiparametric MR Image Segmentation by Augmenting the Data with Generative Adversarial Networks for Glioma Patients
Authors:
Eric Carver,
Zhenzhen Dai,
Evan Liang,
James Snyder,
Ning Wen
Abstract:
Every year thousands of patients are diagnosed with a glioma, a type of malignant brain tumor. Physicians use MR images as a key tool in the diagnosis and treatment of these patients. Neural networks show great potential to aid physicians in medical image analysis. This study investigates the use of varying amounts of synthetic brain T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) MR images created by a generative adversarial network to overcome the lack of annotated medical image data in training separate 2D U-Nets to segment enhancing tumor, peritumoral edema, and necrosis (non-enhancing tumor core) regions on gliomas. These synthetic MR images were assessed quantitatively (SSIM=0.79) and qualitatively by a physician, who found that the synthetic images seem stronger for delineation of structural boundaries but struggle more when the gradient is significant (e.g. edema signal in T2 modalities). Multiple 2D U-Nets were trained with original BraTS data and differing subsets of a quarter, half, three-quarters, and all synthetic MR images. Although there was no obvious correlation between the amount of synthetic data added and the improvement in metric values on the separate validation dataset for each structure, there was a strong correlation between the amount of synthetic data added and the number of best overall validation metrics. In summary, this study showed the ability to generate high quality synthetic FLAIR, T2, T1, and T1CE MR images using the GAN. Using the synthetic MR images showed encouraging results in improving the U-Net segmentation performance, which has the potential to address the scarcity of readily available medical images.
Submitted 1 October, 2019;
originally announced October 2019.
-
ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions
Authors:
Georgios Exarchakis,
Jörg Bornschein,
Abdul-Saboor Sheikh,
Zhenwen Dai,
Marc Henniges,
Jakob Drefs,
Jörg Lücke
Abstract:
ProSper is a Python library containing probabilistic algorithms to learn dictionaries. Given a set of data points, the implemented algorithms seek to learn the elementary components that have generated the data. The library widens the scope of dictionary learning approaches beyond implementations of standard approaches such as ICA, NMF or standard L1 sparse coding. The implemented algorithms are especially well suited to cases in which data consist of components that combine non-linearly and/or to data requiring flexible prior distributions. Furthermore, the implemented algorithms go beyond standard approaches by inferring prior and noise parameters of the data, and they provide rich a-posteriori approximations for inference. The library is designed to be extendable and it currently includes: Binary Sparse Coding (BSC), Ternary Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding (GSC, a recent spike-and-slab sparse coding approach). The algorithms are scalable due to a combination of variational approximations and parallelization. Implementations of all algorithms allow for parallel execution on multiple CPUs and multiple machines for medium to large-scale applications. Typical large-scale runs of the algorithms can use hundreds of CPUs to learn hundreds of dictionary elements from data with tens of millions of floating-point numbers such that models with several hundred thousand parameters can be optimized. The library is designed to have minimal dependencies and to be easy to use. It targets users of dictionary learning algorithms and Machine Learning researchers.
Submitted 1 August, 2019;
originally announced August 2019.