Search | arXiv e-print repository

Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and planar wavefront model. Considering the hybrid digital-analog structure inherent to modular arrays, we formulate a joint analog-digital beamforming design problem based on the communication spectral efficiency and sensing signal-to-clutter-plus-noise ratio (SCNR). By exploring the structural similarity of the communication and sensing channels, it is proved that the optimal transmit covariance matrix lies in the subspace spanned by the subarray response vectors, yielding a closed-form solution for the optimal analog beamformer. Consequently, the joint design problem is transformed into a low-dimensional rank-constrained digital beamformer optimization. We first propose a manifold optimization method that directly optimizes the digital beamformer on the rank-constrained Stiefel manifold. Additionally, we develop a semidefinite relaxation (SDR)-based approach that relaxes the rank constraint and employs the randomization technique to obtain a near-optimal solution. Simulation results demonstrate the effectiveness of the proposed modular XL-MIMO ISAC framework and algorithms.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.19338 [pdf, other]

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is constrained by the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited and the imaging dose is unnecessarily high, which is unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-models framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework takes kV images as the sole input and can synthesize accurate, full-size 3D CT in real time (within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality (MAE: <45HU), dosimetrical accuracy (Gamma passing rate (2%/2mm/10%) >97%) and patient position uncertainty (shift error: <0.4mm). The proposed framework can generate accurate 3D CT faithfully mirroring real-time patient position, thus significantly improving patient setup accuracy, keeping imaging dose to a minimum, and maintaining treatment veracity.

Submitted 1 April, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures and tables

arXiv:2405.09022 [pdf, other]

doi 10.1109/JIOT.2024.3413687

Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems

Authors: Chunwei Meng, Zhiqing Wei, Dingyou Ma, Wanli Ni, Liyan Su, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasibility problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.
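The bisection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes only that feasibility is monotone in the target utility level (if level t is achievable, every smaller level is too). The function names and the toy feasibility check are hypothetical stand-ins; they are not the paper's convex feasibility problem, which would normally be handed to a convex solver.

```python
from typing import Callable, Sequence


def bisect_max_feasible(
    is_feasible: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Return (within tol) the largest t in [lo, hi] for which is_feasible(t) holds.

    Assumes feasibility is monotone: feasible at t implies feasible below t.
    """
    if not is_feasible(lo):
        raise ValueError("lower bound must be feasible")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid  # mid is achievable, search higher
        else:
            hi = mid  # mid is not achievable, search lower
    return lo


def toy_feasible(
    t: float,
    budget: float = 1.0,
    gains: Sequence[float] = (2.0, 0.5, 1.0),
) -> bool:
    """Hypothetical stand-in for a feasibility problem.

    Each utility is gains[k] * p_k with sum(p_k) <= budget, so a common
    level t is reachable exactly when sum(t / gains[k]) <= budget.
    """
    return sum(t / g for g in gains) <= budget


best_level = bisect_max_feasible(toy_feasible, lo=0.0, hi=10.0)
```

In this toy case the max-min level has the closed form budget / sum(1 / gains[k]), which gives a convenient check on the search.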

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.07472 [pdf, other]

doi 10.1109/LWC.2024.3406577

Cramer-Rao Bounds for Near-Field Sensing: A Generic Modular Architecture

Authors: Chunwei Meng, Dingyou Ma, Xu Chen, Zhiyong Feng, Yuanwei Liu

Abstract: A generic modular array architecture is proposed, featuring uniform/non-uniform subarray layouts that allow for flexible deployment. The bistatic near-field sensing system is considered, where the target is located in the near-field of the whole modular array and the far-field of each subarray. Then, the closed-form expressions of Cramer-Rao bounds (CRBs) for range and angle estimations are derived based on the hybrid spherical and planar wave model (HSPM). Simulation results validate the accuracy of the derived closed-form CRBs and demonstrate that: i) The HSPM with varying angles of arrival (AoAs) between subarrays can reduce the CRB for range estimation compared to the traditional HSPM with shared AoA; and ii) The proposed generic modular architecture with subarrays positioned closer to the edges can significantly reduce the CRBs compared to the traditional modular architecture with uniform subarray layout, when the array aperture is fixed.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2402.15725 [pdf, other]

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Authors: Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Abstract: Human language can be expressed in either written or spoken form, i.e. text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has just started. In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks. Specifically, we propose a novel pre-training method, text-guided HuBERT, or T-HuBERT, which performs self-supervised learning over speech to derive phoneme-like discrete representations. These phoneme-like pseudo-label sequences are first derived from speech via generative adversarial networks (GAN) to be statistically similar to those from additional unpaired textual data. In this way, we build a bridge between unpaired speech and text in an unsupervised manner. Extensive experiments demonstrate the significant superiority of our proposed method over various strong baselines, which achieves up to 15.3% relative Word Error Rate (WER) reduction on the LibriSpeech dataset.

Submitted 28 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: 5 pages, 1 figure, 5 tables, submitted to IEEE Signal Processing Letters (SPL)

arXiv:2311.10416 [pdf, other]

Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems

Authors: Xinyu Xiao, Zhennan Zhou, Bin Dong, Dingjiong Ma, Li Zhou, Jie Sun

Abstract: Non-linear effects in long-haul, high-speed optical fiber systems significantly hinder channel capacity. While the Digital Backward Propagation algorithm (DBP) with adaptive filter (ADF) can mitigate these effects, it suffers from an overwhelming computational complexity. Recent solutions have incorporated deep neural networks in a data-driven strategy to alleviate this complexity in the DBP model. However, these models are often limited to a specific symbol rate and channel number, necessitating retraining for different settings, and their performance declines significantly under high-speed and high-power conditions. We introduce Meta-DSP, a novel data-driven nonlinear compensation model based on meta-learning that processes multi-modal data across diverse transmission rates, power levels, and channel numbers. This not only enhances signal quality but also substantially reduces the complexity of the nonlinear processing algorithm. Our model delivers a 0.7 dB increase in the Q-factor over Electronic Dispersion Compensation (EDC), and compared to DBP, it curtails computational complexity by a factor of ten while retaining comparable performance. From the perspective of the entire signal processing system, the core idea of Meta-DSP can be employed in any segment of the overall communication system to enhance the model's scalability and generalization performance. Our research substantiates Meta-DSP's proficiency in addressing the critical parameters defining optical communication networks.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2309.09627 [pdf, other]

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

Authors: Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

Abstract: We propose a novel framework for electrolaryngeal speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech in the same latent space, while still being able to extract accurate linguistic information, creating a unified representation to reduce the speech type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework for reducing the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. We show that compared to the conventional framework using mel-spectrogram input and output features, using the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and 0.83 improvement in naturalness score.

Submitted 20 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024. Demo page: lesterphillip.github.io/icassp2024_el_sie

arXiv:2308.08313 [pdf, other]

ECPC-IDS:A benchmark endometrail cancer PET/CT image dataset for evaluation of semantic segmentation and detection of hypermetabolic regions

Authors: Dechao Tang, Tianming Du, Deguo Ma, Zhiyu Ma, Hongzan Sun, Marcin Grzegorzek, Huiyan Jiang, Chen Li

Abstract: Endometrial cancer is one of the most common tumors in the female reproductive system and is the third most common gynecological malignancy that causes death after ovarian and cervical cancer. Early diagnosis can significantly improve the 5-year survival rate of patients. With the development of artificial intelligence, computer-assisted diagnosis plays an increasingly important role in improving the accuracy and objectivity of diagnosis, as well as reducing the workload of doctors. However, the absence of publicly available endometrial cancer image datasets restricts the application of computer-assisted diagnostic techniques. In this paper, a publicly available Endometrial Cancer PET/CT Image Dataset for Evaluation of Semantic Segmentation and Detection of Hypermetabolic Regions (ECPC-IDS) is published. Specifically, the segmentation section includes PET and CT images, with a total of 7159 images in multiple formats. In order to prove the effectiveness of segmentation methods on ECPC-IDS, five classical deep learning semantic segmentation methods are selected to test the image segmentation task. The object detection section also includes PET and CT images, with a total of 3579 images and XML files with annotation information. Six deep learning methods are selected for experiments on the detection task. This study conducts extensive experiments using deep learning-based semantic segmentation and object detection methods to demonstrate the differences between various methods on ECPC-IDS. As far as we know, this is the first publicly available dataset of endometrial cancer with a large number of multiple images, including a large amount of information required for image and target detection. ECPC-IDS can aid researchers in exploring new algorithms to enhance computer-assisted technology, benefiting both clinical doctors and patients greatly.

Submitted 11 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 14 pages, 6 figures

arXiv:2308.08172 [pdf, other]

AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation

Authors: Zhiyu Ma, Chen Li, Tianming Du, Le Zhang, Dechao Tang, Deguo Ma, Shanchuan Huang, Yan Liu, Yihao Sun, Zhihao Chen, ** Yuan, Qianqing Nie, Marcin Grzegorzek, Hongzan Sun

Abstract: Methods: In this study, a benchmark Abdominal Adipose Tissue CT Image Dataset (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publishes 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices that have the same slice distance to validate denoising methods, train semantic segmentation models, and study radiomics. For different tasks, this paper compares and analyzes the performance of various methods on AATTCT-IDS by combining the visualization results and evaluation data, thereby verifying the research potential of this dataset in the above three types of tasks. Results: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation data. Methods such as BM3D preserve the original image structure better, although the evaluation data are slightly lower. The results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of adipose tissue by each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net with the shortest training time and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATTCT-IDS reveals three adipose distributions in the subject population. Conclusion: AATTCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can attract researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purposes at: https://figshare.com/articles/dataset/AATTCT-IDS/23807256.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 17 pages, 7 figures

arXiv:2308.05489 [pdf, other]

doi 10.1109/JSTARS.2022.3218369

SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network

Authors: Chenwei Wang, Jifang Pei, Xiaoyu Liu, Yulin Huang, Deqing Mao, Yin Zhang, Jianyu Yang

Abstract: Sufficient synthetic aperture radar (SAR) target images are very important for the development of research. However, available SAR target images are often limited in practice, which hinders the progress of SAR applications. In this paper, we propose an azimuth-controllable generative adversarial network to generate precise SAR target images with an intermediate azimuth between the azimuths of two given SAR images. This network mainly contains three parts: generator, discriminator, and predictor. Through the proposed specific network structure, the generator can extract and fuse the optimal target features from two input SAR target images to generate a SAR target image. Then a similarity discriminator and an azimuth predictor are designed. The similarity discriminator can differentiate the generated SAR target images from the real SAR images to ensure the accuracy of the generated images, while the azimuth predictor measures the difference in azimuth between the generated and desired images to ensure the azimuth controllability of the generated images. Therefore, the proposed network can generate precise SAR images, and their azimuths can be controlled well by the inputs of the deep network, which can generate target images at different azimuths to alleviate the small-sample problem to some degree and benefit research on SAR images. Extensive experimental results show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2305.15636 [pdf]

Channelized analog microwave short-time Fourier transform in the optical domain with improved measurement performance

Authors: Xiaowei Li, Taixia Shi, Dong Ma, Yang Chen

Abstract: In this article, analog microwave short-time Fourier transform (STFT) with improved measurement performance is implemented in the optical domain by employing stimulated Brillouin scattering (SBS) and channelization. By jointly using three optical frequency combs and filter- and SBS-based frequency-to-time mapping (FTTM), the time-frequency information of the signal under test (SUT) in different frequency intervals is measured in different channels. Then, by using the channel label introduced through subcarriers after photodetection, the obtained low-speed electrical pulses in different channels mixed in the time domain are distinguished and the time-frequency information of the SUT in different channels is respectively obtained and spliced to implement the STFT. For the first time, channelization measurement technology is introduced in the STFT system based on frequency sweeping and FTTM, greatly reducing the frequency-sweep range of the required frequency-sweep signal to the analysis bandwidth divided by the number of channels. In addition, channelization can also be used to improve the time and frequency resolution of the STFT system. A proof-of-concept experiment is performed. 12-GHz and 10-GHz analysis bandwidths are implemented by using a 4-GHz frequency-sweep signal with 3 channels and a 2-GHz frequency-sweep signal with 5 channels, respectively. Measurement performance improvement is also demonstrated.
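The relation stated in this abstract, that the required frequency-sweep range equals the analysis bandwidth divided by the number of channels, can be checked against the two reported configurations. The short sketch below does only that arithmetic; the function name is a hypothetical helper introduced here, and nothing about the optical setup is modeled.

```python
def required_sweep_range_ghz(analysis_bandwidth_ghz: float, num_channels: int) -> float:
    """Sweep range needed per the abstract's relation: bandwidth / channel count."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    return analysis_bandwidth_ghz / num_channels


# The two configurations reported in the abstract:
# 12 GHz analysis bandwidth with 3 channels, 10 GHz with 5 channels.
sweep_for_12ghz = required_sweep_range_ghz(12.0, 3)
sweep_for_10ghz = required_sweep_range_ghz(10.0, 5)
```

Both values agree with the 4-GHz and 2-GHz frequency-sweep signals quoted for the experiment.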

Submitted 24 May, 2023; originally announced May 2023.

Comments: 18 pages, 9 figures, 1 table

arXiv:2301.00504 [pdf]

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Authors: Timothy T. Yu, Da Ma, Jayden Cole, Myeong Jin Ju, Mirza F. Beg, Marinko V. Sarunic

Abstract: Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.

Submitted 1 January, 2023; originally announced January 2023.

arXiv:2212.00532 [pdf, other]

EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks

Authors: Liyu Shi, Xiaoyan Li, Weiming Hu, Haoyuan Chen, Jing Chen, Zizhen Fan, Minghe Gao, Yujie Jing, Guotao Lu, Deguo Ma, Zhiyu Ma, Qingtao Meng, Dechao Tang, Hongzan Sun, Marcin Grzegorzek, Shouliang Qi, Yueyang Teng, Chen Li

Abstract: Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis. Methods: This present study provided a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, the experimental results for EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods had a better image segmentation performance when utilizing EBHI-Seg. The maximum accuracy of the Dice evaluation metric for the classical machine learning method is 0.948, while the Dice evaluation metric for the deep learning method is 0.965. Conclusion: This publicly available dataset contained 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.
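For readers unfamiliar with the Dice metric quoted in this abstract, the following minimal sketch computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks given as flat sequences. The masks are made-up toy values, and returning 1.0 when both masks are empty is an assumed convention; the abstract does not describe the evaluation code used for EBHI-Seg.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient of two flattened binary masks (nonzero = foreground)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy masks (hypothetical values, not data from the dataset).
predicted_mask = [1, 1, 0, 0, 1, 0]
ground_truth_mask = [1, 0, 0, 0, 1, 1]
toy_dice = dice_coefficient(predicted_mask, ground_truth_mask)
```

A value of 1.0 means the predicted and ground-truth regions coincide exactly, and 0.0 means they do not overlap at all.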

Submitted 6 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.01079 [pdf, other]

Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

Authors: Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda

Abstract: Research on automatic speech recognition (ASR) systems for electrolaryngeal speakers has been relatively unexplored due to small datasets. When training data is lacking in ASR, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. Despite the imperfect synthetic data, we show the effectiveness of this on electrolaryngeal speech datasets, with improvements of 6.1% over the baseline that did not use imperfect synthetic speech. Results show how the intermediate fine-tuning stage focuses on learning the high-level inherent features of the imperfect synthetic data rather than the low-level features such as intelligibility.

Submitted 30 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to ICASSP 2023

arXiv:2210.10314 [pdf, other]

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda

Abstract: Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential in converting electrolaryngeal (EL) speech to normal speech (EL2SP) compared to conventional VC models. However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for the model training and it suffers from significant performance degradation when the amount of training data is insufficient. To address this issue, we suggest a novel, two-stage strategy to optimize the performance on EL2SP based on seq2seq VC when a small amount of the parallel dataset is available. In contrast to utilizing high-quality data augmentations in previous studies, we first combine a large amount of imperfect synthetic parallel data of EL and normal speech, with the original dataset into VC training. Then, a second stage training is conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2208.14635 [pdf, other]

Segmentation-guided Domain Adaptation and Data Harmonization of Multi-device Retinal Optical Coherence Tomography using Cycle-Consistent Generative Adversarial Networks

Authors: Shuo Chen, Da Ma, Sieun Lee, Timothy T. L. Yu, Gavin Xu, Donghuan Lu, Karteek Popuri, Myeong Jin Ju, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: Optical Coherence Tomography (OCT) is a non-invasive technique that captures cross-sectional images of the retina at micrometer resolution. It has been widely used as an auxiliary imaging reference to detect eye-related pathology and predict longitudinal progression of the disease characteristics. Retinal layer segmentation is one of the crucial feature extraction techniques, where the variations of retinal layer thicknesses and the retinal layer deformation due to the presence of fluid are highly correlated with multiple epidemic eye diseases such as Diabetic Retinopathy (DR) and Age-related Macular Degeneration (AMD). However, these images are acquired from different devices, which have different intensity distributions, or in other words, belong to different imaging domains. This paper proposes a segmentation-guided domain-adaptation method to adapt images from multiple devices into a single image domain, where a state-of-the-art pre-trained segmentation model is available. It avoids the time-consuming manual labelling of each new dataset and the re-training of the existing network. The semantic consistency and global feature consistency of the network minimize the hallucination effect that many researchers have reported regarding the Cycle-Consistent Generative Adversarial Networks (CycleGAN) architecture.

Submitted 31 August, 2022; originally announced August 2022.

Comments: 16 pages, 10 figures

arXiv:2208.09143 [pdf]

Photonics-enabled wavelet-like transform via nonlinear optical frequency sweeping and stimulated Brillouin scattering-based frequency-to-time mapping

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: A photonics-enabled wavelet-like transform system, characterized by multi-resolution time-frequency analysis, is proposed based on a typical stimulated Brillouin scattering (SBS) pump-probe setup using an optical nonlinear frequency-sweep signal. In the pump path, a continuous-wave optical signal is injected into an SBS medium to generate an SBS gain. In the probe path, a periodic nonlinear frequency-sweep optical signal with a time-varying chirp rate is generated, which is then modulated at a Mach-Zehnder modulator (MZM) by the electrical signal under test (SUT). The optical signal from the MZM is selectively amplified by the SBS gain and converted back to the electrical domain using a low-speed photodetector, implementing the periodic SBS-based frequency-to-time mapping (FTTM). The frequency-domain information corresponding to different periods is mapped to the time domain via the FTTM in the form of low-speed electrical pulses, which are then spliced to analyze the time-frequency relationship of the SUT in real time. The time-varying chirp rate in each sweep period gives signals with different frequencies different frequency resolutions in the FTTM process, which is very similar to the characteristics of the wavelet transform, so we call it a wavelet-like transform. An experiment is carried out. Multi-resolution time-frequency analysis of a variety of RF signals is carried out in a 4-GHz bandwidth limited only by the equipment.

Submitted 19 August, 2022; originally announced August 2022.

Comments: 9 pages, 6 figures

arXiv:2208.04871 [pdf]

Breaking the accuracy and resolution limitation of filter- and frequency-to-time mapping-based time and frequency acquisition methods by broadening the filter bandwidth

Authors: Pengcheng Zuo, Dong Ma, Xiaowei Li, Yang Chen

Abstract: In this paper, the filter- and frequency-to-time mapping (FTTM)-based photonics-assisted time and frequency acquisition methods are comprehensively analyzed, and the accuracy and resolution limitation in the fast sweep scenario is broken by broadening the filter bandwidth. It is found that when the sweep speed is very fast, the width of the pulse generated via FTTM is mainly determined by the impulse response of the filter. In this case, appropriately increasing the filter bandwidth can significantly reduce the pulse width, so as to improve the measurement accuracy and resolution. FTTM-based short-time Fourier transform (STFT) and microwave frequency measurement using the stimulated Brillouin scattering (SBS) effect are demonstrated by comparing the results with and without SBS gain spectrum broadening, and the improvement of measurement accuracy and frequency resolution is well confirmed. The frequency measurement accuracy of the system is improved by around 25 times compared with the former work using a similar sweep speed, while the frequency resolution of the STFT is also much improved compared with our former results.

Submitted 9 August, 2022; originally announced August 2022.

Comments: 18 pages, 11 figures

arXiv:2207.01175 [pdf]

doi 10.1109/LPT.2022.3225547

Photonics-based short-time Fourier transform without high-frequency electronic devices and equipment

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: A photonics-based short-time Fourier transform (STFT) system is proposed and experimentally demonstrated based on stimulated Brillouin scattering (SBS) without using high-frequency electronic devices and equipment. The wavelength of a distributed feedback laser diode is periodically swept by using a low-speed periodic sawtooth/triangular driving current. The periodic frequency-sweep optical signal is modulated by the signal under test (SUT) and then injected into a section of SBS medium. The optical signal from another laser diode as the pump wave is reversely injected into the SBS medium. After simply detecting the forward transmission optical signals in a low-speed photodetector, the STFT of the SUT can be implemented. The system is characterized by the absence of any high-frequency electronic devices or equipment. An experiment is performed. The STFT of a variety of RF signals is carried out in a 4-GHz bandwidth. The dynamic frequency resolution is demonstrated to be around 60 MHz.

Submitted 3 July, 2022; originally announced July 2022.

Comments: 8 pages, 5 figures

arXiv:2204.04579 [pdf, other]

doi 10.1121/10.0015792

Inferring Pitch from Coarse Spectral Features

Authors: Danni Ma, Neville Ryant, Mark Liberman

Abstract: Fundamental frequency (F0) has long been treated as the physical definition of "pitch" in phonetic analysis. But there have been many demonstrations that F0 is at best an approximation to pitch, both in production and in perception: pitch is not F0, and F0 is not pitch. Changes in pitch involve many articulatory and acoustic covariates; pitch perception often deviates from what F0 analysis predicts; and in fact, quasi-periodic signals from a single voice source are often incompletely characterized by an attempt to define a single time-varying F0. In this paper, we find strong support for the existence of covariates for pitch in aspects of relatively coarse spectra, in which an overtone series is not available. Thus linear regression can predict the pitch of simple vocalizations, produced by an articulatory synthesizer or by a human, from single frames of such coarse spectra. Across speakers, and in more complex vocalizations, our experiments indicate that the covariates are not quite so simple, though apparently still available for more sophisticated modeling. On this basis, we propose that the field needs a better way of thinking about speech pitch, just as celestial mechanics requires us to go beyond Newton's point mass approximations to heavenly bodies.

Submitted 26 August, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

arXiv:2203.05707 [pdf]

doi 10.3233/JAD-220021

Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer's Disease

Authors: Ghazal Mirabnahrazam, Da Ma, Sieun Lee, Karteek Popuri, Hyunwoo Lee, Jiguo Cao, Lei Wang, James E Galvin, Mirza Faisal Beg, the Alzheimer's Disease Neuroimaging Initiative

Abstract: Background: The increasing availability of databases containing both magnetic resonance imaging (MRI) and genetic data allows researchers to utilize multimodal data to better understand the characteristics of dementia of Alzheimer's type (DAT). Objective: The goal of this study was to develop and analyze novel biomarkers that can help predict the development and progression of DAT. Methods: We used feature selection and an ensemble learning classifier to develop an image/genotype-based DAT score that represents a subject's likelihood of developing DAT in the future. Three feature types were used: MRI only, genetic only, and combined multimodal data. We used a novel data stratification method to better represent different stages of DAT. Using a pre-defined 0.5 threshold on DAT scores, we predicted whether or not a subject would develop DAT in the future. Results: Our results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database showed that dementia scores using genetic data could better predict future DAT progression for currently normal control subjects (Accuracy=0.857) compared to MRI (Accuracy=0.143), while MRI can better characterize subjects with stable mild cognitive impairment (Accuracy=0.614) compared to genetics (Accuracy=0.356). Combining MRI and genetic data showed improved classification performance in the remaining stratified groups. Conclusion: MRI and genetic data can contribute to DAT prediction in different ways. MRI data reflects anatomical changes in the brain, while genetic data can detect the risk of DAT progression prior to the symptomatic onset. Combining information from multimodal data in the right way can improve prediction performance.

Submitted 10 March, 2022; originally announced March 2022.

Journal ref: J Alzheimers Dis 1 Jan. (2022) 1-21

arXiv:2202.09954 [pdf, other]

doi 10.1109/TCOMM.2022.3201931

Theoretical Analysis of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and their superb performance have been validated by simulation experiments, little attention has been paid to the theoretical analysis. Specifically, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve performance comparable to traditional techniques in the physical layer, and also derive their cost in terms of computational complexity. To achieve this goal, we first analyze the encoding performance of a DNN-based transmitter and compare it to a traditional one. Second, we theoretically analyze the performance of a DNN-based estimator and compare it with traditional estimators. Third, we investigate and validate how information flows in a DNN-based communication system using information-theoretic concepts. Our analysis develops a concise way to open the "black box" of DNNs in physical layer communication, which can be applied to support the design of DNN-based intelligent communication techniques and help to provide explainable performance assessment.

Submitted 26 August, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: 15 pages, 13 figures, has been accepted for publication in IEEE Transactions on Communications. arXiv admin note: substantial text overlap with arXiv:2106.01124

Journal ref: IEEE Transactions on Communications, 2022

arXiv:2201.11285 [pdf]

doi 10.1364/OL.455019

Time-varying microwave photonic filter for arbitrary waveform signal-to-noise ratio improvement

Authors: Dong Ma, Yang Chen

Abstract: A time-varying microwave photonic filter (TV-MPF) based on stimulated Brillouin scattering (SBS) is proposed and utilized to suppress the in-band noise of broadband arbitrary microwave waveforms, thereby improving the signal-to-noise ratio (SNR). The filter-controlling signal is designed according to the signal to be filtered and drives the TV-MPF so that the passband of the filter is always aligned with the frequencies of the signal to be filtered. By continuously tracking the signal spectral component, the TV-MPF only retains the spectral components of the signal and filters out the noise other than the spectral component of the signal at the current time, so as to improve the in-band SNR of the signal to be filtered. An experiment is performed. A variety of signals with different formats and in-band SNRs are used to test the noise suppression capability of the TV-MPF, and the waveform mean-square error is calculated to quantify the improvement of the signal, demonstrating the excellent adaptability of the proposed TV-MPF to different kinds of signals.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 8 pages, 5 figures

arXiv:2201.08741 [pdf]

doi 10.3389/fnimg.2022.1023481

Improving Across-Dataset Brain Tissue Segmentation Using Transformer

Authors: Vishwanatha M. Rao, Zihan Wan, Soroush Arabshahi, David J. Ma, Pin-Yu Lee, Ye Tian, Xuzhe Zhang, Andrew F. Laine, Jia Guo

Abstract: Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentation tool. Despite the recent success of deep convolutional neural networks (CNNs) for brain tissue segmentation, many such solutions do not generalize well to new datasets, which is critical for a reliable solution. Transformers have demonstrated success in natural image segmentation and have recently been applied to 3D medical image segmentation tasks due to their ability to capture long-distance relationships in the input where the local receptive fields of CNNs struggle. This study introduces a novel CNN-Transformer hybrid architecture designed for brain tissue segmentation. We validate our model's performance across four multi-site T1w MRI datasets, covering different vendors, field strengths, scan parameters, time points, and neuropsychiatric conditions. In all situations, our model achieved the greatest generality and reliability. Our method is inherently robust and can serve as a valuable tool for brain-related T1w MRI studies. The code for the TABS network is available at: https://github.com/raovish6/TABS.

Submitted 31 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

ACM Class: I.4.6

arXiv:2201.07438 [pdf, other]

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Authors: Dabiao Ma, Yitong Zhang, Meng Li, Feng Ye

Abstract: Neural network based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive amounts of spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and speaking style speech data. Specifically, we introduce a multi-head model and transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription by jointly training them. MHTTS has three advantages: 1) Our system synthesizes better quality multi-speaker voice with faster inference speed. 2) Our system is capable of transferring correct text information to data with imperfect transcription, simulated using corruption, or provided by an Automatic Speech Recogniser (ASR). 3) Our system can utilize massive real spontaneous speech with imperfect transcription and synthesize expressive voice.

Submitted 4 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.13438 [pdf]

doi 10.1109/JLT.2022.3174552

Short-time Fourier transform based on stimulated Brillouin scattering

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: In this paper, all-optical short-time Fourier transform (STFT) based on stimulated Brillouin scattering (SBS) is proposed and further used for real-time time-frequency analysis of different radio frequency (RF) signals. In the proposed all-optical STFT system, SBS not only provides a band-pass filter for implementing the window function in conjunction with a periodic frequency-sweep optical signal but also obtains the frequency domain information in different time windows through the generated waveform via frequency-to-time mapping (FTTM). A periodic frequency-sweep optical signal is generated and then modulated at a Mach-Zehnder modulator by the electrical signal under test (SUT). During different sweep periods, the fixed Brillouin gain functions as a bandpass filter to select a specific range of the spectrum, which is equivalent to applying a sliding window function to the corresponding section of the temporal signal with the help of the sweep optical signal. At the same time, after the optical signal is selectively amplified by the SBS gain and converted back to the electrical domain, SBS also implements the real-time FTTM, which can be utilized to obtain the frequency domain information corresponding to different time windows through the generated waveforms via the FTTM. The frequency domain information corresponding to different time windows is formed and spliced to analyze the time-frequency relationship of the SUT in real-time. An experiment is performed. STFTs of a variety of RF signals are carried out in a 12-GHz bandwidth limited only by the equipment, and the dynamic frequency resolution is better than 60 MHz.

Submitted 26 November, 2021; originally announced November 2021.

Comments: 18 pages, 9 figures, 1 table

arXiv:2111.02667 [pdf, other]

doi 10.1109/TAP.2022.3177533

Physics Assisted Deep Learning for Indoor Imaging using Phaseless Wi-Fi Measurements

Authors: Samruddhi Deshmukh, Amartansh Dubey, Dingfei Ma, Qifeng Chen, Ross Murch

Abstract: A physics assisted deep learning framework to perform accurate indoor imaging using phaseless Wi-Fi measurements is proposed. It is able to image objects that are large (compared to wavelength) and have high permittivity values, which existing radio frequency (RF) inverse scattering techniques find very challenging, making it suitable for indoor RF imaging. The technique utilizes a Rytov based inverse scattering model with a deep learning framework. The inverse scattering model is based on an extended Rytov approximation (xRA) that pre-reconstructs the RF measurements. Under strong scattering conditions, this pre-reconstruction is related to the actual permittivity profile by a non-linear function, which is learned by a modified U-Net model to obtain the permittivity profile of the object. Thus, our proposed approach not only reconstructs the shape of objects, but also estimates their permittivity values accurately. We demonstrate its imaging performance using simulations as well as experimental results in an actual indoor environment using 2.4 GHz Wi-Fi phaseless measurements. For incident wavelength $λ_0$, the proposed framework can reconstruct objects with relative permittivity as high as 77 and electrical size as large as $40 λ$, where $λ=λ_0/\sqrt{77}$. This is in contrast to existing phaseless imaging techniques which cannot reconstruct permittivity values beyond 3 or 4. Thus, our proposed method is the first inverse scattering-based deep learning framework which can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.

Submitted 4 November, 2021; originally announced November 2021.

Comments: 14 pages, 10 figures. This work has been submitted to IEEE for possible publication

arXiv:2110.12857 [pdf]

doi 10.1364/AO.450247

Photonics-assisted microwave pulse detection and frequency measurement based on pulse replication and frequency-to-time mapping

Authors: Pengcheng Zuo, Dong Ma, Qingbo Liu, Lizhong Jiang, Yang Chen

Abstract: A photonics-assisted microwave pulse detection and frequency measurement scheme is proposed. The unknown microwave pulse is converted to the optical domain and then injected into a fiber loop for pulse replication, which makes it easier to identify the microwave pulse with large pulse repetition interval (PRI), whereas stimulated Brillouin scattering-based frequency-to-time mapping (FTTM) is utilized to measure the carrier frequency of the microwave pulse. A sweep optical carrier is generated and modulated by the unknown microwave pulse and a continuous-wave single-frequency reference, generating two different frequency sweep optical signals, which are combined and used as the probe wave to detect a fixed Brillouin gain spectrum. When the optical signal is detected in a photodetector, FTTM is realized and the frequency of the microwave pulse can be determined. An experiment is performed. For a fiber loop containing a 210-m fiber, pulse replication and FTTM of the pulses with a PRI of 20 μs and pulse width of 1.20, 1.00, 0.85, and 0.65 μs are realized. Under a certain sweep frequency chirp rate of 0.978 THz/s, the measurement errors are below ±12 and ±5 MHz by using one pair of pulses and multiple pairs of pulses, respectively. The influence of the sweep frequency chirp rate and pulse width on the measurement error has also been studied. To a certain extent, the faster the frequency sweep, the greater the frequency measurement error. For a specific sweep frequency chirp rate, the measurement error is almost unaffected by the pulse width to be measured.

Submitted 25 September, 2021; originally announced October 2021.

Comments: 13 pages, 8 figures

arXiv:2109.05627 [pdf, other]

Differential Diagnosis of Frontotemporal Dementia and Alzheimer's Disease using Generative Adversarial Network

Authors: Da Ma, Donghuan Lu, Karteek Popuri, Mirza Faisal Beg

Abstract: Frontotemporal dementia and Alzheimer's disease are two common forms of dementia and are easily misdiagnosed as each other due to their similar pattern of clinical symptoms. Differentiating between the two dementia types is crucial for determining disease-specific intervention and treatment. Recently developed deep-learning-based approaches in the field of medical image computing are delivering some of the best performance for many binary classification tasks, although their application in differential diagnosis, such as neuroimage-based differentiation of multiple types of dementia, has not been explored. In this study, a novel framework was proposed that uses the Generative Adversarial Network technique to distinguish FTD, AD and normal control subjects, using volumetric features extracted at coarse-to-fine structural scales from Magnetic Resonance Imaging scans. Experiments with 10-fold cross-validation on 1,954 images achieved high accuracy. With the proposed framework, we have demonstrated that the combination of multi-scale structural features and synthetic data augmentation based on generative adversarial networks can improve the performance of challenging tasks such as differentiating dementia sub-types.

Submitted 29 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

arXiv:2109.03904 [pdf]

doi 10.1016/j.optcom.2022.128228

Time-frequency analysis of microwave signals based on stimulated Brillouin scattering

Authors: Dong Ma, Pengcheng Zuo, Yang Chen

Abstract: A novel photonic approach to the time-frequency analysis of microwave signals is proposed based on the stimulated Brillouin scattering (SBS)-assisted frequency-to-time mapping (FTTM). Two types of time-frequency analysis links, namely parallel SBS link and time-division SBS link are proposed. The parallel SBS link can be utilized to perform real-time time-frequency analysis of microwave signal, which provides a promising solution for real-time time-frequency analysis, especially when it is combined with the photonic integration technique. A simulation is made to verify its feasibility by analyzing signals in multiple formats. The time-division SBS link has a simpler and reconfigurable structure, which can realize an ultra-high-resolution time-frequency analysis for periodic signals using the time segmentation and accumulation technique. An experiment is performed for the time-division SBS link. The multi-dimensional reconfigurability of the system is experimentally studied. An analysis bandwidth of 3.9 GHz, an analysis frequency up to 20 GHz, and a frequency resolution of 15 MHz are demonstrated, respectively.

Submitted 7 September, 2021; originally announced September 2021.

Comments: 17 pages, 10 figures, 1 table

arXiv:2107.10701 [pdf, other]

Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech

Authors: Duo Ma, Nana Hou, Van Tung Pham, Haihua Xu, Eng Siong Chng

Abstract: To realize robust end-to-end Automatic Speech Recognition (E2E ASR) under radio communication conditions, we propose a multitask-based method to jointly train a Speech Enhancement (SE) module as the front-end and an E2E ASR model as the back-end. One advantage of the proposed method is that the entire system can be trained from scratch. Unlike in prior works, neither component needs separate pre-training and fine-tuning. Through analysis, we found that the success of the proposed method lies in the following aspects. Firstly, multitask learning is essential: the SE network learns not only to produce more intelligible speech but also to generate speech that is beneficial to recognition. Secondly, we found that preserving the speech phase from the noisy speech is critical for improving ASR performance. Thirdly, we propose a dual-channel data augmentation training method to obtain further improvement. Specifically, we combine the clean and enhanced speech to train the whole system. We evaluate the proposed method on the RATS English data set, achieving a relative WER reduction of 4.6% with the joint training method, and up to a relative WER reduction of 11.2% with the proposed data augmentation method.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 7 pages, 3 figures, submitted to APSIPA 2021

arXiv:2107.02345 [pdf, other]

Domain Adaptation via CycleGAN for Retina Segmentation in Optical Coherence Tomography

Authors: Ricky Chen, Timothy T. Yu, Gavin Xu, Da Ma, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: With the FDA approval of Artificial Intelligence (AI) for point-of-care clinical diagnoses, model generalizability is of the utmost importance as clinical decision-making must be domain-agnostic. A method of tackling the problem is to increase the dataset to include images from a multitude of domains; while this technique is ideal, the security requirements of medical data are a major limitation. Additionally, researchers with developed tools benefit from the addition of open-sourced data, but are limited by the difference in domains. Herewith, we investigated the implementation of a Cycle-Consistent Generative Adversarial Network (CycleGAN) for the domain adaptation of Optical Coherence Tomography (OCT) volumes. This study was done in collaboration with the Biomedical Optics Research Group and Functional & Anatomical Imaging & Shape Analysis Lab at Simon Fraser University. In this study, we investigated a learning-based approach of adapting the domain of a publicly available dataset, UK Biobank dataset (UKB). To evaluate the performance of domain adaptation, we utilized pre-existing retinal layer segmentation tools developed on a different set of RETOUCH OCT data. This study provides insight on state-of-the-art tools for domain adaptation compared to traditional processing techniques as well as a pipeline for adapting publicly available retinal data to the domains previously used by our collaborators.

Submitted 5 July, 2021; originally announced July 2021.

Comments: 10 pages, 6 figures, 1 table

ACM Class: I.4.0

arXiv:2106.14671 [pdf, other]

doi 10.1109/JSTSP.2021.3118219

FRaC: FMCW-Based Joint Radar-Communications System via Index Modulation

Authors: Dingyou Ma, Nir Shlezinger, Tianyao Huang, Yimin Liu, Yonina C. Eldar

Abstract: Dual function radar communications (DFRC) systems are attractive technologies for autonomous vehicles, which utilize electromagnetic waves to constantly sense the environment while simultaneously communicating with neighbouring devices. An emerging approach to implement DFRC systems is to embed information in radar waveforms via index modulation (IM). Implementation of DFRC schemes in vehicular systems gives rise to strict constraints in terms of cost, power efficiency, and hardware complexity. In this paper, we extend IM-based DFRC systems to utilize sparse arrays and frequency modulated continuous waveforms (FMCWs), which are popular in automotive radar for their simplicity and low hardware complexity. The proposed FMCW-based radar-communications system (FRaC) operates at reduced cost and complexity by transmitting with a reduced number of radio frequency modules, combined with narrowband FMCW signalling. This is achieved via array sparsification in transmission, formulating a virtual multiple-input multiple-output array by combining the signals in one coherent processing interval, in which the narrowband waveforms are transmitted in a randomized manner. Performance analysis and numerical results show that the proposed radar scheme achieves similar resolution performance compared with a wideband radar system operating with a large receive aperture, while requiring less hardware overhead. For the communications subsystem, FRaC achieves higher rates and improved error rates compared to dual-function signalling based on conventional phase modulation.

Submitted 28 June, 2021; originally announced June 2021.

Comments: 16 pages

arXiv:2106.08147 [pdf, other]

doi 10.1117/12.2530688

Perceptually-inspired super-resolution of compressed videos

Authors: Di Ma, Mariana Afonso, Fan Zhang, David R. Bull

Abstract: Spatial resolution adaptation is a technique which has often been employed in video compression to enhance coding efficiency. This approach encodes a lower resolution version of the input video and reconstructs the original resolution during decoding. Instead of using conventional up-sampling filters, recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality. These approaches are usually trained to minimise pixel-based losses such as Mean-Squared Error (MSE), despite the fact that this type of loss metric does not correlate well with subjective opinions. In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial up-sampling of compressed video using a modified CNN model, which has been trained using a generative adversarial network (GAN) on compressed content with perceptual loss functions. The proposed method was integrated with HEVC HM 16.20, and has been evaluated on the JVET Common Test Conditions (UHD test sequences) using the Random Access configuration. The results show evident perceptual quality improvement over the original HM 16.20, with an average bitrate saving of 35.6% (Bjøntegaard Delta measurement) based on a perceptual quality metric, VMAF.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.01124 [pdf, other]

Opening the Black Box of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Deep Neural Network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have tended to focus on applying DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve comparable performance in the physical layer compared with traditional techniques, and their cost in terms of computational complexity. We further investigate and experimentally validate how information flows in a DNN-based communication system using information-theoretic concepts.

Submitted 18 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 6 pages, 5 figures, to be presented in the IEEE Wireless Communications and Networking Conference (WCNC) 2022 Workshop on Machine Learning for Communications: Future Large Scale MIMO and AI-Native Air-Interface

arXiv:2105.11594 [pdf]

A Fast MR Fingerprinting Simulator for Direct Error Estimation and Sequence Optimization

Authors: Siyuan Hu, Stephen Jordan, Rasim Boyacioglu, Ignacio Rozada, Matthias Troyer, Mark Griswold, Debra McGivney, Dan Ma

Abstract: MR Fingerprinting (MRF) is a novel quantitative MR technique that could simultaneously provide multiple tissue property maps. When optimizing MRF scans, modeling undersampling errors and field imperfections in cost functions will make the optimization results more practical and robust. However, this process is computationally expensive and impractical for sequence optimization algorithms when MRF signal evolutions need to be generated for each optimization iteration. Here, we introduce a fast MRF simulator to simulate aliased images from actual scan scenarios including undersampling and system imperfections, which substantially reduces computational time and allows for direct error estimation and efficient sequence optimization. By constraining the total number of tissues present in a brain phantom, MRF signals from highly undersampled scans can be simulated as the product of the spatial response functions based on sampling patterns and sequence-dependent temporal functions. During optimization, the spatial response function is independent of sequence design and does not need to be recalculated. We evaluate the performance and computational speed of the proposed approach by simulations and in vivo experiments. We also demonstrate the power of applying the simulator in MRF sequence optimization. The simulation results from the proposed method closely approximate the signals and MRF maps from in vivo scans, with 158 times shorter processing time than the conventional simulation method using Non-uniform Fourier transform. Incorporating the proposed simulator in the MRF optimization framework makes direct estimation of undersampling errors during the optimization process feasible, and provides optimized MRF sequences that are robust against undersampling factors and system inhomogeneity.

Submitted 24 May, 2021; originally announced May 2021.

Comments: 10 pages, 7 figures

arXiv:2103.16051 [pdf, ps, other]

Reduced Dynamics and Control for an Autonomous Bicycle

Authors: Jiaming Xiong, Bo Li, Ruihan Yu, Daolin Ma, Wei Wang, Caishan Liu

Abstract: In this paper, we propose a reduced model for the full dynamics of a bicycle and analyze its nonlinear behavior under a proportional control law for steering. Based on the Gibbs-Appell equations for the Whipple bicycle, we obtain a second-order nonlinear ordinary differential equation (ODE) that governs the bicycle's controlled motion. Two types of equilibrium points for the governing equation are found, which correspond to the bicycle's uniform straight forward and circular motions, respectively. By applying the Hurwitz criterion to the linearized equation, we find that the steer coefficient must be negative, consistent with the human intuition of turning toward a fall. Under this condition, a critical angular velocity of the rear wheel exists, above which the uniform straight forward motion is stable, and slightly below which a pair of symmetrical stable uniform circular motions will occur. These theoretical findings are verified by both numerical simulations and experiments performed on a powered autonomous bicycle.

Submitted 29 March, 2021; originally announced March 2021.

Journal ref: ICRA 2021

arXiv:2103.10363 [pdf, other]

doi 10.1109/PCS50896.2021.9477460

A Subjective Study on Videos at Various Bit Depths

Authors: Alex Mackin, Di Ma, Fan Zhang, David Bull

Abstract: Bit depth adaptation, where the bit depth of a video sequence is reduced before transmission and up-sampled during display, can potentially reduce data rates with limited impact on perceptual quality. In this context, we conducted a subjective study on a UHD video database, BVI-BD, to explore the relationship between bit depth and visual quality. In this work, three bit depth adaptation methods are investigated, including linear scaling, error diffusion, and a novel adaptive Gaussian filtering approach. The results from a subjective experiment indicate that above a critical bit depth, bit depth adaptation has no significant impact on perceptual quality, while reducing the amount of information that is required to be transmitted. Below the critical bit depth, advanced adaptation methods can be used to retain 'good' visual quality (on average) down to around 2 bits per color channel for the outlined experimental setup - a large reduction compared to the typically used 8 bits per color channel. A selection of image quality metrics was subsequently benchmarked on the subjective data, and analysis indicates that a bespoke quality metric is required for bit depth adaptation.

Submitted 18 March, 2021; originally announced March 2021.

Comments: 5 pages; 7 figures; 1 table

arXiv:2101.04538 [pdf, ps, other]

doi 10.1063/5.0079234

Polarized hyperspectral imaging with single fiber bundle via incoherent light transmission matrix approach

Authors: Yitong Li, Zhengbo Zhu, Ze Li, Donglin Ma

Abstract: The scattering of multispectral incoherent light is a common and unfavorable signal scrambling in natural scenes. However, the blurred light spot due to scattering still holds lots of information remaining to be explored. Former methods failed to recover the polarized hyperspectral information from scattered incoherent light or relied on additional dispersion elements. Here we put forward the transmission matrix (TM) approach for extended objects under incoherent illumination by speculating the unknown TM through experimentally calibrated or digitally emulated ways. Employing a fiber bundle as a powerful imaging and dispersion element, we recover the spatial information in 252 polarized-spectral channels from a single speckle, thus achieving single-shot, high-resolution, broadband hyperspectral imaging for two polarization states with the cheap, compact, fiber-bundle-only system. Based on the scattering principle itself, our method not only greatly improves the robustness of the TM approach to retrieve the input spectral information, but also reveals the feasibility to explore the polarized spatio-spectral information from blurry speckles only with the help of simple optical setups.

Submitted 11 January, 2021; originally announced January 2021.

arXiv:2011.09190 [pdf, other]

CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN). The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM). The ERNB has also been employed in the discriminator to improve the representational capability. The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions. The proposed network has been fully evaluated in the context of two typical video compression enhancement tools: post-processing (PP) and spatial resolution adaptation (SRA). CVEGAN has been fully integrated into the MPEG HEVC video coding test model (HM16.20) and experimental results demonstrate significant coding gains (up to 28% for PP and 38% for SRA compared to the anchor) over existing state-of-the-art architectures for both coding tools across multiple datasets.

Submitted 26 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

arXiv:2010.13007 [pdf, other]

Probing Acoustic Representations for Phonetic Properties

Authors: Danni Ma, Neville Ryant, Mark Liberman

Abstract: Pre-trained acoustic representations such as wav2vec and DeCoAR have attained impressive word error rates (WER) for speech recognition benchmarks, particularly when labeled data is limited. But little is known about what phonetic properties these various representations acquire, and how well they encode transferable features of speech. We compare features from two conventional and four pre-trained systems in some simple frame-level phonetic classification tasks, with classifiers trained on features from one version of the TIMIT dataset and tested on features from another. All contextualized representations offered some level of transferability across domains, and models pre-trained on more audio data give better results; but overall, DeCoAR, the system with the simplest architecture, performs best. This type of benchmarking analysis can thus uncover relative strengths of various proposed acoustic representations.

Submitted 14 February, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

arXiv:2009.07583 [pdf, other]

doi 10.1109/MMUL.2021.3052437

Video Compression with CNN-based Post Processing

Authors: Fan Zhang, Di Ma, Chen Feng, David R. Bull

Abstract: In recent years, video compression techniques have been significantly challenged by the rapidly increased demands associated with high quality and immersive video content. Among various compression tools, post-processing can be applied on reconstructed video content to mitigate visible compression artefacts and to enhance overall perceptual quality. Inspired by advances in deep learning, we propose a new CNN-based post-processing approach, which has been integrated with two state-of-the-art coding standards, VVC and AV1. The results show consistent coding gains on all tested sequences at various spatial resolutions, with average bit rate savings of 4.0% and 5.8% against original VVC and AV1 respectively (based on the assessment of PSNR). This network has also been trained with perceptually inspired loss functions, which have further improved reconstruction quality based on perceptual quality assessment (VMAF), with average coding gains of 13.9% over VVC and 10.5% against AV1.

Submitted 14 January, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

arXiv:2009.02752 [pdf, other]

Simultaneous Energy Harvesting and Gait Recognition using Piezoelectric Energy Harvester

Authors: Dong Ma, Guohao Lan, Weitao Xu, Mahbub Hassan, Wen Hu

Abstract: The piezoelectric energy harvester (PEH), which generates electricity from stress or vibrations, is gaining increasing attention as a viable solution to extend battery life in wearables. Recent research further reveals that, besides generating energy, a PEH can also serve as a passive sensor to detect human gait power-efficiently because its stress or vibration patterns are significantly influenced by the gait. However, as PEHs are not designed for precise measurement of motion, achievable gait recognition accuracy remains low with conventional classification algorithms. The accuracy deteriorates further when the generated electricity is stored simultaneously. To classify gait reliably while simultaneously storing generated energy, we make two distinct contributions. First, we propose a preprocessing algorithm to filter out the effect of energy storage on the PEH electricity signal. Second, we propose a long short-term memory (LSTM) network-based classifier to accurately capture temporal information in gait-induced electricity generation. We prototype the proposed gait recognition architecture in the form factor of an insole and evaluate its gait recognition as well as energy harvesting performance with 20 subjects. Our results show that the proposed architecture detects human gait with 12% higher recall and harvests up to 127% more energy while consuming 38% less power compared to the state-of-the-art.

Submitted 6 September, 2020; originally announced September 2020.

Comments: 13 pages, 17 figures, and 2 tables

arXiv:2007.14726 [pdf, other]

doi 10.1117/12.2567633

Video compression with low complexity CNN-based spatial resolution adaptation

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: It has recently been demonstrated that spatial resolution adaptation can be integrated within video compression to improve overall coding performance by spatially down-sampling before encoding and super-resolving at the decoder. Significant improvements have been reported when convolutional neural networks (CNNs) were used to perform the resolution up-sampling. However, this approach suffers from high complexity at the decoder due to the employment of CNN-based super-resolution. In this paper, a novel framework is proposed which supports the flexible allocation of complexity between the encoder and decoder. This approach employs a CNN model for video down-sampling at the encoder and uses a Lanczos3 filter to reconstruct full resolution at the decoder. The proposed method was integrated into the HEVC HM 16.20 software and evaluated on JVET UHD test sequences using the All Intra configuration. The experimental results demonstrate the potential of the proposed approach, with significant bitrate savings (more than 10%) over the original HEVC HM, coupled with reduced computational complexity at both encoder (29%) and decoder (10%).

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2007.09248 [pdf, other]

doi 10.1109/TCCN.2021.3118465

Fine Timing and Frequency Synchronization for MIMO-OFDM: An Extreme Learning Approach

Authors: Jun Liu, Kai Mei, Xiaochen Zhang, Des McLernon, Dongtang Ma, Jibo Wei, Syed Ali Raza Zaidi

Abstract: Multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) is a key technology component in the evolution towards cognitive radio (CR) in next-generation communication, in which the accuracy of timing and frequency synchronization significantly impacts the overall system performance. In this paper, we propose a novel scheme leveraging extreme learning machine (ELM) to achieve high-precision synchronization. Specifically, exploiting the preamble signals with synchronization offsets, two ELMs are incorporated into a traditional MIMO-OFDM system to estimate both the residual symbol timing offset (RSTO) and the residual carrier frequency offset (RCFO). The simulation results show that the performance of the proposed ELM-based synchronization scheme is superior to the traditional method under both additive white Gaussian noise (AWGN) and frequency selective fading channels. Furthermore, compared with the existing machine learning based techniques, the proposed method shows outstanding performance without the requirement of perfect channel state information (CSI) and prohibitive computational complexity. Finally, the proposed method is robust in terms of the choice of channel parameters (e.g., number of paths) and also in terms of "generalization ability" from a machine learning standpoint.

Submitted 1 June, 2022; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: 13 pages, 12 figures, has been accepted for publication in IEEE Transactions on Cognitive Communications and Networking

Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2021

arXiv:2007.07099 [pdf, other]

doi 10.1109/JSTSP.2020.3043064

MFRNet: A New CNN Architecture for Post-Processing and In-loop Filtering

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: In this paper, we propose a novel convolutional neural network (CNN) architecture, MFRNet, for post-processing (PP) and in-loop filtering (ILF) in the context of video compression. This network consists of four Multi-level Feature review Residual dense Blocks (MFRBs), which are connected using a cascading structure. Each MFRB extracts features from multiple convolutional layers using dense connections and a multi-level residual learning structure. In order to further improve information flow between these blocks, each of them also reuses high dimensional features from the previous MFRB. This network has been integrated into PP and ILF coding modules for both HEVC (HM 16.20) and VVC (VTM 7.0), and fully evaluated under the JVET Common Test Conditions using the Random Access configuration. The experimental results show significant and consistent coding gains over both anchor codecs (HEVC HM and VVC VTM) and also over other existing CNN-based PP/ILF approaches based on Bjontegaard Delta measurements using both PSNR and VMAF for quality assessment. When MFRNet is integrated into HM 16.20, gains up to 16.0% (BD-rate VMAF) are demonstrated for ILF, and up to 21.0% (BD-rate VMAF) for PP. The respective gains for VTM 7.0 are up to 5.1% for ILF and up to 7.1% for PP.

Submitted 11 December, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2005.06101 [pdf]

A Cyber Physical System Framework for UAV Communications

Authors: Haijun Wang, Haitao Zhao, Dongtang Ma, Jibo Wei

Abstract: Diverse applications have witnessed the prevalence of unmanned aerial vehicles (UAVs) due to their agility and versatility. Compared with computation and control, communication tends to be the bottleneck of the whole UAV system. The cyber physical system (CPS), which achieves the integration of the cyber and physical domains, can inspire us to deal with the communication problems through a cross-disciplinary method. To this end, we first expound the coupling effects of computation and control on communication. Then, we propose a novel CPS framework for UAV communications. By extending the dimension of communication decisions to computation and control, the framework can precisely orient and settle the communication issues. Further, a quantitative energy optimization model is established to guide the protocol and algorithm design for UAV communications. Case simulation results validate the CPS framework in terms of the energy consumption and communication delay.

Submitted 12 May, 2020; originally announced May 2020.

Comments: 7 pages, 5 figures, 1 table, 15 references

arXiv:2004.02270 [pdf, ps, other]

Game of Learning Bloch Equation Simulations for MR Fingerprinting

Authors: Mingrui Yang, Yun Jiang, Dan Ma, Bhairav B. Mehta, Mark A. Griswold

Abstract: Purpose: This work proposes a novel approach to efficiently generate MR fingerprints for MR fingerprinting (MRF) problems based on the unsupervised deep learning model generative adversarial networks (GAN). Methods: The GAN model is adopted and modified for better convergence and performance, resulting in an MRF specific model named GAN-MRF. The GAN-MRF model is trained, validated, and tested using different MRF fingerprints simulated from the Bloch equations with a certain MRF sequence. The performance and robustness of the model are further tested by using in vivo data collected on a 3 Tesla scanner from a healthy volunteer together with MRF dictionaries with different sizes. T1, T2 maps are generated and compared quantitatively. Results: The validation and testing curves for the GAN-MRF model show no evidence of high bias or high variance problems. The sample MRF fingerprints generated from the trained GAN-MRF model agree well with the benchmark fingerprints simulated from the Bloch equations. The in vivo T1, T2 maps generated from the GAN-MRF fingerprints are in good agreement with those generated from the Bloch simulated fingerprints, showing good performance and robustness of the proposed GAN-MRF model. Moreover, the MRF dictionary generation time is reduced from hours to sub-second for the testing dictionary. Conclusion: The GAN-MRF model enables a fast and accurate generation of the MRF fingerprints. It significantly shortens the MRF dictionary generation process and opens the door for real-time applications and sequence optimization problems.

Submitted 5 April, 2020; originally announced April 2020.

arXiv:2003.13552

doi 10.1109/TMM.2021.3108943

BVI-DVC: A Training Database for Deep Video Compression

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: Deep learning methods are increasingly being applied in the optimisation of video compression algorithms and can achieve significantly enhanced coding gains, compared to conventional approaches. Such approaches often employ Convolutional Neural Networks (CNNs) which are trained on databases with relatively limited content coverage. In this paper, a new extensive and representative video database, BVI-DVC, is presented for training CNN-based video compression systems, with specific emphasis on machine learning tools that enhance conventional coding architectures, including spatial resolution and bit depth up-sampling, post-processing and in-loop filtering. BVI-DVC contains 800 sequences at various spatial resolutions from 270p to 2160p and has been evaluated on ten existing network architectures for four different coding tools. Experimental results show that this database produces significant improvements in terms of coding gains over three existing (commonly used) image/video training databases under the same training and evaluation configurations. The overall additional coding improvements by using the proposed database for all tested coding modules and CNN architectures are up to 10.3% based on the assessment of PSNR and 8.1% based on VMAF.

Submitted 8 October, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:2003.10404

doi 10.1109/TVT.2021.3056408

Spatial Modulation for Joint Radar-Communications Systems: Design, Analysis, and Hardware Prototype

Authors: Dingyou Ma, Nir Shlezinger, Tianyao Huang, Yariv Shavit, Moshe Namer, Yimin Liu, Yonina C. Eldar

Abstract: Dual-function radar-communications (DFRC) systems implement radar and communication functionalities on a single platform. Jointly designing these subsystems can lead to substantial gains in performance as well as size, cost, and power consumption. In this paper, we propose a DFRC system, which utilizes generalized spatial modulation (GSM) to realize coexisting radar and communications waveforms. Our proposed GSM-based scheme, referred to as spatial modulation based communication-radar (SpaCoR) system, allocates antenna elements among the subsystems based on the transmitted message, thus achieving increased communication rates by embedding additional data bits in the antenna selection. We formulate the resulting signal models, and present a dedicated radar processing scheme. To evaluate the radar performance, we characterize the statistical properties of the transmit beam pattern. Then, we present a hardware prototype of the proposed DFRC system, demonstrating the feasibility of the scheme. Our results show that the proposed GSM system achieves improved communication performance compared to techniques utilizing fixed allocations operating at the same data rate. For the radar subsystem, our experiments show that the spatial agility induced by the GSM transmission improves the angular resolution and reduces the sidelobe level in the transmit beam pattern compared to using fixed antenna allocations.

Submitted 15 July, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: 14 pages

Journal ref: IEEE Transactions on Vehicular Technology (Volume: 70, Issue: 3, March 2021)

Showing 1–50 of 66 results for author: Ma, D