Search | arXiv e-print repository

JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from two minutes of reference music and generate a new piece of music conforming to that concept. We achieve this by fine-tuning a pretrained text-to-music model on the reference music. However, directly fine-tuning all parameters leads to overfitting. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual concepts or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for it. Our proposed Jen1-DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.

Submitted 18 June, 2024; originally announced June 2024.
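The abstract does not say how the pivotal parameters are chosen. As a minimal sketch, assuming a per-parameter importance score is already available (the scores, parameter names, and the top-k rule below are hypothetical, not the authors' criterion), tuning only a subset of parameters amounts to updating the selected weights while every other weight keeps its pretrained value:

```python
from typing import Dict, Set


def select_top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Hypothetical selection rule: keep the k parameter names with the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def selective_sgd_step(
    params: Dict[str, float],
    grads: Dict[str, float],
    trainable: Set[str],
    lr: float = 0.1,
) -> Dict[str, float]:
    """One plain SGD step that only touches the names in `trainable`."""
    updated = {}
    for name, value in params.items():
        if name in trainable:
            updated[name] = value - lr * grads[name]
        else:
            # Frozen parameter: copied unchanged, preserving pretrained behavior.
            updated[name] = value
    return updated


if __name__ == "__main__":
    params = {"attn.q": 1.0, "attn.k": 2.0, "mlp.w": 3.0}
    grads = {"attn.q": 0.5, "attn.k": -1.0, "mlp.w": 2.0}
    scores = {"attn.q": 0.9, "attn.k": 0.8, "mlp.w": 0.1}
    trainable = select_top_k(scores, k=2)
    print(selective_sgd_step(params, grads, trainable))
```

Restricting updates this way is one generic route to limiting overfitting on a small reference set; how the paper ranks parameters is not stated here.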

arXiv:2405.16248 [pdf]

Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development, leading to impairments in social skills and communication, as well as repetitive behaviors and atypical sensory processing. Many studies have combined brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully exploited. To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods. This study introduces a MultiUNet model for segmenting white matter, leveraging the UNet architecture and using manually segmented MRI images as training data. Subsequently, we extracted white matter features using the Pyradiomics toolkit and applied different machine learning models, such as Support Vector Machine, Random Forest, Logistic Regression, and K-Nearest Neighbors, to predict autism. All of these models exceeded 80% prediction accuracy. Additionally, we employed a Convolutional Neural Network to analyze segmented white matter images, achieving a prediction accuracy of 86.84%. Notably, the Support Vector Machine achieved the highest prediction accuracy, at 89.47%. These findings not only underscore the efficacy of the models but also establish a link between white matter abnormalities and autism.
Our study contributes a comprehensive evaluation of various diagnostic models for autism and introduces a computer-aided diagnostic algorithm for early and objective autism diagnosis based on MRI white matter regions.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.05518 [pdf, other]

DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, as it can alleviate visual information sparsity. However, indiscriminate temporal fusion introduces feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance consistency and temporal map consistency learning. To improve the representation of instances in single-frame maps, we introduce a novel method, DTCLMapper. This approach uses a dual-stream temporal consistency learning module that combines instance embedding with geometry maps. In the instance embedding component, our approach integrates temporal Instance Consistency Learning (ICL), enforcing consistency on both vector points and the instance features aggregated from those points. A vectorized points pre-selection module is employed to improve the regression efficiency of the vector points of each instance. The aggregated instance features obtained from this pre-selection module are then trained with contrastive learning to achieve temporal consistency, where positive and negative samples are selected based on position and semantic information. The geometry mapping component introduces Map Consistency Learning (MCL), designed with self-supervised learning. MCL enhances the generalization capability of our consistency learning approach by concentrating on the global location and distribution constraints of the instances.
Extensive experiments on well-recognized benchmarks indicate that the proposed DTCLMapper achieves state-of-the-art performance in vectorized mapping tasks, reaching 61.9% and 65.1% mAP on the nuScenes and Argoverse datasets, respectively. The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper.

Submitted 8 May, 2024; originally announced May 2024.

Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper
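The abstract states only that aggregated instance features are trained contrastively, with positives and negatives chosen by position and semantics. A minimal sketch of a standard InfoNCE objective follows. It rests on three assumptions: the anchor and positive are features of the same map instance in consecutive frames, the negatives are other instances, and the temperature is a typical value rather than one reported for DTCLMapper:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss: negative log softmax probability assigned to the positive pair."""
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

The loss is near zero when the same instance in the next frame is the most similar feature and grows when a different instance looks more similar, which is the pressure toward temporal consistency.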

arXiv:2404.16302 [pdf, other]

CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen

Abstract: Cross-modality images that integrate visible and infrared spectral cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely in adverse weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to improve detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW can mine more essential information about pedestrian features during cross-modality fusion; it can therefore transfer efficiently to other, rarer scenarios and remains practical on platforms with low computing power. To the best of our knowledge, this is the first study to integrate both Diffusion and Mamba modules for targeted improvement of cross-modality object detection, expanding the practical application of this type of model through higher accuracy and a more advanced architecture.
Extensive experiments on both well-recognized and self-created datasets conclusively demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.

Submitted 24 April, 2024; originally announced April 2024.

Comments: The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW

arXiv:2403.08479 [pdf, other]

MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction

Authors: Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yali Shen, Yu Yao

Abstract: Radiation therapy is crucial in cancer treatment. Experienced experts typically generate high-quality dose distribution maps iteratively, and these maps form the basis of excellent radiation therapy plans. Automated prediction of dose distribution maps is therefore valuable for expediting the treatment process and providing a better starting point for developing radiation therapy plans. Given the remarkable results of diffusion models in predicting high-frequency regions of dose distribution maps, dose prediction methods based on diffusion models have been extensively studied. However, existing methods mainly use CNNs or Transformers as denoising networks. CNNs lack a global receptive field, resulting in suboptimal prediction performance. Transformers excel at global modeling but have complexity quadratic in image size, resulting in significant computational overhead. To tackle these challenges, we introduce a novel diffusion model, MD-Dose, based on the Mamba architecture, for predicting radiation therapy dose distribution in thoracic cancer patients. In the forward process, MD-Dose adds Gaussian noise to dose distribution maps until they become pure noise images. In the backward process, MD-Dose uses a Mamba-based noise predictor to predict the noise, ultimately outputting the dose distribution maps. Furthermore, we develop a Mamba encoder to extract structural information and integrate it into the noise predictor for localizing dose regions in the planning target volume (PTV) and organs at risk (OARs).
Through extensive experiments on a dataset of 300 thoracic tumor patients, we demonstrate the superiority of MD-Dose across various metrics and in time consumption.

Submitted 13 March, 2024; originally announced March 2024.
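The forward (noising) process described above has a standard closed form in DDPM-style diffusion models: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. A sketch under stated assumptions: the common linear β schedule from 1e-4 to 0.02 over 1000 steps is the DDPM default, not a value reported for MD-Dose; a dose map is treated as a flat list of pixel values; and the Mamba-based noise predictor is not modeled:

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_alpha_bars(
    num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(
    x0: Sequence[float], t: int, alpha_bars: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Closed-form forward step: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    # eps is the regression target a noise predictor is trained to recover from xt.
    return xt, eps
```

At the last step ᾱ_T is on the order of 1e-5 under this schedule, so x_T is almost pure Gaussian noise, matching the "pure noise images" endpoint of the forward process.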

arXiv:2401.17841 [pdf, other]

Stimulus-Informed Generalized Canonical Correlation Analysis for Group Analysis of Neural Responses to Natural Stimuli

Authors: Simon Geirnaert, Yuanyuan Yao, Tom Francart, Alexander Bertrand

Abstract: Various new brain-computer interface technologies or neuroscience applications require decoding stimulus-following neural responses to natural stimuli such as speech and video from, e.g., electroencephalography (EEG) signals. In this context, generalized canonical correlation analysis (GCCA) is often used as a group analysis technique, which allows the extraction of correlated signal components from the neural activity of multiple subjects attending to the same stimulus. GCCA can be used to improve the signal-to-noise ratio of the stimulus-following neural responses relative to all other irrelevant (non-)neural activity, or to quantify the correlated neural activity across multiple subjects in a group-wise coherence metric. However, the traditional GCCA technique is stimulus-unaware: no information about the stimulus is used to estimate the correlated components from the neural data of several subjects. Therefore, the GCCA technique might fail to extract relevant correlated signal components in practical situations where the amount of information is limited, for example, because of a limited amount of training data or group size. This motivates a new stimulus-informed GCCA (SI-GCCA) framework that allows taking the stimulus into account to extract the correlated components. We show that SI-GCCA outperforms GCCA in various practical settings, for both auditory and visual stimuli. Moreover, we showcase how SI-GCCA can be used to steer the estimation of the components towards the stimulus.
As such, SI-GCCA substantially improves upon GCCA for various purposes, ranging from preprocessing to quantifying attention.

Submitted 1 July, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: 14 pages, 16 figures

arXiv:2401.01496 [pdf, other]

From Pixel to Slide image: Polarization Modality-based Pathological Diagnosis Using Representation Learning

Authors: Jia Dong, Yao Yao, Yang Dong, Hui Ma

Abstract: Thyroid cancer is the most common endocrine malignancy, and accurately distinguishing between benign and malignant thyroid tumors is crucial for developing effective treatment plans in clinical practice. Pathologically, thyroid tumors pose diagnostic challenges due to improper specimen sampling. In this study, we have designed a three-stage model using representation learning to integrate pixel-level and slice-level annotations for distinguishing thyroid tumors. This structure includes a pathology structure recognition method to predict structures related to thyroid tumors, an encoder-decoder network to extract pixel-level annotation information by learning the feature representations of image blocks, and an attention-based learning mechanism for the final classification task. This mechanism learns the importance of different image blocks in a pathological region, globally considering the information from each block. In the third stage, all information from the image blocks in a region is aggregated using attention mechanisms, followed by classification to determine the category of the region. Experimental results demonstrate that our proposed method can predict microscopic structures more accurately. After color-coding, the method achieves results on unstained pathology slides that approximate the quality of Hematoxylin and Eosin (H&E) staining, reducing the need for stained pathology slides.
Furthermore, by leveraging the concept of indirect measurement and extracting polarized features from structures correlated with lesions, the proposed method can also classify samples where membrane structures cannot be obtained through sampling, providing a potentially objective and highly accurate indirect diagnostic technique for thyroid tumors.

Submitted 2 January, 2024; originally announced January 2024.
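The third-stage aggregation follows the general pattern of attention pooling over a bag of image-block features: score each block, softmax-normalize the scores, and take the weighted sum. A minimal sketch, assuming a linear scoring function (learned attention modules usually use a small neural network instead) and block features given as plain vectors; the function and argument names are illustrative:

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    block_features: List[Sequence[float]], score_weights: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Aggregate per-block feature vectors into one region vector.

    Each block gets a scalar score (here a dot product with `score_weights`),
    scores are softmax-normalized into attention weights, and the region
    vector is the attention-weighted sum of the block features.
    """
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in block_features]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(block_features[0])
    pooled = [
        sum(weights[i] * block_features[i][d] for i in range(len(block_features)))
        for d in range(dim)
    ]
    return pooled, weights
```

The returned weights also serve as a per-block importance map, which is what lets such a mechanism "globally consider" all blocks while emphasizing the informative ones; a classifier would then operate on the pooled vector.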

arXiv:2312.16607 [pdf, other]

A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mueller matrix images of liver pathological samples with radiomics features derived from corresponding pathological images to classify HCC and ICC. Our fusion network integrates a two-tier fusion approach, comprising early feature-level fusion and late classification-level fusion. By harnessing the strengths of polarization imaging techniques and image feature-based machine learning, our proposed fusion network significantly enhances classification accuracy. Notably, even at reduced imaging resolutions, the fusion network maintains robust performance due to the additional information provided by polarization features, which may not align with human visual perception. Our experimental results underscore the potential of this fusion network as a powerful tool for computer-aided diagnosis of HCC and ICC, showcasing the benefits and prospects of integrating polarization imaging techniques into the current image-intensive digital pathological diagnosis. We aim to contribute this innovative approach to top-tier journals, offering fresh insights and valuable tools in the fields of medical imaging and cancer diagnosis.
By introducing polarization imaging into liver cancer classification, we demonstrate its interdisciplinary potential in addressing challenges in medical image analysis, promising advancements in medical imaging and cancer diagnosis.

Submitted 27 December, 2023; originally announced December 2023.
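The abstract names the two fusion tiers but not their exact form. A sketch under stated assumptions: feature-level fusion is taken to be vector concatenation, and classification-level fusion a weighted average of class probabilities. The logistic stand-in classifier, the positive-class label, and which classifiers feed the late tier are all hypothetical:

```python
import math
from typing import List, Optional, Sequence


def early_fusion(polarization: Sequence[float], radiomics: Sequence[float]) -> List[float]:
    """Feature-level fusion (assumed): concatenate the two feature vectors."""
    return list(polarization) + list(radiomics)


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical stand-in classifier: probability of the positive class (e.g. ICC)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion(
    probabilities: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Classification-level fusion (assumed): weighted average of classifier probabilities."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
```

In use, one might score the concatenated vector with one classifier and each modality with its own classifier, then pass those probabilities to `late_fusion`; the published network's actual layers are not described in the abstract.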

arXiv:2312.08176 [pdf, other]

doi 10.1109/TCSI.2023.3337283

ASC: Adaptive Scale Feature Map Compression for Deep Neural Network

Authors: Yuan Yao, Tian-Sheuan Chang

Abstract: Deep-learning accelerators are increasingly in demand; however, their performance is constrained by the size of the feature map, leading to high bandwidth requirements and large buffer sizes. We propose an adaptive scale feature map compression technique leveraging the unique properties of the feature map. This technique adopts independent channel indexing given the weak channel correlation and uses a cubical-like block shape to benefit from strong local correlations. The method further optimizes compression using a switchable endpoint mode and adaptive scale interpolation to handle unimodal data distributions, both with and without outliers. This results in compression rates of 4× and up to 7.69× for 16-bit data at constant and variable bitrates, respectively. Our hardware design minimizes area cost by adjusting interpolation scales, which facilitates hardware sharing among interpolation points. Additionally, we introduce a threshold concept for straightforward interpolation, avoiding the need for intricate hardware. The TSMC 28nm implementation has an equivalent gate count of 6135 for the 8-bit version. Furthermore, the hardware architecture scales effectively, with only a sublinear increase in area cost: a 32× throughput increase, which meets the theoretical bandwidth of DDR5-6400, requires just 7.65× the hardware cost.

Submitted 13 December, 2023; originally announced December 2023.
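The abstract describes endpoint-based encoding with interpolation inside small blocks. A simplified sketch of that general idea follows. It assumes uniform interpolation between a block's minimum and maximum, 2-bit indices, and a block flattened to a list. ASC's adaptive scales, switchable endpoint modes, and outlier handling are omitted, so this illustrates the family of techniques rather than the published design:

```python
from typing import List, Sequence, Tuple


def compress_block(values: Sequence[int], index_bits: int = 2) -> Tuple[int, int, List[int]]:
    """Encode a block as two endpoints plus one small index per value."""
    lo, hi = min(values), max(values)
    levels = (1 << index_bits) - 1  # number of intervals between the two endpoints
    if hi == lo:
        return lo, hi, [0] * len(values)
    # Map each value to the nearest of the evenly spaced interpolation points.
    indices = [round((v - lo) * levels / (hi - lo)) for v in values]
    return lo, hi, indices


def decompress_block(lo: int, hi: int, indices: Sequence[int], index_bits: int = 2) -> List[float]:
    """Reconstruct each value as its interpolation point between the endpoints."""
    levels = (1 << index_bits) - 1
    return [lo + (hi - lo) * i / levels for i in indices]


def compression_ratio(n_values: int, value_bits: int, index_bits: int) -> float:
    """Raw bits over packed bits (two full-precision endpoints plus the indices)."""
    raw = n_values * value_bits
    packed = 2 * value_bits + n_values * index_bits
    return raw / packed
```

For a 16-value block of 16-bit data, two 16-bit endpoints plus sixteen 2-bit indices give 256 / 64 = 4× compression. That is one way a constant 4× rate can arise; the paper's actual block shape and bit allocation may differ. The reconstruction error per value is at most half the spacing between interpolation points, which is why outliers that stretch the endpoint range hurt and motivate the switchable endpoint mode.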

arXiv:2312.06187 [pdf, other]

SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Authors: Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yu Yao, Yali Shen

Abstract: Radiation therapy serves as an effective and standard method for cancer treatment. Excellent radiation therapy plans always rely on high-quality dose distribution maps obtained through repeated trial and error by experienced experts. However, due to individual differences and complex clinical situations, even seasoned expert teams may struggle to reach the best treatment plan quickly every time. Many automatic dose distribution prediction methods have been proposed recently to accelerate the radiation therapy planning process and have achieved good results. However, these results suffer from over-smoothing: the predicted dose distribution maps lack high-frequency details, limiting their clinical application. To address these limitations, we propose SP-DiffDose, a dose prediction diffusion model based on SwinTransformer and a projector. To capture the direct correlation between anatomical structure and dose distribution maps, SP-DiffDose uses a structural encoder to extract features from anatomical images, then employs a conditional diffusion process to blend noise and anatomical images at multiple scales and gradually map them to dose distribution maps. To improve dose prediction for organs at risk, SP-DiffDose uses SwinTransformer in the deeper layers of the network to capture features at different scales in the image. To learn good representations from the fused features, SP-DiffDose passes them through a designed projector, improving dose prediction accuracy.
Finally, we evaluate SP-DiffDose on an internal dataset. The results show that SP-DiffDose outperforms existing methods on multiple evaluation metrics, demonstrating the superiority and generalizability of our method.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03227 [pdf, other]

Human Body Model based ID using Shape and Pose Parameters

Authors: Aravind Sundaresan, Brian Burns, Indranil Sur, Yi Yao, Xiao Lin, Sujeong Kim

Abstract: We present a Human Body model based IDentification (HMID) system that is jointly trained for shape, pose and biometric identification. HMID is based on the Human Mesh Recovery (HMR) network, and we propose additional losses to improve and stabilize shape estimation and biometric identification while maintaining the pose and shape output. We show that when our HMID network is trained using additional shape and pose losses, it achieves a significant improvement in biometric identification performance compared to an identical model that does not use such losses. The HMID model uses raw images instead of silhouettes and can perform robust recognition on images collected at range and altitude, as many anthropometric properties are reasonably invariant to clothing, view and range. We show results on the USF dataset as well as the BRIAR dataset, which includes probes with both clothing and view changes. Our approach (using body model losses) shows a significant improvement in Rank20 accuracy and True Accuracy Rate on the BRIAR evaluation dataset.

Submitted 5 December, 2023; originally announced December 2023.

Comments: to be published in IEEE International Joint Conference on Biometrics, Ljubljana, Slovenia 2023

arXiv:2310.19180 [pdf, other]

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

Authors: Yao Yao, Peike Li, Boyu Chen, Alex Wang

Abstract: With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch. However, finer-grained control over multi-track generation remains an open challenge. Existing models exhibit strong raw generation capability but lack the flexibility to compose separate tracks and combine them in a controllable manner, unlike the typical workflows of human composers. To address this issue, we propose JEN-1 Composer, a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music via a single model. The JEN-1 Composer framework can seamlessly incorporate any diffusion-based music generation system, e.g., Jen-1, extending its capacity for versatile multi-track music generation. We introduce a curriculum training strategy that incrementally teaches the model the transition from single-track generation to the flexible generation of multi-track combinations. During inference, users can iteratively produce and choose music tracks that meet their preferences, incrementally building an entire musical composition by following the proposed Human-AI co-composition workflow. Quantitative and qualitative assessments demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis. The proposed JEN-1 Composer represents a significant advance toward interactive AI-facilitated music creation and composition.
Demos will be available at https://www.jenmusic.ai/audio-demos.

Submitted 2 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: Preprints

arXiv:2310.02918 [pdf, other]

Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic

Authors: Mohamed-Khalil Bouzidi, Yue Yao, Daniel Goehring, Joerg Reichardt

Abstract: Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, which reuses the optimal trajectory from the last timestep, often fails to provide an initial guess close enough to the current optimal trajectory. This can result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural-network-based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.

Submitted 4 October, 2023; originally announced October 2023.
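The warmstart logic described above can be sketched as candidate generation followed by selection. This is a minimal sketch under several assumptions: the learned multimodal predictor's proposals are already given as plain control sequences, the cost function is user-supplied, and Gaussian random perturbations stand in for the sampling-based refinement (the paper's refinement technique may differ). It only produces the initial guess; the MPC solver itself is not shown:

```python
import random
from typing import Callable, List, Sequence

Trajectory = List[float]  # e.g. a control sequence over the prediction horizon


def shifted_warmstart(previous: Sequence[float]) -> Trajectory:
    """Conventional warmstart: drop the executed first step and repeat the last one."""
    return list(previous[1:]) + [previous[-1]]


def refine_by_sampling(
    candidate: Sequence[float],
    cost: Callable[[Sequence[float]], float],
    rng: random.Random,
    num_samples: int = 32,
    sigma: float = 0.1,
) -> Trajectory:
    """Keep the lowest-cost trajectory among the candidate and Gaussian perturbations of it."""
    best = list(candidate)
    best_cost = cost(best)
    for _ in range(num_samples):
        trial = [u + rng.gauss(0.0, sigma) for u in candidate]
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best


def choose_warmstart(
    previous: Sequence[float],
    proposals: List[Sequence[float]],
    cost: Callable[[Sequence[float]], float],
    rng: random.Random,
) -> Trajectory:
    """Refine every candidate (shifted previous solution plus proposals) and return the best."""
    candidates = [shifted_warmstart(previous)] + [list(p) for p in proposals]
    refined = [refine_by_sampling(c, cost, rng) for c in candidates]
    return min(refined, key=cost)
```

Keeping the shifted previous solution among the candidates means the learned proposals can only improve on the conventional warmstart under the chosen cost, never replace it with something worse.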

arXiv:2309.02835 [pdf]

A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi **, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hundred, and plays a revolutionary role in single-shot ultrafast optical imaging. However, due to the ultra-high data compression ratio induced by the extremely large sequence depth, as well as the limited fidelity of traditional reconstruction algorithms, HCUP suffers from poor image reconstruction quality and fails to capture fine structures in complex transient scenes. To overcome these restrictions, we propose a flexible image reconstruction algorithm based on total variation (TV) and cascaded denoisers (CD) for HCUP, named the TV-CD algorithm. It applies the TV denoising model cascaded with several advanced deep learning-based denoising models in the iterative plug-and-play alternating direction method of multipliers framework, which preserves image smoothness while using the deep denoising networks to obtain more priors, thus solving the common sparsity representation problem in local similarity and motion compensation.
Both simulation and experimental results show that the proposed TV-CD algorithm can effectively improve the image reconstruction accuracy and quality of HCUP, and further promote the practical applications of HCUP in capturing high-dimensional complex physical, chemical and biological ultrafast optical scenes.
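The abstract describes plugging a TV denoiser and several learned denoisers into a plug-and-play ADMM loop. Below is a minimal sketch of generic PnP-ADMM, assuming a dense linear forward model A (the real HCUP operator is a structured compressive encoding) and reading the cascade as denoisers used in successive stages. The names pnp_admm and box_denoise are hypothetical, and the box filter is only a stand-in for the TV and deep denoisers.

```python
import numpy as np

def box_denoise(z, width=3):
    """Hypothetical stand-in denoiser: moving average with edge padding."""
    pad = width // 2
    padded = np.pad(z, pad, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def pnp_admm(A, y, denoisers, rho=1.0, iters_per_stage=10):
    """Generic plug-and-play ADMM for y = A x, with a staged cascade of denoisers."""
    n = A.shape[1]
    lhs = A.T @ A + rho * np.eye(n)   # system matrix of the data-fidelity step
    aty = A.T @ y
    x = aty.copy()
    v = x.copy()
    u = np.zeros(n)                   # scaled dual variable
    for denoise in denoisers:         # one stage per denoiser in the cascade
        for _ in range(iters_per_stage):
            # x-step: argmin_x 0.5*||A x - y||^2 + (rho/2)*||x - (v - u)||^2
            x = np.linalg.solve(lhs, aty + rho * (v - u))
            # v-step: the image prior enters only through the plugged-in denoiser
            v = denoise(x + u)
            # dual update
            u = u + x - v
    return v

# Toy usage: identity forward model, noisy constant signal, two stand-in stages.
A = np.eye(8)
y = np.ones(8) + 0.05 * np.random.default_rng(0).normal(size=8)
x_hat = pnp_admm(A, y, [box_denoise, box_denoise])
```

Swapping the list passed as denoisers is the only change needed to try a different cascade, which is the flexibility the PnP formulation is meant to provide.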

Submitted 6 September, 2023; originally announced September 2023.

Comments: 25 pages, 5 figures and 1 table

arXiv:2308.06746 [pdf, ps, other]

Self-supervised Noise2noise Method Utilizing Corrupted Images with a Modular Network for LDCT Denoising

Authors: Yuting Zhu, Qiang He, Yudong Yao, Yueyang Teng

Abstract: Deep learning is a very promising technique for low-dose computed tomography (LDCT) image denoising. However, traditional deep learning methods require paired noisy and clean datasets, which are often difficult to obtain. This paper proposes a new method for LDCT image denoising that uses only LDCT data, so normal-dose CT (NDCT) images are not needed. We combine the self-supervised noise2noise model with the noisy-as-clean strategy. First, we add a second, similar type of noise to LDCT images multiple times. Note that, following the noisy-as-clean strategy, we corrupt LDCT images rather than NDCT images. Then, the noise2noise model is trained with only the secondarily corrupted images. We select a modular U-Net structure from several candidates with shared parameters to perform the task, which increases the receptive field without increasing the parameter size. The experimental results obtained on the Mayo LDCT dataset show the effectiveness of the proposed method compared with state-of-the-art deep learning methods. The developed code is available at https://github.com/XYuan01/Self-supervised-Noise2Noise-for-LDCT.
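A minimal sketch of how one LDCT slice could be turned into a noise2noise training pair under the noisy-as-clean strategy. The zero-mean Gaussian image-domain noise model and the name make_noise2noise_pair are assumptions for illustration; the abstract only says the secondary noise is similar to the LDCT noise.

```python
import numpy as np

def make_noise2noise_pair(ldct, sigma, rng):
    """Return (input, target): two independently re-corrupted copies of one LDCT image.

    Assumption: the secondary noise is zero-mean Gaussian in the image domain.
    """
    noisy_input = ldct + rng.normal(0.0, sigma, size=ldct.shape)
    noisy_target = ldct + rng.normal(0.0, sigma, size=ldct.shape)
    return noisy_input, noisy_target

rng = np.random.default_rng(0)
ldct = rng.normal(0.0, 1.0, size=(64, 64))  # placeholder for a normalized LDCT slice
inp, target = make_noise2noise_pair(ldct, sigma=0.1, rng=rng)
# A denoiser would then be trained to map `inp` to `target` (e.g., with an MSE loss);
# no NDCT image is used anywhere.
```

Calling the function repeatedly on the same slice yields fresh pairs, which matches the idea of adding the secondary noise multiple times.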

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2308.04729 [pdf, other]

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

Abstract: Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.03354 [pdf, other]

Energy-Guided Diffusion Model for CBCT-to-CT Synthesis

Authors: Linjie Fu, Xia Li, Xiuding Cai, Dong Miao, Yu Yao, Yali Shen

Abstract: Cone Beam CT (CBCT) plays a crucial role in Adaptive Radiation Therapy (ART) by enabling accurate radiation treatment when organ anatomy changes. However, CBCT images suffer from scatter noise and artifacts, which makes it challenging to rely solely on CBCT for precise dose calculation and accurate tissue localization. Therefore, there is a need to improve CBCT image quality and Hounsfield Unit (HU) accuracy while preserving anatomical structures. To enhance the role and application value of CBCT in ART, we propose an energy-guided diffusion model (EGDiff) and conduct experiments on a chest tumor dataset to generate synthetic CT (sCT) from CBCT. The experimental results demonstrate impressive performance, with a mean absolute error of 26.87$\pm$6.14 HU, a structural similarity index measure of 0.850$\pm$0.03, a peak signal-to-noise ratio of the sCT of 19.83$\pm$1.39 dB, and a normalized cross-correlation of the sCT of 0.874$\pm$0.04. These results indicate that our method outperforms state-of-the-art unsupervised synthesis methods in accuracy and visual quality, producing superior sCT images.
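For readers comparing against these figures, the sketch below computes three of the reported sCT metrics with common definitions: mean absolute error in HU, PSNR with an assumed intensity range data_range, and zero-mean normalized cross-correlation. The abstract does not state the paper's exact conventions (masking, data range), so these definitions and function names are assumptions.

```python
import numpy as np

def mae_hu(sct, ct):
    """Mean absolute error in Hounsfield units."""
    return float(np.mean(np.abs(sct - ct)))

def psnr_db(sct, ct, data_range):
    """PSNR in dB; data_range is an assumed HU span (e.g., 4000 for [-1000, 3000])."""
    mse = np.mean((sct - ct) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def ncc(sct, ct):
    """Zero-mean normalized cross-correlation (equal to Pearson's r)."""
    a = (sct - sct.mean()) / sct.std()
    b = (ct - ct.mean()) / ct.std()
    return float(np.mean(a * b))

ct = np.array([-1000.0, 0.0, 40.0, 700.0])  # toy reference CT values (HU)
sct = ct + 10.0                             # toy synthetic CT with a constant 10 HU bias
```

Note that a constant bias raises the MAE but leaves the NCC at 1, which is one reason these metrics are usually reported together.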

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.02295

IRS-Enabled Covert and Reliable Communications: How Many Reflection Elements are Required?

Authors: Manlin Wang, Bin Xia, Yao Yao, Zhiyong Chen, Jiangzhou Wang

Abstract: Short-packet communications are applied in various scenarios where transmission covertness and reliability are crucial due to the open wireless medium and finite blocklength. Although intelligent reflection surfaces (IRSs) have been widely used to enhance transmission covertness and reliability, the question of how many reflection elements an IRS requires remains unanswered, even though it is vital to system design and practical deployment. The strong inherent coupling between transmission covertness and reliability under IRS assistance makes this question intractable. To address this issue, the detection error probability at the warder and its approximation are first derived to reveal the relation between covertness performance and the number of reflection elements. In addition, to evaluate the reliability performance of the system, the decoding error probability at the receiver is derived. Subsequently, the asymptotic reliability performance in high-covertness regimes is investigated, which provides theoretical predictions of the number of IRS reflection elements required to achieve a decoding error probability close to 0 under given covertness requirements. Furthermore, Monte Carlo simulations verify the accuracy of the derived detection (decoding) error probabilities and the validity of the theoretical predictions for reflection elements. Moreover, results show that more reflection elements are required to achieve high reliability with tighter covertness requirements, longer blocklengths and higher transmission rates.

Submitted 9 September, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: The paper has some shortcomings in the theoretical analysis, and it will not be published at the conference, as claimed in the previous comments.

arXiv:2308.02284 [pdf, ps, other]

Covert and Reliable Short-Packet Communications against A Proactive Warder

Authors: Manlin Wang, Yao Yao, Bin Xia, Zhiyong Chen, Jiangzhou Wang

Abstract: Wireless short-packet communications pose challenges to the security and reliability of the transmission. A proactive warder, which both detects and interferes with the potential transmission, compounds these challenges. Compared with a passive warder, the proactive warder introduces an extra jamming channel, which makes the analytical methods and results of existing works inapplicable. Thus, effective system design schemes are required for short-packet communications against a proactive warder. To address this issue, we consider the analysis and design of covert and reliable transmissions for such systems. Specifically, to investigate the reliability and covertness performance of the system, the detection error probability at the warder and the decoding error probability at the receiver are derived, both of which are affected by the transmit power and the jamming power. Furthermore, to maximize the effective throughput, an optimization framework is proposed under reliability and covertness constraints. Numerical results verify the accuracy of the analytical results and the feasibility of the optimization framework. It is shown that the proactive warder changes the tradeoff between transmission reliability and covertness compared with the passive one. It is also shown that a longer blocklength always improves the throughput for systems with optimized transmission rates. But when transmission rates are fixed, the blocklength should be carefully designed, since the maximum blocklength is not optimal in this case.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures; to appear in 12th IEEE/CIC International Conference on Communications in China (ICCC 2023)

arXiv:2308.01981 [pdf, other]

doi 10.1016/j.media.2023.103035

CartiMorph: a framework for automated knee articular cartilage morphometrics

Authors: Yongcheng Yao, Junru Zhong, Li** Zhang, Sheheryar Khan, Weitian Chen

Abstract: We introduce CartiMorph, a framework for automated knee articular cartilage morphometrics. It takes an image as input and generates quantitative metrics for cartilage subregions, including the percentage of full-thickness cartilage loss (FCL), mean thickness, surface area, and volume. CartiMorph leverages deep learning models for hierarchical image feature representation. Deep learning models were trained and validated for tissue segmentation, template construction, and template-to-image registration. We established methods for surface-normal-based cartilage thickness mapping, FCL estimation, and rule-based cartilage parcellation. Our cartilage thickness map showed less error in thin and peripheral regions. We evaluated the effectiveness of the adopted segmentation model by comparing the quantitative metrics obtained from model segmentation with those from manual segmentation. The root-mean-squared deviation of the FCL measurements was less than 8%, and strong correlations were observed for the mean thickness (Pearson's correlation coefficient $\rho \in [0.82, 0.97]$), surface area ($\rho \in [0.82, 0.98]$) and volume ($\rho \in [0.89, 0.98]$) measurements. We compared our FCL measurements with those from a previous study and found that our measurements deviated less from the ground truths. We observed superior performance of the proposed rule-based cartilage parcellation method compared with the atlas-based approach. CartiMorph has the potential to promote imaging biomarker discovery for knee osteoarthritis.

Submitted 20 November, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: This preprint is a proofread version of a paper published in Medical Image Analysis (2023), which can be found at https://doi.org/10.1016/j.media.2023.103035

arXiv:2307.00700 [pdf]

doi 10.1109/JSEN.2022.3203147

Coverage Enhancement Strategy in WMSNs Based on a Novel Swarm Intelligence Algorithm: Army Ant Search Optimizer

Authors: Yindi Yao, Qin Wen, Yanpeng Cui, Feng Zhao, Bozhan Zhao, Yao** Zeng

Abstract: As one of the most crucial scenarios of the Internet of Things (IoT), wireless multimedia sensor networks (WMSNs) place greater emphasis on information-intensive data (e.g., audio, video, image) from remote environments. The area coverage reflects how well WMSNs perceive the surrounding environment, and good coverage ensures effective data collection. Given the harsh and complex physical environment of WMSNs, random deployment easily produces overlapping sensing regions and coverage holes. Our research aims to solve the optimization problem of maximizing the coverage rate in WMSNs. After proving that coverage enhancement in WMSNs is NP-hard, and inspired by the predation behavior of army ants, this article proposes a novel swarm intelligence (SI) technique, the army ant search optimizer (AASO), to solve the above problem. It is implemented by five operators: army ant and prey initialization, recruited by prey, attack prey, update prey, and build ant bridge. The simulation results demonstrate that the optimizer performs well in terms of exploration and exploitation on benchmark suites when compared with other representative SI algorithms. More importantly, AASO-based coverage enhancement in WMSNs achieves a better coverage effect than existing approaches.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 13 pages, 12 figures, 8 tables

Journal ref: in IEEE Sensors Journal, vol. 22, no. 21, pp. 21299-21311, Nov., 2022

arXiv:2307.00699 [pdf]

doi 10.1109/JSEN.2022.3178441

Game Theory and Coverage Optimization Based Multihop Routing Protocol for Network Lifetime in Wireless Sensor Networks

Authors: Yindi Yao, Xiong Li, Yanpeng Cui, Lang Deng, Chen Wang

Abstract: Wireless sensor networks (WSNs) are self-organizing monitoring networks with a large number of randomly deployed microsensor nodes that collect various physical information to realize tasks such as intelligent perception, efficient control, and decision-making. However, WSN nodes are battery-powered, so they run out of energy after a certain time. This energy limitation greatly constrains network performance metrics such as network lifetime and energy efficiency. In this study, to prolong the network lifetime, we proposed a multi-hop routing protocol based on game theory and coverage optimization (MRP-GTCO). Briefly, in the setup stage, two innovative strategies, a clustering game with a penalty function and a cluster head coverage set, were designed to achieve a uniform cluster head distribution and improve the rationality of cluster head election. In the data transmission stage, we first derived a theorem on the applicable conditions of inter-cluster multi-hop routing. Based on this, a novel multi-hop path selection algorithm related to residual energy and node degree was proposed to provide an energy-efficient data transmission path. The simulation results showed that the MRP-GTCO protocol can effectively reduce network energy consumption and extend the network lifetime by 159.22%, 50.76%, and 16.46% compared with the LGCA, RLEACH, and ECAGT protocols, respectively.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 14 pages, 13 figures, 3 tables

Journal ref: in IEEE Sensors Journal, vol. 22, no. 13, pp. 13739-13752, July, 2022

arXiv:2307.00697 [pdf]

doi 10.1109/JSEN.2022.3150770

Energy-Efficient Routing Protocol Based on Multi-Threshold Segmentation in Wireless Sensors Networks for Precision Agriculture

Authors: Yindi Yao, Xiong Li, Yanpeng Cui, Jiajun Wang, Chen Wang

Abstract: Wireless sensor networks (WSNs), one of the fundamental technologies of the Internet of Things (IoT), can efficiently provide sensing and communication services for IoT-based applications, especially energy-limited ones. Clustering routing protocols play an important role in reducing energy consumption and prolonging network lifetime. Cluster formation and cluster head selection are key to improving the performance of a clustering routing protocol. An energy-efficient routing protocol based on multi-threshold segmentation (EERPMS) was proposed in this paper to improve the rationality of cluster formation and cluster head selection. In the cluster formation stage, an innovative node clustering algorithm inspired by multi-threshold image segmentation was developed. In the cluster head selection stage, aiming to minimize network energy consumption, a theory for calculating the optimal number and location of cluster heads was established. Furthermore, a novel cluster head selection algorithm was constructed based on the residual energy and optimal location of cluster heads. Simulation results show that EERPMS can improve the distribution uniformity of cluster heads, prolong the network lifetime and save up to 64.50%, 58.60%, and 56.15% network energy compared with the RLEACH, CRPFCM, and FIGWO protocols, respectively.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 16 pages, 24 figures, 4 tables

Journal ref: in IEEE Sensors Journal, vol. 22, no. 7, pp. 6216-6231, 1 Apr. 2022

arXiv:2307.00696 [pdf]

doi 10.1109/LSENS.2022.3158274

Discrete Army Ant Search Optimizer-Based Target Coverage Enhancement in Directional Sensor Networks

Authors: Yindi Yao, Qin Wen, Yanpeng Cui, Bozhan Zhao

Abstract: Coverage of points of interest is one of the most critical issues in directional sensor networks. However, given remote or inhospitable environments and the limited perspective of directional sensors, random deployment easily leaves perception blind spots. Our research aims to solve the bound-constrained optimization problem of maximizing the coverage of target points. A coverage enhancement strategy based on a discrete army ant search optimizer (DAASO), inspired by the biological habits of army ants, is proposed to solve this problem. A set of experiments is conducted using different sensor parameters. Experimental results verify the effectiveness of DAASO in terms of coverage compared with existing methods.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 4 pages, 4 figures, 2 tables

Journal ref: in IEEE Sensors Letters, vol. 6, no. 4, pp. 1-4, April 2022, Art no. 7500404

arXiv:2305.03899 [pdf, other]

NL-CS Net: Deep Learning with Non-Local Prior for Image Compressive Sensing

Authors: Shuai Bian, Shouliang Qi, Chen Li, Yudong Yao, Yueyang Teng

Abstract: Deep learning has been successfully applied to compressive sensing (CS) of images in recent years. However, existing network-based methods are often trained as black boxes, and the lack of prior knowledge is often the bottleneck for further performance improvement. To overcome this drawback, this paper proposes a novel CS method using a non-local prior, called NL-CS Net, which combines the interpretability of traditional optimization methods with the speed of network-based methods. We unroll each iteration of the augmented Lagrangian method, which solves a non-local and sparsity-regularized optimization problem, into a network phase. NL-CS Net is composed of an up-sampling module and a recovery module. In the up-sampling module, we use a learnable up-sampling matrix instead of a predefined one. In the recovery module, a patch-wise non-local network is employed to capture long-range feature correspondences. Important parameters (e.g., the sampling matrix, nonlinear transforms, shrinkage thresholds, and step size) are learned end-to-end rather than hand-crafted. Furthermore, to facilitate practical implementation, orthogonal and binary constraints on the sampling matrix are adopted simultaneously. Extensive experiments on natural images and magnetic resonance imaging (MRI) demonstrate that the proposed method outperforms the state-of-the-art methods while maintaining great interpretability and speed.
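One benefit of an orthogonality constraint on the sampling matrix can be shown in a few lines: if Φ has orthonormal rows, then ΦΦᵀ = I, so the simple up-sampling x₀ = Φᵀy is consistent with the measurements (Φx₀ = y). This sketch builds such a Φ via QR decomposition. It illustrates the constraint only, not NL-CS Net's learned up-sampling module, and it does not model the binary constraint; the function name is hypothetical.

```python
import numpy as np

def orthonormal_sampling_matrix(m, n, rng):
    """Return an m x n matrix (m <= n) with orthonormal rows."""
    q, _ = np.linalg.qr(rng.normal(size=(n, m)))  # q is n x m with orthonormal columns
    return q.T

rng = np.random.default_rng(0)
n, m = 64, 16                   # signal length and number of measurements (25% ratio)
phi = orthonormal_sampling_matrix(m, n, rng)
x = rng.normal(size=n)          # placeholder signal (e.g., a vectorized image block)
y = phi @ x                     # compressive measurements
x0 = phi.T @ y                  # up-sampled initial estimate, consistent with y
```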

Submitted 5 May, 2023; originally announced May 2023.

Comments: 21 pages, 6 figures

ACM Class: I.4.7

arXiv:2303.12735 [pdf, other]

SMUG: Towards robust MRI reconstruction by smoothed unrolling

Authors: Hui Li, **ghan Jia, Shijun Liang, Yuguang Yao, Saiprasad Ravishankar, Sijia Liu

Abstract: Although deep learning (DL) has gained much popularity for accelerated magnetic resonance imaging (MRI), recent studies have shown that DL-based MRI reconstruction models can be oversensitive to tiny input perturbations (called 'adversarial perturbations'), which cause unstable, low-quality reconstructed images. This raises the question of how to design robust DL methods for MRI reconstruction. To address this problem, we propose a novel image reconstruction framework, termed SMOOTHED UNROLLING (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning operation. RS, which improves the tolerance of a model against input noise, has been widely used in the design of adversarial defenses for image classification. Yet, we find that the conventional design that applies RS to the entire DL process is ineffective for MRI reconstruction. We show that SMUG addresses this issue by customizing the RS operation based on the unrolling architecture of the DL-based MRI reconstruction model. Compared with the vanilla RS approach and several variants of SMUG, SMUG improves the robustness of MRI reconstruction with respect to a diverse set of perturbation sources, including perturbations to the input measurements, different measurement sampling rates, and different unrolling steps. Code for SMUG will be available at https://github.com/LGM70/SMUG.
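For context, the vanilla RS operation that SMUG is compared against can be sketched as a Monte Carlo average of a model's output over Gaussian-perturbed inputs. This is a generic sketch; the name smooth, the placeholder model, and the sample count are assumptions. SMUG instead customizes the operation to the unrolling architecture, which this sketch does not reproduce.

```python
import numpy as np

def smooth(f, x, sigma, n_samples, rng):
    """Vanilla randomized smoothing: estimate E[f(x + delta)], delta ~ N(0, sigma^2 I)."""
    outputs = [f(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(n_samples)]
    return np.mean(outputs, axis=0)

def recon(z):
    """Placeholder reconstruction model (a fixed linear map, for illustration only)."""
    return 2.0 * z

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 16)   # placeholder input (e.g., a zero-filled image, flattened)
x_smoothed = smooth(recon, x, sigma=0.1, n_samples=500, rng=rng)
```

For a linear model the smoothed output converges to the unsmoothed one; the robustness benefit of RS shows up only for nonlinear models, such as deep reconstruction networks.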

Submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.12571 [pdf]

3D PETCT Tumor Lesion Segmentation via GCN Refinement

Authors: Hengzhi Xue, Qingqing Fang, Yudong Yao, Yueyang Teng

Abstract: Whole-body PET/CT scanning is an important tool for diagnosing various malignancies (e.g., malignant melanoma, lymphoma, or lung cancer), and accurate tumor segmentation is a key part of subsequent treatment. In recent years, CNN-based segmentation methods have been extensively investigated. However, these methods often give inaccurate segmentation results, such as over-segmentation and under-segmentation. To address such issues, we propose a post-processing method based on a graph convolutional neural network (GCN) to refine inaccurately segmented parts and improve the overall segmentation accuracy. First, nnUNet is used as the initial segmentation framework, and the uncertainty in its segmentation results is analyzed. Certain and uncertain nodes form the nodes of a graph neural network. Each node forms edges with its 6 neighbors, and for each uncertain node, 32 randomly selected nodes are additionally connected to it by edges. The highly uncertain nodes are taken as the subsequent refinement targets. Second, the nnUNet results of the certain nodes are used as labels to form a semi-supervised graph network problem, and the uncertain part is optimized by training the GCN to improve segmentation performance. This constitutes our proposed nnUNet-GCN segmentation framework. We perform tumor segmentation experiments on the PET/CT dataset of the MICCAI 2022 autoPET challenge. Of these cases, 30 are randomly selected for testing, and the experimental results show that nnUNet-GCN refinement effectively reduces the false positive rate.
In quantitative analysis, there is an improvement of 2.12% in the average Dice score, 6.34 in the 95% Hausdorff distance (HD95), and 1.72 in the average symmetric surface distance (ASSD). The quantitative and qualitative evaluation results show that GCN post-processing methods can effectively improve tumor segmentation performance.
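The graph construction described in this abstract can be sketched as follows, under stated assumptions: voxels are nodes, "6 neighbors" is read as face adjacency on the 3D grid, and the 32 random partners of each uncertain node are drawn uniformly from all other voxels. The function names are hypothetical, and the GCN itself is omitted.

```python
import numpy as np

def grid_edges(shape):
    """Undirected edges between face-adjacent voxels (6-connectivity) of a 3D grid."""
    idx = np.arange(np.prod(shape)).reshape(shape)
    pairs = []
    for axis in range(3):
        lower = np.take(idx, np.arange(shape[axis] - 1), axis=axis).ravel()
        upper = np.take(idx, np.arange(1, shape[axis]), axis=axis).ravel()
        pairs.append(np.stack([lower, upper], axis=1))
    return np.concatenate(pairs, axis=0)

def random_edges(uncertain_ids, n_nodes, n_random, rng):
    """Connect each uncertain node to n_random distinct, randomly chosen other nodes."""
    blocks = []
    for u in uncertain_ids:
        candidates = np.delete(np.arange(n_nodes), u)  # exclude self-loops
        partners = rng.choice(candidates, size=n_random, replace=False)
        blocks.append(np.stack([np.full(n_random, u), partners], axis=1))
    if not blocks:
        return np.empty((0, 2), dtype=np.int64)
    return np.concatenate(blocks, axis=0)

rng = np.random.default_rng(0)
shape = (4, 4, 4)               # toy volume; real PET/CT volumes are far larger
n_nodes = int(np.prod(shape))
uncertain = [21]                # hypothetical uncertain voxel at grid position (1, 1, 1)
edges = np.concatenate([grid_edges(shape),
                        random_edges(uncertain, n_nodes, 32, rng)], axis=0)
```

The random long-range edges let label information from confidently segmented voxels reach uncertain ones that have few certain neighbours.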

Submitted 24 February, 2023; originally announced February 2023.

Comments: 10 pages, 5 figures, 38 references

arXiv:2212.14282 [pdf, other]

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

Authors: Yuan Zhang, Jianhua Zhang, Yuxiang Zhang, Yuan Yao, Guangyi Liu

Abstract: Holographic Multiple-Input and Multiple-Output (MIMO) is envisioned as a promising technology to realize unprecedented spectral efficiency by integrating a large number of antennas into a compact space. Most research on holographic MIMO assumes isotropic scattering environments and an antenna gain that is not limited by the deployment space. However, the channel might not satisfy isotropic scattering because of generalized angle distributions, and in reality the antenna gain is limited by the array aperture. In this letter, we aim to analyze the holographic MIMO channel capacity under practical angle distribution and array aperture constraints. First, we calculate the spectral density for generalized angle distributions by introducing a wavenumber domain-based method. Then, the capacity under generalized angle distributions is analyzed, and two different aperture schemes are considered. Finally, numerical results show that the capacity is significantly affected by the angle distribution at high signal-to-noise ratio (SNR) but hardly affected at low SNR, and that the capacity does not increase indefinitely with antenna density due to the array aperture constraint.
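As a reference point for the capacity analysis, the textbook capacity of a MIMO channel matrix H with equal power allocation and no channel knowledge at the transmitter is log₂ det(I + (SNR/N_t) H Hᴴ). The sketch below evaluates that formula for a given H. It does not implement the letter's wavenumber-domain method or aperture constraints, and mimo_capacity_bits is a hypothetical name.

```python
import numpy as np

def mimo_capacity_bits(H, snr_linear):
    """Equal-power MIMO capacity in bit/s/Hz: log2 det(I + (SNR / N_t) H H^H)."""
    n_r, n_t = H.shape
    m = np.eye(n_r) + (snr_linear / n_t) * (H @ H.conj().T)
    _, logdet = np.linalg.slogdet(m)  # natural-log determinant, numerically stable
    return float(logdet / np.log(2.0))

rng = np.random.default_rng(0)
# Hypothetical 8x8 i.i.d. Rayleigh channel (no spatial correlation)
H = (rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))) / np.sqrt(2.0)
for snr_db in (0, 20):
    print(snr_db, "dB:", round(mimo_capacity_bits(H, 10 ** (snr_db / 10)), 2), "bit/s/Hz")
```

In the letter's setting, H would instead come from a channel model that reflects the angle distribution and aperture, which is where the reported SNR-dependent differences arise.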

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.07023 [pdf]

doi 10.21037/qims-23-704

Unsupervised Domain Adaptation for Automated Knee Osteoarthritis Phenotype Classification

Authors: Junru Zhong, Yongcheng Yao, Donal G. Cahill, Fan Xiao, Siyue Li, Jack Lee, Kevin Ki-Wai Ho, Michael Tim-Yun Ong, James F. Griffith, Weitian Chen

Abstract: Purpose: The aim of this study was to demonstrate the utility of unsupervised domain adaptation (UDA) in automated knee osteoarthritis (OA) phenotype classification using a small dataset (n=50). Materials and Methods: For this retrospective study, we collected 3,166 three-dimensional (3D) double-echo steady-state magnetic resonance (MR) images from the Osteoarthritis Initiative dataset and 50 3D turbo/fast spin-echo MR images from our institute (in 2020 and 2021) as the source and target datasets, respectively. For each patient, the degree of knee OA was initially graded according to the MRI Osteoarthritis Knee Score (MOAKS) before being converted to binary OA phenotype labels. The proposed UDA pipeline included (a) pre-processing, which involved automatic segmentation and region-of-interest cropping; (b) source classifier training, which involved pre-training phenotype classifiers on the source dataset; (c) target encoder adaptation, which involved unsupervised adaptation of the source encoder to the target encoder; and (d) target classifier validation, which involved statistical analysis of the target classification performance evaluated by the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity and accuracy. Additionally, a classifier was trained without UDA for comparison. Results: The target classifier trained with UDA achieved improved AUROC, sensitivity, specificity and accuracy for both knee OA phenotypes compared with the classifier trained without UDA.
Conclusion: The proposed UDA approach improves the performance of automated knee OA phenotype classification for small target datasets by utilising a large, high-quality source dataset for training. The results successfully demonstrated the advantages of the UDA approach in classification on small datasets.
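The evaluation metrics named in this abstract are standard. As a minimal sketch, not the authors' evaluation code, they can be computed from binary phenotype labels and classifier scores as follows; the function names and the default 0.5 decision threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outscores a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))              # 0.75
print(threshold_metrics(labels, scores))  # (0.5, 1.0, 0.75)
```

AUROC is computed here as the Mann-Whitney probability, so it needs no threshold; sensitivity, specificity and accuracy all depend on the chosen threshold.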

Submitted 13 December, 2022; originally announced December 2022.

Comments: Junru Zhong and Yongcheng Yao contributed equally. 17 pages, 4 figures, 4 tables

arXiv:2212.00014 [pdf, other]

Attentional Ptycho-Tomography (APT) for three-dimensional nanoscale X-ray imaging with minimal data acquisition and computation time

Authors: Iksung Kang, Ziling Wu, Yi Jiang, Yudong Yao, Junjing Deng, Jeffrey Klug, Stefan Vogt, George Barbastathis

Abstract: Noninvasive X-ray imaging of nanoscale three-dimensional objects, e.g. integrated circuits (ICs), generally requires two types of scanning: ptychographic scanning, which is translational and returns estimates of the complex electromagnetic field through ICs; and tomographic scanning, which collects complex field projections from multiple angles. Here, we present Attentional Ptycho-Tomography (APT), an approach trained to provide accurate reconstructions of ICs despite incomplete measurements, using a dramatically reduced amount of angular scanning. The training process includes regularizing priors based on typical IC patterns and the physics of X-ray propagation. We demonstrate that APT with a 12-fold reduction in angles achieves fidelity comparable to the gold standard with the original set of angles. With the same set of reduced angles, APT also outperforms baseline reconstruction methods. In our experiments, APT achieves a 108-fold aggregate reduction in data acquisition and computation without compromising quality. We expect our physics-assisted machine learning framework could also be applied to other branches of nanoscale imaging.

Submitted 29 November, 2022; originally announced December 2022.

Comments: 27 pages, 7 figures

arXiv:2211.12481 [pdf]

Quality Analysis of Battery Degradation Models with Real Battery Aging Experiment Data

Authors: Cunzhi Zhao, Xingpeng Li, Yan Yao

Abstract: The installed capacity of energy storage systems, especially battery energy storage systems (BESSs), has increased significantly in recent years; BESSs are mainly applied to mitigate the fluctuations caused by renewable energy sources (RES) owing to their fast response and high round-trip energy efficiency. The main components of the majority of BESSs are lithium-ion batteries, which degrade during daily BESS operation. Heuristic battery degradation models have been proposed to account for battery degradation when optimizing the scheduling of energy systems. However, those heuristic models have not been evaluated or validated against real battery degradation data. Thus, this paper performs a quality analysis of popular heuristic battery degradation models using real battery aging experiment data to evaluate their performance. A benchmark model is also proposed to represent the real battery degradation value based on the averaged cycle value of the experimental data.

Submitted 22 November, 2022; originally announced November 2022.

Comments: 5 pages

arXiv:2210.08550 [pdf, ps, other]

Assessing the Optimality of LinDist3Flow for Optimal Tap Selection of Step Voltage Regulators in Unbalanced Distribution Networks

Authors: Krishna Sandeep Ayyagari, Sherin Ann Abraham, Yiyun Yao, Shibani Ghosh, Francisco Flores-Espino, Adarsh Nagarajan, Nikolaos Gatsis

Abstract: The adoption of distributed energy resources such as photovoltaics (PVs) has increased dramatically during the previous decade. The increased penetration of PVs into distribution networks (DNs) can cause voltage fluctuations that have to be mitigated. Step-voltage regulators (SVRs) are one of the key utility assets employed to this end. It is desirable to include tap selection of SVRs in optimal power flow (OPF) routines, a task that turns out to be challenging because the resultant OPF problem is nonconvex, with added complexities stemming from accurate SVR modeling. While several convex relaxations based on semi-definite programming (SDP) have been presented in the literature for optimal tap selection, SDP-based schemes do not scale well and are challenging to implement in large-scale planning or operational frameworks. This paper deals with the optimal tap selection (OPTS) problem for wye-connected SVRs using linear approximations of power flow equations. Specifically, the $\textit{LinDist3Flow}$ model is adopted and the effective SVR ratio is assumed to be continuous, enabling the formulation of a problem called $\textit{LinDist3Flow-OPTS}$, which amounts to a linear program. The scalability and optimality gap of $\textit{LinDist3Flow-OPTS}$ are evaluated with respect to existing SDP-based and nonlinear programming techniques for optimal tap selection in three standard feeders, namely, the IEEE 13-bus, 123-bus, and 8500-node DNs.
For all DNs considered, $\textit{LinDist3Flow-OPTS}$ achieves an optimality gap of approximately $1\%$ or less while significantly lowering the computational burden.
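An optimality gap is usually reported as a relative difference in objective value. The sketch below assumes that definition, plus a hypothetical post-processing step that rounds the continuous effective ratio back to a discrete tap under the common regulator convention ratio = 1 + 0.00625 × tap with taps in [-16, 16]; neither the definition nor the rounding step is taken from the paper.

```python
def nearest_tap(ratio: float, step: float = 0.00625, max_tap: int = 16) -> int:
    """Round a continuous effective SVR ratio to the nearest discrete tap position.

    Assumes the convention ratio = 1 + step * tap, with tap in [-max_tap, max_tap].
    """
    tap = round((ratio - 1.0) / step)
    return max(-max_tap, min(max_tap, tap))


def optimality_gap_percent(cost_approx: float, cost_reference: float) -> float:
    """Relative gap (%) of an approximate solution's cost versus a reference (e.g. NLP/SDP) cost."""
    return 100.0 * (cost_approx - cost_reference) / abs(cost_reference)


print(nearest_tap(1.03125))                  # 5
print(nearest_tap(1.2))                      # 16 (clamped to the upper limit)
print(optimality_gap_percent(101.0, 100.0))  # 1.0
```

In such a comparison, `cost_approx` would plausibly be the cost of the LP-derived taps re-evaluated in the reference (nonlinear) model, so the gap measures what the linearization gives up.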

Submitted 16 October, 2022; originally announced October 2022.

Comments: Accepted at the 61st IEEE Conference on Decision and Control - Dec. 6-9, 2022, in Cancún, Mexico

arXiv:2209.09408 [pdf, other]

Deep learning at the edge enables real-time streaming ptychographic imaging

Authors: Anakha V Babu, Tao Zhou, Saugat Kandel, Tekin Bicer, Zhengchun Liu, William Judge, Daniel J. Ching, Yi Jiang, Sinisa Veseli, Steven Henke, Ryan Chard, Yudong Yao, Ekaterina Sirazitdinova, Geetika Gupta, Martin V. Holt, Ian T. Foster, Antonino Miceli, Mathew J. Cherukara

Abstract: Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than required by traditional methods.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.08513 [pdf, other]

Performance Analysis of Reconfigurable Intelligent Surface Assisted Two-Way NOMA Networks

Authors: Ziwei Liu, Xinwei Yue, Chao Zhang, Yuanwei Liu, Yuanyuan Yao, Yafei Wang, Zhiguo Ding

Abstract: This paper investigates the performance of reconfigurable intelligent surface assisted two-way non-orthogonal multiple access (RIS-TW-NOMA) networks, where a pair of users exchange their information through a RIS. The influence of imperfect successive interference cancellation on RIS-TW-NOMA is taken into account. To evaluate the potential performance of RIS-TW-NOMA, we derive the exact and asymptotic expressions of outage probability and ergodic rate for a pair of users. Based on the analytical results, the diversity orders and high signal-to-noise ratio (SNR) slopes are obtained in the high SNR regime, which are closely related to the number of RIS elements. Additionally, we analyze the system throughput and energy efficiency of RIS-TW-NOMA networks in both delay-limited and delay-tolerant transmission modes. Numerical results indicate that: 1) the outage behaviors and ergodic rate of RIS-TW-NOMA are superior to those of RIS-TW-OMA and two-way relay OMA (TWR-OMA); 2) as the number of RIS elements increases, RIS-TW-NOMA networks achieve enhanced outage performance; and 3) compared with RIS-TW-OMA and TWR-OMA networks, RIS-TW-NOMA has clear advantages in energy efficiency and system throughput.
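The trend that outage improves with more RIS elements can be illustrated by Monte Carlo simulation. This is a deliberately simplified sketch, not the paper's RIS-TW-NOMA model: it assumes a single user, Rayleigh fading on both hops, ideal RIS phase alignment (so cascaded amplitudes add coherently), and no NOMA, SIC, or two-way exchange; all names are hypothetical.

```python
import math
import random


def rayleigh_amplitude(rng: random.Random) -> float:
    """|h| for h ~ CN(0, 1): magnitude of a unit-power complex Gaussian sample."""
    sigma = math.sqrt(0.5)  # per-component standard deviation
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def outage_probability(num_elements: int, snr_db: float, rate_bps_hz: float,
                       trials: int = 10000, seed: int = 0) -> float:
    """Fraction of channel realizations whose capacity falls below the target rate."""
    rng = random.Random(seed)
    rho = 10 ** (snr_db / 10)               # transmit SNR (linear)
    threshold = 2 ** rate_bps_hz - 1        # SNR needed to support the target rate
    outages = 0
    for _ in range(trials):
        # Ideal phase alignment: the N cascaded amplitudes |h_i||g_i| add coherently.
        amplitude = sum(rayleigh_amplitude(rng) * rayleigh_amplitude(rng)
                        for _ in range(num_elements))
        if rho * amplitude ** 2 < threshold:
            outages += 1
    return outages / trials


for n in (1, 4, 8):
    print(n, outage_probability(n, snr_db=0.0, rate_bps_hz=1.0))
```

Closed-form expressions such as those derived in the paper are what make diversity orders visible; a simulation like this is only a sanity check on the qualitative trend.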

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2209.01325 [pdf, ps, other]

Quasi-supervised Learning for Super-resolution PET

Authors: Guangtong Yang, Chen Li, Yudong Yao, Ge Wang, Yueyang Teng

Abstract: Low resolution of positron emission tomography (PET) limits its diagnostic performance. Deep learning has been successfully applied to achieve super-resolution PET. However, commonly used supervised learning methods in this context require many pairs of low- and high-resolution (LR and HR) PET images. Although unsupervised learning utilizes unpaired images, its results are not as good as those obtained with supervised deep learning. In this paper, we propose a quasi-supervised learning method, a new type of weakly supervised learning, to recover HR PET images from LR counterparts by leveraging the similarity between unpaired LR and HR image patches. Specifically, LR image patches are taken from a patient as inputs, while the most similar HR patches from other patients are found as labels. The similarity between the matched HR and LR patches serves as a prior for network construction. Our proposed method can be implemented by designing a new network or modifying an existing network. As an example in this study, we have modified the cycle-consistent generative adversarial network (CycleGAN) for super-resolution PET. Our numerical and experimental results qualitatively and quantitatively show the merits of our method relative to the state-of-the-art methods. The code is publicly available at https://github.com/PigYang-ops/CycleGAN-QSDL.
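The label-finding step (pairing each LR input patch with the most similar HR patch from other patients) is a nearest-neighbour search. A minimal sketch follows, assuming flattened patches of equal size (e.g. LR patches already upsampled to the HR grid) and plain Euclidean distance; the authors' actual similarity measure may differ, and the released code linked above is authoritative.

```python
import math
from typing import List, Sequence, Tuple

Patch = Sequence[float]  # a flattened image patch


def l2_distance(a: Patch, b: Patch) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_patches(lr_patches: Sequence[Patch],
                  hr_patches: Sequence[Patch]) -> List[Tuple[int, float]]:
    """For each LR input patch, return (index of the most similar HR patch, its L2 distance)."""
    matches = []
    for lr in lr_patches:
        distances = [l2_distance(lr, hr) for hr in hr_patches]
        best = min(range(len(distances)), key=distances.__getitem__)
        matches.append((best, distances[best]))
    return matches


lr_patches = [[0, 0, 0, 0], [1, 1, 1, 1]]     # inputs from one patient
hr_patches = [[1, 1, 1, 2], [0, 0, 0, 1]]     # candidate labels from other patients
print(match_patches(lr_patches, hr_patches))  # [(1, 1.0), (0, 1.0)]
```

The returned distance is one natural quantity to turn into the similarity prior the abstract mentions, for example by down-weighting poorly matched pairs; how the paper does this is not stated here.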

Submitted 3 September, 2022; originally announced September 2022.

Comments: 12 pages, 11 figures

arXiv:2208.02405 [pdf, other]

Transformer Convolutional Neural Networks for Automated Artifact Detection in Scalp EEG

Authors: Wei Yan Peh, Yuanyuan Yao, Justin Dauwels

Abstract: It is well known that electroencephalograms (EEGs) often contain artifacts due to muscle activity, eye blinks, and various other causes. Detecting such artifacts is an essential first step toward a correct interpretation of EEGs. Although much effort has been devoted to semi-automated and automated artifact detection in EEG, the problem of artifact detection remains challenging. In this paper, we propose a convolutional neural network (CNN) enhanced by transformers using belief matching (BM) loss for automated detection of five types of artifacts: chewing, electrode pop, eye movement, muscle, and shiver. Specifically, we apply these five detectors at individual EEG channels to distinguish artifacts from background EEG. Next, for each of these five types of artifacts, we combine the output of these channel-wise detectors to detect artifacts in multi-channel EEG segments. These segment-level classifiers can detect specific artifacts with a balanced accuracy (BAC) of 0.947, 0.735, 0.826, 0.857, and 0.655 for chewing, electrode pop, eye movement, muscle, and shiver artifacts, respectively. Finally, we combine the outputs of the five segment-level detectors to perform a combined binary classification (any artifact vs. background). The resulting detector achieves a sensitivity (SEN) of 60.4%, 51.8%, and 35.5%, at a specificity (SPE) of 95%, 97%, and 99%, respectively. This artifact detection module can reject artifact segments while only removing a small fraction of the background EEG, leading to a cleaner EEG for further analysis.
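The two reported figures of merit, balanced accuracy and sensitivity at a fixed specificity, can be computed as below. This is a sketch under stated assumptions: a segment is flagged as artifact when its score is strictly above the threshold, and the threshold is the lowest one whose specificity reaches the target; the paper's exact operating-point selection is not described in the abstract.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Mean of sensitivity and specificity, robust to class imbalance."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return 0.5 * (tp / positives + tn / negatives)


def sensitivity_at_specificity(labels: Sequence[int], scores: Sequence[float],
                               target: float) -> float:
    """Sensitivity at the lowest threshold whose specificity reaches `target`."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    positives = [s for y, s in zip(labels, scores) if y == 1]
    for threshold in sorted(set(negatives)):
        specificity = sum(s <= threshold for s in negatives) / len(negatives)
        if specificity >= target:
            return sum(s > threshold for s in positives) / len(positives)
    return 0.0  # only reached when there are no background (negative) segments


labels = [0] * 10 + [1] * 4
scores = [i / 10 for i in range(10)] + [0.95, 0.85, 0.5, 0.05]
print(sensitivity_at_specificity(labels, scores, 0.9))             # 0.5
print(sensitivity_at_specificity(labels, scores, 1.0))             # 0.25
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # 0.625
```

Raising the required specificity forces a higher threshold and lowers sensitivity, which matches the direction of the reported 60.4% / 51.8% / 35.5% trade-off.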

Submitted 3 August, 2022; originally announced August 2022.

Comments: This is an extension to a paper presented at the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) Scottish Event Campus, Glasgow, UK, July 11-15, 2022

arXiv:2208.01880 [pdf, other]

Joint Sensing and Communications for Deep Reinforcement Learning-based Beam Management in 6G

Authors: Yujie Yao, Hao Zhou, Melike Erol-Kantarci

Abstract: User location is a critical piece of information for network management and control. However, location uncertainty is unavoidable in certain settings, leading to localization errors. In this paper, we consider user location uncertainty in mmWave networks and investigate joint vision-aided sensing and communications using deep reinforcement learning (DRL)-based beam management for future 6G networks. In particular, we first extract pixel characteristic-based features from satellite images to improve localization accuracy. Then we propose a UK-medoids based method for user clustering with location uncertainty, and the clustering results are subsequently used for beam management. Finally, we apply the DRL algorithm for intra-beam radio resource allocation. The simulations first show that our proposed vision-aided method can substantially reduce the localization error. The proposed UK-medoids and DRL based scheme (UKM-DRL) is compared with two other schemes: K-means based clustering and DRL based resource allocation (K-DRL), and UK-means based clustering and DRL based resource allocation (UK-DRL). The proposed method has 17.2% higher throughput and 7.7% lower delay than UK-DRL, and more than double the throughput and 55.8% lower delay than K-DRL.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2208.00025 [pdf, other]

Six-center Assessment of CNN-Transformer with Belief Matching Loss for Patient-independent Seizure Detection in EEG

Authors: Wei Yan Peh, Prasanth Thangavel, Yuanyuan Yao, John Thomas, Yee Leng Tan, Justin Dauwels

Abstract: Neurologists typically identify epileptic seizures from electroencephalograms (EEGs) by visual inspection. This process is often time-consuming, especially for EEG recordings that last hours or days. To expedite the process, a reliable, automated, and patient-independent seizure detector is essential. However, developing a patient-independent seizure detector is challenging as seizures exhibit diverse characteristics across patients and recording devices. In this study, we propose a patient-independent seizure detector to automatically detect seizures in both scalp EEG and intracranial EEG (iEEG). First, we deploy a convolutional neural network with transformers and belief matching loss to detect seizures in single-channel EEG segments. Next, we extract regional features from the channel-level outputs to detect seizures in multi-channel EEG segments. Then, we apply postprocessing filters to the segment-level outputs to determine the start and end points of seizures in multi-channel EEGs. Finally, we introduce the minimum overlap evaluation scoring as an evaluation metric that accounts for minimum overlap between the detection and seizure, improving upon existing assessment metrics. We trained the seizure detector on the Temple University Hospital Seizure (TUH-SZ) dataset and evaluated it on five independent EEG datasets. We evaluate the systems with the following metrics: sensitivity (SEN), precision (PRE), and average and median false positive rate per hour (aFPR/h and mFPR/h).
Across four adult scalp EEG and iEEG datasets, we obtained SEN of 0.617-1.00, PRE of 0.534-1.00, aFPR/h of 0.425-2.002, and mFPR/h of 0-1.003. The proposed seizure detector can detect seizures in adult EEGs and takes less than 15 s to process a 30-minute EEG. Hence, this system could aid clinicians in reliably identifying seizures expeditiously, allocating more time for devising proper treatment.
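Event-level scoring with a minimum-overlap rule can be sketched as follows. The assumption here, which may not match the paper's exact minimum overlap evaluation scoring, is that a seizure counts as detected, and a detection as correct, when the two intervals overlap by at least `min_overlap_s` seconds; unmatched detections are false positives. Averaging or taking the median of the per-recording rate would then give aFPR/h and mFPR/h.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def event_scores(seizures: Sequence[Interval], detections: Sequence[Interval],
                 duration_s: float, min_overlap_s: float = 1.0) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, false positives per hour) for one recording."""
    detected = [any(overlap(s, d) >= min_overlap_s for d in detections) for s in seizures]
    correct = [any(overlap(d, s) >= min_overlap_s for s in seizures) for d in detections]
    sensitivity = sum(detected) / len(seizures) if seizures else float("nan")
    precision = sum(correct) / len(detections) if detections else float("nan")
    false_positives = len(detections) - sum(correct)
    return sensitivity, precision, false_positives / (duration_s / 3600.0)


seizures = [(100, 160), (1000, 1060)]
detections = [(150, 200), (2000, 2030), (1059.5, 1070)]  # last one overlaps by only 0.5 s
print(event_scores(seizures, detections, duration_s=7200))  # (0.5, 0.3333333333333333, 1.0)
```

The minimum-overlap threshold is what stops a detection that merely touches a seizure boundary from counting as a hit, as the third detection above illustrates.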

Submitted 22 November, 2022; v1 submitted 29 July, 2022; originally announced August 2022.

Comments: Submitting to IJNS

arXiv:2206.08751 [pdf, other]

Perceptual Quality Assessment of Virtual Reality Videos in the Wild

Authors: Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

Abstract: Investigating how people perceive virtual reality (VR) videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time. Existing panoramic video databases only consider synthetic distortions, assume fixed viewing conditions, and are limited in size. To overcome these shortcomings, we construct the VR Video Quality in the Wild (VRVQW) database, containing $502$ user-generated videos with diverse content and distortion characteristics. Based on VRVQW, we conduct a formal psychophysical experiment to record the scanpaths and perceived quality scores from $139$ participants under two different viewing conditions. We provide a thorough statistical analysis of the recorded data, observing a significant impact of viewing conditions on both human scanpaths and perceived quality. Moreover, we develop an objective quality assessment model for VR videos based on pseudocylindrical representation and convolution. Results on the proposed VRVQW show that our method is superior to existing video quality assessment models. We have made the database and code available at https://github.com/limuhit/VR-Video-Quality-in-the-Wild.

Submitted 15 March, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:2206.06145 [pdf]

Identification of cancer-keeping genes as therapeutic targets by finding network control hubs

Authors: Xizhe Zhang, Chunyu Pan, Xinru Wei, Meng Yu, Shuangjie Liu, Jun An, Jieping Yang, Baojun Wei, Wenjun Hao, Yang Yao, Yuyan Zhu, Weixiong Zhang

Abstract: Finding cancer driver genes has been a focal theme of cancer research and clinical studies. One of the recent approaches is based on network structural controllability, which focuses on finding a control scheme and driver genes that can steer the cell from an arbitrary state to a designated state. While theoretically sound, this approach is impractical for many reasons, e.g., the control scheme is often not unique and half of the nodes may be driver genes for the cell. We developed a novel approach that transcends structural controllability. Instead of considering driver genes for one control scheme, we considered control hub genes that reside in the middle of a control path of every control scheme. Control hubs are the most vulnerable spots for controlling the cell, and exogenous stimuli on them may render the cell uncontrollable. We adopted control hubs as cancer-keeping genes (CKGs) and applied them to a gene regulatory network of bladder cancer (BLCA). All the genes on the cell cycle and p53 signaling pathways in BLCA are CKGs, confirming the importance of these genes and the two pathways in cancer. A smaller set of 35 sensitive CKGs (sCKGs) for BLCA was identified by removing network links. Six sCKGs (RPS6KA3, FGFR3, N-cadherin (CDH2), EP300, caspase-1, and FN1) were subjected to small-interfering-RNA knockdown in four cell lines to validate their effects on the proliferation or migration of cancer cells. Knocking down RPS6KA3 in a mouse model of BLCA significantly inhibited the growth of tumor xenografts in the mouse model.
Combined, our results demonstrated the value of CKGs as therapeutic targets for cancer therapy and the potential of CKGs as an effective means for studying and characterizing cancer etiology.

Submitted 13 June, 2022; originally announced June 2022.

Comments: Contact the corresponding authors for supplementary material

arXiv:2205.10758 [pdf, other]

doi 10.1109/EMBC48229.2022.9871233

Residual Channel Attention Network for Brain Glioma Segmentation

Authors: Yiming Yao, Peisheng Qian, Ziyuan Zhao, Zeng Zeng

Abstract: A glioma is a malignant brain tumor that seriously affects cognitive functions and lowers patients' quality of life. Segmentation of brain glioma is challenging because of interclass ambiguities in tumor regions. Recently, deep learning approaches have achieved outstanding performance in the automatic segmentation of brain glioma. However, existing algorithms fail to exploit channel-wise feature interdependence to select semantic attributes for glioma segmentation. In this study, we implement a novel deep neural network that integrates residual channel attention modules to calibrate intermediate features for glioma segmentation. The proposed channel attention mechanism adaptively weights features channel-wise to optimize the latent representation of gliomas. We evaluate our method on the established BraTS2017 dataset. Experimental results indicate the superiority of our method.
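Channel attention of this kind is commonly built in the squeeze-and-excitation style: global average pooling per channel, a bottleneck of two fully connected layers ending in a sigmoid, per-channel rescaling, and a residual (identity) path. The pure-Python sketch below shows one plausible arrangement of that idea on a toy feature map; it is not the authors' module, and the weight matrices are hypothetical placeholders for learned parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def residual_channel_attention(features: List[List[float]],
                               w_down: Matrix, w_up: Matrix) -> List[List[float]]:
    """features[c] holds the flattened spatial values of channel c."""
    # Squeeze: global average pooling gives one descriptor per channel.
    descriptor = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> per-channel weights in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w_down, descriptor)]
    weights = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w_up, hidden)]
    # Rescale each channel and add the identity (residual) path.
    return [[x + w * x for x in ch] for ch, w in zip(features, weights)]


features = [[2.0, 4.0], [0.0, -2.0]]  # 2 channels, 2 spatial positions each
w_down = [[0.0, 0.0]]                 # 2 -> 1 bottleneck (hypothetical weights)
w_up = [[0.0], [0.0]]                 # 1 -> 2 expansion (hypothetical weights)
print(residual_channel_attention(features, w_down, w_up))  # [[3.0, 6.0], [0.0, -3.0]]
```

With all-zero weights every channel receives sigmoid(0) = 0.5, so the output is 1.5 times the input; learned weights would instead emphasise or suppress individual channels.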

Submitted 22 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2205.08143 [pdf, other]

doi 10.1016/j.ultrasmedbio.2023.11.009

Brachial Plexus Nerve Trunk Segmentation Using Deep Learning: A Comparative Study with Doctors' Manual Segmentation

Authors: Yu Wang, Binbin Zhu, Lingsi Kong, Jianlin Wang, Bin Gao, Jianhua Wang, Dingcheng Tian, Yudong Yao

Abstract: Ultrasound-guided nerve block anesthesia (UGNB) is a high-tech visual nerve block anesthesia method that can observe the target nerve and its surrounding structures, the advancement of the puncture needle, and the spread of local anesthetics in real time. The key in UGNB is nerve identification. With the help of deep learning methods, the automatic identification or segmentation of nerves can be realized, assisting doctors in completing nerve block anesthesia accurately and efficiently. Here, we establish a public dataset containing 320 ultrasound images of the brachial plexus (BP). Three experienced doctors jointly produce the BP segmentation ground truth and label brachial plexus trunks. We design a brachial plexus segmentation system (BPSegSys) based on deep learning. BPSegSys achieves experienced-doctor-level nerve identification performance in various experiments. We evaluate BPSegSys' performance in terms of intersection-over-union (IoU), a commonly used performance measure for segmentation experiments. Considering three dataset groups in our established public dataset, the IoUs of BPSegSys are 0.5238, 0.4715, and 0.5029, respectively, which exceed the IoUs of experienced doctors (0.5205, 0.4704, and 0.4979). In addition, we show that BPSegSys can help doctors identify brachial plexus trunks more accurately, with IoU improvement of up to 27%, which has significant clinical application value.
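IoU, the measure used throughout this abstract, is the overlap between predicted and ground-truth masks divided by their union. A minimal sketch on binary masks stored as nested lists follows; treating two empty masks as a perfect match (IoU = 1.0) is an assumed convention.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return intersection / union if union else 1.0  # both masks empty: assumed perfect match


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 0.5 (2 shared pixels out of 4 in the union)
```

For scale, the reported margins of BPSegSys over experienced doctors are 0.0033, 0.0011, and 0.0050 IoU on the three dataset groups.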

Submitted 17 May, 2022; originally announced May 2022.

Comments: 9 pages

Journal ref: Ultrasound in Medicine & Biology, 2024, 50(3): 374-383

arXiv:2204.10984 [pdf, other]

Deep Reinforcement Learning-based Radio Resource Allocation and Beam Management under Location Uncertainty in 5G mmWave Networks

Authors: Yujie Yao, Hao Zhou, Melike Erol-Kantarci

Abstract: Millimeter Wave (mmWave) is an important part of 5G new radio (NR), in which highly directional beams are adapted to compensate for the substantial propagation loss based on UE locations. However, the location information may contain errors, such as GPS errors. In any case, some uncertainty and localization error are unavoidable in most settings. Applying these distorted locations for clustering will increase the error of beam management. Meanwhile, the traffic demand may change dynamically in the wireless environment. Therefore, a scheme that can handle both localization uncertainty and dynamic radio resource allocation is needed. In this paper, we propose a UK-means-based clustering and deep reinforcement learning-based resource allocation algorithm (UK-DRL) for radio resource allocation and beam management in 5G mmWave networks. We first apply UK-means as the clustering algorithm to mitigate the localization uncertainty; then deep reinforcement learning (DRL) is adopted to dynamically allocate radio resources. Finally, we compare UK-DRL with a K-means-based clustering and DRL-based resource allocation algorithm (K-DRL); the simulations show that our proposed UK-DRL-based method achieves 150% higher throughput and 61.5% lower delay compared with K-DRL when the traffic load is 4 Mbps.
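UK-means clusters uncertain objects by assigning each one to the centroid with the smallest expected distance. The sketch below assumes each user's uncertain location is represented by a few location samples, uses expected Euclidean distance (with squared Euclidean distance the assignment would reduce to ordinary k-means on the sample means), and initialises centroids from the first k users; these choices and all names are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def expected_distance(samples: Sequence[Point], centroid: Point) -> float:
    """Monte Carlo estimate of E[||X - c||] from location samples of one user."""
    return sum(math.dist(s, centroid) for s in samples) / len(samples)


def mean_point(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def uk_means(objects: List[List[Point]], k: int, iterations: int = 20) -> List[int]:
    """Cluster uncertain objects; returns a cluster index per object."""
    means = [mean_point(obj) for obj in objects]
    centroids = means[:k]  # simple deterministic initialisation (an assumption)
    assignment = [0] * len(objects)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda j: expected_distance(obj, centroids[j]))
                      for obj in objects]
        for j in range(k):
            members = [means[i] for i, a in enumerate(assignment) if a == j]
            if members:
                centroids[j] = mean_point(members)
    return assignment


users = [
    [(0.0, 0.0), (0.2, 0.1)],     # noisy location samples of user A
    [(10.0, 10.0), (9.8, 10.1)],  # user B
    [(0.1, 0.3), (0.0, 0.2)],     # user C
    [(10.2, 9.9), (10.0, 10.0)],  # user D
]
print(uk_means(users, k=2))  # [0, 1, 0, 1]
```

In a beam-management setting each resulting cluster would plausibly be served by one beam; the DRL resource allocation stage is outside the scope of this sketch.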

Submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted to 2022 IEEE Symposium on Computers and Communications

arXiv:2202.12940 [pdf]

Fully-integrated multipurpose microwave frequency identification system on a single chip

Authors: Yuhan Yao, Yuhe Zhao, Yanxian Wei, Feng Zhou, Daigao Chen, Yuguang Zhang, Xi Xiao, Ming Li, Jianji Dong, Shaohua Yu, Xinliang Zhang

Abstract: We demonstrate a fully integrated multipurpose microwave frequency identification system on a silicon-on-insulator platform. Thanks to its multipurpose features, the chip is able to identify different types of microwave signals, including single-frequency, multiple-frequency, chirped and frequency-hopping microwave signals, as well as discriminate instantaneous frequency variation among frequency-modulated signals. This demonstration exhibits a fully integrated solution and fully functional microwave frequency identification, which can meet the requirements of reduced size, weight and power for future advanced microwave photonic processors.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 23 pages, 6 figures

arXiv:2202.09953 [pdf]

doi 10.1016/j.isprsjprs.2021.11.003

LiDAR-guided Stereo Matching with a Spatial Consistency Constraint

Authors: Yongjun Zhang, Siyuan Zou, Xinyi Liu, Xu Huang, Yi Wan, Yongxiang Yao

Abstract: The complementary fusion of light detection and ranging (LiDAR) data and image data is a promising but challenging task for generating high-precision and high-density point clouds. This study proposes an innovative approach called LiDAR-guided stereo matching (LGSM), which considers the spatial consistency represented by continuous disparity or depth changes in the homogeneous regions of an image. LGSM first detects the homogeneous pixels of each LiDAR projection point based on their color or intensity similarity. Next, we propose a riverbed enhancement function to optimize the cost volume of the LiDAR projection points and their homogeneous pixels to improve matching robustness. Our formulation expands the constraint scope of sparse LiDAR projection points with the guidance of image information to optimize the cost volume of as many pixels as possible. We applied LGSM to semi-global matching and AD-Census on both simulated and real datasets. When LiDAR points made up 0.16% of the simulated datasets, our method achieved subpixel matching accuracy, whereas the accuracy of the original stereo matching algorithm was 3.4 pixels. The experimental results show that LGSM is suitable for indoor, street, aerial, and satellite image datasets and transfers well across semi-global matching and AD-Census.
Furthermore, qualitative and quantitative evaluations demonstrate that LGSM is superior to two state-of-the-art cost-volume optimization methods, especially in reducing mismatches in difficult matching areas and refining object boundaries.
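The first LGSM step can be illustrated with a deliberately simple Python sketch: around one LiDAR projection point, mark the pixels in a square window whose intensity (or, for color images, largest per-channel difference) is within a threshold of the seed pixel. The square window, the parameters `radius` and `max_diff`, and the function name `homogeneous_mask` are all assumptions for illustration; the abstract does not specify the paper's detection rule, and the riverbed enhancement function is not reproduced here.

```python
import numpy as np


def homogeneous_mask(image, row, col, radius=3, max_diff=10.0):
    """Hypothetical homogeneous-pixel detector for one LiDAR projection point.

    Marks pixels inside a (2*radius+1)^2 window centred at (row, col) whose
    intensity differs from the seed pixel by at most max_diff.
    Works on grayscale (H, W) or color (H, W, C) images.
    """
    h, w = image.shape[:2]
    # Clip the window to the image borders.
    r0, r1 = max(0, row - radius), min(h, row + radius + 1)
    c0, c1 = max(0, col - radius), min(w, col + radius + 1)
    seed = image[row, col].astype(float)
    window = image[r0:r1, c0:c1].astype(float)
    diff = np.abs(window - seed)
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # color: use the largest per-channel difference
    mask = np.zeros((h, w), dtype=bool)
    mask[r0:r1, c0:c1] = diff <= max_diff
    return mask
```

In a full pipeline the resulting mask would select which pixels share the LiDAR point's disparity prior; a fixed window is the simplest choice, and edge-aware region growing would be a natural alternative.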

Submitted 24 February, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: We replaced the article to add the journal reference, DOI, and report number information.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 183 (2021), pp. 164-177

arXiv:2202.09020 [pdf, other]

A Comprehensive Survey with Quantitative Comparison of Image Analysis Methods for Microorganism Biovolume Measurements

Authors: Jiawei Zhang, Chen Li, Md Mamunur Rahaman, Yudong Yao, **li Ma, **ghua Zhang, Xin Zhao, Tao Jiang, Marcin Grzegorzek

Abstract: With accelerating urbanization and rising living standards, microorganisms play increasingly important roles in industrial production, biotechnology, and food safety testing. Microorganism biovolume measurement is an essential part of microbial analysis. However, traditional manual measurement methods are time-consuming and struggle to measure the characteristics precisely. With the development of digital image processing techniques, the characteristics of microbial populations can be detected and quantified. This allows changing trends to be adjusted in a timely manner and provides a basis for improvement. Microorganism biovolume measurement methods have been developed since the 1980s. More than 62 articles are reviewed in this study, grouped by digital image segmentation method and time period. This study has high research significance and application value: microbial researchers can refer to it for a comprehensive understanding of microorganism biovolume measurement using digital image analysis methods and their potential applications.

Submitted 2 May, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.08552 [pdf, other]

EBHI: A New Enteroscope Biopsy Histopathological H&E Image Dataset for Image Classification Evaluation

Authors: Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, Haoyuan Chen, Wanli Liu, Yudong Yao, Hongzan Sun, Ning Xu, Xinyu Huang, Marcin Grzegorzek

Abstract: Background and purpose: Colorectal cancer has become the third most common cancer worldwide, accounting for approximately 10% of cancer patients. Early detection is important for the treatment of colorectal cancer patients, and histopathological examination is the gold standard for screening colorectal cancer. However, the current lack of histopathological image datasets of colorectal cancer, especially of enteroscope biopsies, hinders the accurate evaluation of computer-aided diagnosis techniques. Methods: A new publicly available Enteroscope Biopsy Histopathological H&E Image Dataset (EBHI) is published in this paper. To demonstrate the effectiveness of the EBHI dataset, we used several machine learning methods, convolutional neural networks, and novel transformer-based classifiers for experimentation and evaluation, using images at 200× magnification. Results: Experimental results show that deep learning methods perform well on the EBHI dataset. Traditional machine learning methods achieve a maximum accuracy of 76.02%, and deep learning methods achieve a maximum accuracy of 95.37%. Conclusion: To the best of our knowledge, EBHI is the first publicly available colorectal histopathology enteroscope biopsy dataset with four magnifications and five types of images of tumor differentiation stages, totaling 5532 images. We believe that EBHI could attract researchers to explore new classification algorithms for the automated diagnosis of colorectal cancer, which could help physicians and patients in clinical settings.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2201.07232 [pdf, other]

Real-time X-ray Phase-contrast Imaging Using SPINNet -- A Speckle-based Phase-contrast Imaging Neural Network

Authors: Zhi Qiao, Xianbo Shi, Yudong Yao, Michael J. Wojcik, Luca Rebuffi, Mathew J. Cherukara, Lahsen Assoufid

Abstract: X-ray phase-contrast imaging has become indispensable for visualizing samples with low absorption contrast. In this regard, speckle-based techniques have shown significant advantages in spatial resolution, phase sensitivity, and implementation flexibility compared with traditional methods. However, their computational cost has hindered their wider adoption. By exploiting the power of deep learning, we developed a new speckle-based phase-contrast imaging neural network (SPINNet) that boosts the phase retrieval speed by at least two orders of magnitude compared to existing methods. To achieve this performance, we combined SPINNet with a novel coded-mask-based technique, an enhanced version of the speckle-based method. Using this scheme, we demonstrate a simultaneous reconstruction of absorption and phase images on the order of 100 ms, where a traditional correlation-based analysis would take several minutes even with a cluster. In addition to significant improvement in speed, our experimental results show that the imaging resolution and phase retrieval quality of SPINNet outperform existing single-shot speckle-based methods. Furthermore, we successfully demonstrate its application in 3D X-ray phase-contrast tomography. Our result shows that SPINNet could enable many applications requiring high-resolution and fast data acquisition and processing, such as in-situ and in-operando 2D and 3D phase-contrast imaging and real-time at-wavelength metrology and wavefront sensing.
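For context on the speed comparison, here is a minimal Python sketch of the kind of correlation-based speckle tracking that the abstract contrasts with SPINNet: for one subset of the reference speckle image, it tries every integer shift in a search range in the sample image and keeps the shift with the highest zero-mean normalized cross-correlation (NCC). This is a generic baseline under stated assumptions (integer-pixel search only, no subpixel refinement, no integration of the displacement field into phase), not SPINNet and not the authors' code; the names `ncc` and `track_shift` are hypothetical.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def track_shift(ref, sample, row, col, half=8, search=4):
    """Integer-pixel displacement (dy, dx) of the speckle subset centred at (row, col).

    Brute-force search over +/- search pixels; assumes the subset plus the
    search range lies inside both images.
    """
    template = ref[row - half:row + half + 1, col - half:col + half + 1]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            candidate = sample[r - half:r + half + 1, c - half:c + half + 1]
            score = ncc(template, candidate)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((64, 64))                          # synthetic speckle pattern
    sample = np.roll(ref, shift=(2, -3), axis=(0, 1))   # rigidly shifted copy
    print(track_shift(ref, sample, 32, 32))             # (2, -3)
```

Repeating this search for every pixel of a detector image, usually with subpixel refinement on top, helps explain why conventional analysis is slow compared with a single network forward pass.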

Submitted 18 January, 2022; originally announced January 2022.

Comments: 13 pages, 8 figures

arXiv:2201.04809 [pdf, other]

Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks

Authors: Yuchong Yao, Xiaohui Wangr, Yuanbang Ma, Han Fang, Jiaying Wei, Liyuan Chen, Ali Anaissi, Ali Braytee

Abstract: Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly. With imbalanced data, generative adversarial networks (GANs) are biased toward majority-class samples. Two recent methods, Balancing GAN (BAGAN) and improved BAGAN (BAGAN-GP), have been proposed as augmentation tools to handle this problem and restore balance to the data. The former pre-trains the autoencoder weights in an unsupervised manner; however, it is unstable when images from different categories have similar features. The latter improves on BAGAN by using supervised autoencoder training, but its pre-training is biased toward the majority classes. In this work, we propose a novel Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images. In particular, we use a conditional convolutional variational autoencoder with supervised and balanced pre-training for GAN initialization, and train with a gradient penalty. Our proposed method outperforms other state-of-the-art methods on highly imbalanced versions of MNIST, Fashion-MNIST, CIFAR-10, and two medical imaging datasets. Our method can synthesize high-quality minority samples in terms of Fréchet inception distance, structural similarity index measure, and perceptual quality.
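The abstract mentions GAN training "with gradient penalty". For reference, the widely used WGAN-GP critic objective (Gulrajani et al., 2017) is shown below, where D is the critic, P_r the real-data distribution, P_g the generator distribution, and x̂ a random interpolate between a real and a generated sample. Whether CAPGAN uses exactly this form, including how class conditioning enters D and the value of the weight λ, is an assumption here rather than something the abstract states.

```latex
\mathcal{L}_D
  = \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
  + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim U[0,1]
```

The penalty term pushes the critic's gradient norm toward 1 along lines between real and generated samples, which is the soft Lipschitz constraint that distinguishes this objective from weight clipping.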

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2111.04615

Safe Control of Arbitrary Nonlinear Systems using Dynamic Extension

Authors: Yihang Yao, Tianhao Wei, Changliu Liu

Abstract: Safe control for control-affine systems has been extensively studied. However, due to the complexity of system dynamics, it is challenging and time-consuming to apply these methods directly to non-control-affine systems, which cover a large class of dynamic systems, such as UAVs and systems with data-driven Neural Network Dynamic Models (NNDMs). Although all dynamic systems can be written in control-affine form through dynamic extension, it remains unclear how to optimally design a computationally efficient algorithm to safely control the extended system. This paper addresses this challenge by proposing an optimal approach to synthesize safe control for the extended system under the framework of energy-function-based safe control. The proposed method first extends the energy function and then performs hyperparameter optimization to maximize performance while guaranteeing safety. We prove theoretically that our method guarantees safety (forward invariance of the safe set) and performance (bounded tracking error and smoother trajectories), and we validate numerically that the proposed method is computationally efficient for non-control-affine systems.
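The dynamic-extension idea can be written out as follows. Treating the control input u as an additional state and its derivative v as the new input makes the extended system control-affine in v. The energy-function condition shown is one common form of the safe set algorithm constraint; the symbols φ_e and η, and this particular form, are assumptions for illustration rather than the paper's exact formulation. The last line indicates why the energy function itself must be extended: an energy function that depends only on x has zero gradient along g_e, so v would never appear in the safety constraint.

```latex
\dot{x} = f(x, u) \quad \text{(not affine in } u\text{)}

x_e = \begin{bmatrix} x \\ u \end{bmatrix}, \qquad v = \dot{u}, \qquad
\dot{x}_e = \underbrace{\begin{bmatrix} f(x, u) \\ 0 \end{bmatrix}}_{f_e(x_e)}
          + \underbrace{\begin{bmatrix} 0 \\ I \end{bmatrix}}_{g_e} v

\phi_e(x_e) \ge 0 \;\Rightarrow\;
\dot{\phi}_e = \nabla\phi_e(x_e)^{\top}\big(f_e(x_e) + g_e v\big) \le -\eta

\phi_e(x_e) = \phi(x) \;\Rightarrow\;
\nabla\phi_e^{\top} g_e
  = \begin{bmatrix} \nabla_x\phi^{\top} & 0 \end{bmatrix}
    \begin{bmatrix} 0 \\ I \end{bmatrix} = 0
```

Because the constraint is affine in v once φ_e depends on u, a safe input can in principle be obtained by projecting a reference input onto it, for example with a small quadratic program.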

Submitted 15 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: We are not confident about the content. This paper needs further inspection before publication.

Showing 1–50 of 87 results for author: Yao, Y