Search | arXiv e-print repository

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness

Abstract: Recent advancements in Natural Language Processing (NLP) have seen Large-scale Language Models (LLMs) excel at producing high-quality text for various purposes. Notably, in Text-To-Speech (TTS) systems, the integration of BERT for semantic token generation has underscored the importance of semantic content in producing coherent speech outputs. Despite this, the specific utility of LLMs in enhancin… ▽ More Recent advancements in Natural Language Processing (NLP) have seen Large-scale Language Models (LLMs) excel at producing high-quality text for various purposes. Notably, in Text-To-Speech (TTS) systems, the integration of BERT for semantic token generation has underscored the importance of semantic content in producing coherent speech outputs. Despite this, the specific utility of LLMs in enhancing TTS synthesis remains considerably limited. This research introduces an innovative approach, Llama-VITS, which enhances TTS synthesis by enriching the semantic content of text using LLM. Llama-VITS integrates semantic embeddings from Llama2 with the VITS model, a leading end-to-end TTS framework. By leveraging Llama2 for the primary speech synthesis process, our experiments demonstrate that Llama-VITS matches the naturalness of the original VITS (ORI-VITS) and those incorporate BERT (BERT-VITS), on the LJSpeech dataset, a substantial collection of neutral, clear speech. Moreover, our method significantly enhances emotive expressiveness on the EmoV_DB_bea_sem dataset, a curated selection of emotionally consistent speech from the EmoV_DB dataset, highlighting its potential to generate emotive speech. △ Less

Submitted 17 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures, 4 tables; accepted at LREC-COLING 2024

arXiv:2402.14099 [pdf, other]

EXACT-Net:EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy

Authors: Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Rui Zhang, Quan Chen, Kai Ding

Abstract: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, which accounts for 87% of diagnoses, require radiation therapy. Rapid treatment initiation significantly increases the patient's survival rate and reduces the mortality rate. Accurate tumor segmentation is a critical step in the diagnosis and treatment o… ▽ More Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, which accounts for 87% of diagnoses, require radiation therapy. Rapid treatment initiation significantly increases the patient's survival rate and reduces the mortality rate. Accurate tumor segmentation is a critical step in the diagnosis and treatment of NSCLC. Manual segmentation is time and labor-consuming and causes delays in treatment initiation. Although many lung nodule detection methods, including deep learning-based models, have been proposed, there is still a long-standing problem of high false positives (FPs) with most of these methods. Here, we developed an electronic health record (EHR) guided lung tumor auto-segmentation called EXACT-Net (EHR-enhanced eXACtitude in Tumor segmentation), where the extracted information from EHRs using a pre-trained large language model (LLM), was used to remove the FPs and keep the TP nodules only. The auto-segmentation model was trained on NSCLC patients' computed tomography (CT), and the pre-trained LLM was used with the zero-shot learning approach. Our approach resulted in a 250% boost in successful nodule detection using the data from ten NSCLC patients treated in our institution. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2308.13993 [pdf]

Comprehensive performance comparison among different types of features in data-driven battery state of health estimation

Authors: Xinhong Feng, Yongzhi Zhang, Rui Xiong, Chun Wang

Abstract: Battery state of health (SOH), which informs the maximal available capacity of the battery, is a key indicator of battery aging failure. Accurately estimating battery SOH is a vital function of the battery management system that remains to be addressed. In this study, a physics-informed Gaussian process regression (GPR) model is developed for battery SOH estimation, with the performance being syst… ▽ More Battery state of health (SOH), which informs the maximal available capacity of the battery, is a key indicator of battery aging failure. Accurately estimating battery SOH is a vital function of the battery management system that remains to be addressed. In this study, a physics-informed Gaussian process regression (GPR) model is developed for battery SOH estimation, with the performance being systematically compared with that of different types of features and machine learning (ML) methods. The method performance is validated based on 58826 cycling data units of 118 cells. Experimental results show that the physics-driven ML generally estimates more accurate SOH than other non-physical features under different scenarios. The physical features-based GPR predicts battery SOH with the errors being less than 1.1% based on 10 to 20 mins' relaxation data. And the high robustness and generalization capability of the methodology is also validated against different ratios of training and test data under unseen conditions. Results also highlight the more effective capability of knowledge transfer between different types of batteries with the physical features and GPR. This study demonstrates the excellence of physical features in indicating the state evolution of complex systems, and the improved indication performance of these features by combining a suitable ML method. △ Less

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2302.10193 [pdf]

Non-line-of-sight photoacoustic imaging

Authors: Yuting Shen, Xiaohua Feng, Fei Gao

Abstract: Photoacoustic imaging is a promising imaging technique for human brain due to its high sensitivity and functional imaging ability. However, the skull would cause strong attenuation and distortion to the photoacoustic signals, which makes non-invasive transcranial imaging difficult. In this work, the temporal bone is selected as an imaging window to minimize the influence of the skull. Moreover, no… ▽ More Photoacoustic imaging is a promising imaging technique for human brain due to its high sensitivity and functional imaging ability. However, the skull would cause strong attenuation and distortion to the photoacoustic signals, which makes non-invasive transcranial imaging difficult. In this work, the temporal bone is selected as an imaging window to minimize the influence of the skull. Moreover, non-line-of-sight photoacoustic imaging is introduced to enhance the field of view, where the skull is considered as a reflector. Simulation studies are carried out to show that the image quality can be improved with reflected signal considered. △ Less

Submitted 18 February, 2023; originally announced February 2023.

arXiv:2301.06321 [pdf]

Deep-learning-based on-chip rapid spectral imaging with high spatial resolution

Authors: Jiawei Yang, Kaiyu Cui, Yidong Huang, Wei Zhang, Xue Feng, Fang Liu

Abstract: Spectral imaging extends the concept of traditional color cameras to capture images across multiple spectral channels and has broad application prospects. Conventional spectral cameras based on scanning methods suffer from low acquisition speed and large volume. On-chip computational spectral imaging based on metasurface filters provides a promising scheme for portable applications, but endures lo… ▽ More Spectral imaging extends the concept of traditional color cameras to capture images across multiple spectral channels and has broad application prospects. Conventional spectral cameras based on scanning methods suffer from low acquisition speed and large volume. On-chip computational spectral imaging based on metasurface filters provides a promising scheme for portable applications, but endures long computation time for point-by-point iterative spectral reconstruction and mosaic effect in the reconstructed spectral images. In this study, we demonstrated on-chip rapid spectral imaging eliminating the mosaic effect in the spectral image by deep-learning-based spectral data cube reconstruction. We experimentally achieved four orders of magnitude speed improvement than iterative spectral reconstruction and high fidelity of spectral reconstruction over 99% for a standard color board. In particular, we demonstrated video-rate spectral imaging for moving objects and outdoor driving scenes with good performance for recognizing metamerism, where the concolorous sky and white cars can be distinguished via their spectra, showing great potential for autonomous driving and other practical applications in the field of intelligent perception. △ Less

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2212.07778 [pdf, other]

doi 10.1109/TPAMI.2024.3359326

Efficient Visual Computing with Camera RAW Snapshots

Authors: Zhihao Li, Ming Lu, Xu Zhang, Xin Feng, M. Salman Asif, Zhan Ma

Abstract: Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP… ▽ More Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel $ρ$-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in RAW-domain using RAW-domain YOLOv3 and RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed \r{ho}-Vision generalizes across various camera sensors and different task-specific models. Additional advantages of the proposed $ρ$-Vision that eliminates the ISP are the potential reductions in computations and processing times. △ Less

Submitted 25 January, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted by T-PAMI 2024. Homepage: https://njuvision.github.io/rho-vision

arXiv:2207.12941 [pdf, other]

Learning Generalizable Latent Representations for Novel Degradations in Super Resolution

Authors: Fengjun Li, Xin Feng, Fanglin Chen, Guangming Lu, Wenjie Pei

Abstract: Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily… ▽ More Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily true. The real-world degradations can be beyond the simulation scope by the handcrafted degradations, which are referred to as novel degradations. In this work, we propose to learn a latent representation space for degradations, which can be generalized from handcrafted (base) degradations to novel degradations. The obtained representations for a novel degradation in this latent space are then leveraged to generate degraded images consistent with the novel degradation to compose paired training data for SR model. Furthermore, we perform variational inference to match the posterior of degradations in latent representation space with a prior distribution (e.g., Gaussian distribution). Consequently, we are able to sample more high-quality representations for a novel degradation to augment the training data for SR model. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness and advantages of our method for blind super-resolution with novel degradations. △ Less

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2206.02705 [pdf]

Human Behavior Recognition Method Based on CEEMD-ES Radar Selection

Authors: Zhaolin Zhang, Mingqi Song, Wugang Meng, Yuhan Liu, Fengcong Li, Xiang Feng, Yinan Zhao

Abstract: In recent years, the millimeter-wave radar to identify human behavior has been widely used in medical,security, and other fields. When multiple radars are performing detection tasks, the validity of the features contained in each radar is difficult to guarantee. In addition, processing multiple radar data also requires a lot of time and computational cost. The Complementary Ensemble Empirical Mode… ▽ More In recent years, the millimeter-wave radar to identify human behavior has been widely used in medical,security, and other fields. When multiple radars are performing detection tasks, the validity of the features contained in each radar is difficult to guarantee. In addition, processing multiple radar data also requires a lot of time and computational cost. The Complementary Ensemble Empirical Mode Decomposition-Energy Slice (CEEMD-ES) multistatic radar selection method is proposed to solve these problems. First, this method decomposes and reconstructs the radar signal according to the difference in the reflected echo frequency between the limbs and the trunk of the human body. Then, the radar is selected according to the difference between the ratio of echo energy of limbs and trunk and the theoretical value. The time domain, frequency domain and various entropy features of the selected radar are extracted. Finally, the Extreme Learning Machine (ELM) recognition model of the ReLu core is established. Experiments show that this method can effectively select the radar, and the recognition rate of three kinds of human actions is 98.53%. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: 4 pages, 5 figures

arXiv:2205.09933 [pdf, other]

Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review

Authors: Xin-Ru Feng, Heng-Chao Li, Rui Wang, Qian Du, ** Jia, Antonio Plaza

Abstract: Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a… ▽ More Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a baseline, we show how to improve NMF by utilizing the main properties of HSIs (e.g., spectral, spatial, and structural information). We categorize three important development directions including constrained NMF, structured NMF, and generalized NMF. Furthermore, several experiments are conducted to illustrate the effectiveness of associated algorithms. Finally, we conclude the article with possible future directions with the purposes of providing guidelines and inspiration to promote the development of hyperspectral unmixing. △ Less

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2203.04767 [pdf, other]

A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling

Authors: Yike Zhang, Xiaobing Feng, Yi Liu, Songjun Cao, Long Ma

Abstract: Automatic speech recognition (ASR) systems used on smart phones or vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smart phones and in-vehicle infotainment systems. The proposed framew… ▽ More Automatic speech recognition (ASR) systems used on smart phones or vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smart phones and in-vehicle infotainment systems. The proposed framework consists of three core parts: a basic ASR module to generate n-best lists of a speech query, a text classification module to determine which domain the speech query belongs to, and a reranking module to rescore n-best lists using domain-specific language models. In addition, an instance sampling based method to training neural network language models (NNLMs) is proposed to address the data imbalance problem in multi-domain ASR. In experiments, the proposed framework was evaluated on navigation domain and music domain, since navigating and playing music are two main features of Tencent Map. Compared to a general ASR system, the proposed framework achieves a relative 13% $\sim$ 22% character error rate reduction on several test sets collected from Tencent Map and our in-car voice assistant. △ Less

Submitted 9 March, 2022; originally announced March 2022.

Comments: 7 pages, 1 figure

arXiv:2112.10074 [pdf, other]

doi 10.59275/j.melba.2022-354b

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying… ▽ More Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Develo** scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS. △ Less

Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2111.10633 [pdf, other]

Sparse Tensor-based Multiscale Representation for Point Cloud Geometry Compression

Authors: Jianqiang Wang, Dandan Ding, Zhu Li, Xiaoxing Feng, Chuntong Cao, Zhan Ma

Abstract: This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also all… ▽ More This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also allows us to compress scale-wise MP-POVs by exploiting cross-scale and same-scale correlations extensively and flexibly. The overall compression efficiency highly depends on the accuracy of estimated occupancy probability for each MP-POV. Thus, we first design the Sparse Convolution-based Neural Network (SparseCNN) which stacks sparse convolutions and voxel sampling to best characterize and embed spatial correlations. We then develop the SparseCNN-based Occupancy Probability Approximation (SOPA) model to estimate the occupancy probability either in a single-stage manner only using the cross-scale correlation, or in a multi-stage manner by exploiting stage-wise correlation among same-scale neighbors. Besides, we also suggest the SparseCNN based Local Neighborhood Embedding (SLNE) to aggregate local variations as spatial priors in feature attribute to improve the SOPA. Our unified approach not only shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets including the dense object PCGs (8iVFB, Owlii, MUVB) and sparse LiDAR PCGs (KITTI, Ford) when compared with standardized MPEG G-PCC and other prevalent learning-based schemes, but also has low complexity which is attractive to practical applications. △ Less

Submitted 21 October, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: 17 pages, 15 figures

arXiv:2108.04016 [pdf, other]

Deep Learning methods for automatic evaluation of delayed enhancement-MRI. The results of the EMIDEC challenge

Authors: Alain Lalande, Zhihao Chen, Thibaut Pommier, Thomas Decourselle, Abdul Qayyum, Michel Salomon, Dominique Ginhac, Youssef Skandarani, Arnaud Boucher, Khawla Brahim, Marleen de Bruijne, Robin Camarasa, Teresa M. Correia, Xue Feng, Kibrom B. Girum, Anja Hennemuth, Markus Huellebrand, Raabid Hussain, Matthias Ivantsits, Jun Ma, Craig Meyer, Rishabh Sharma, Jixi Shi, Nikolaos V. Tsekos, Marta Varela , et al. (8 additional authors not shown)

Abstract: A key factor for assessing the state of the heart after myocardial infarction (MI) is to measure whether the myocardium segment is viable after reperfusion or revascularization therapy. Delayed enhancement-MRI or DE-MRI, which is performed several minutes after injection of the contrast agent, provides high contrast between viable and nonviable myocardium and is therefore a method of choice to eva… ▽ More A key factor for assessing the state of the heart after myocardial infarction (MI) is to measure whether the myocardium segment is viable after reperfusion or revascularization therapy. Delayed enhancement-MRI or DE-MRI, which is performed several minutes after injection of the contrast agent, provides high contrast between viable and nonviable myocardium and is therefore a method of choice to evaluate the extent of MI. To automatically assess myocardial status, the results of the EMIDEC challenge that focused on this task are presented in this paper. The challenge's main objectives were twofold. First, to evaluate if deep learning methods can distinguish between normal and pathological cases. Second, to automatically calculate the extent of myocardial infarction. The publicly available database consists of 150 exams divided into 50 cases with normal MRI after injection of a contrast agent and 100 cases with myocardial infarction (and then with a hyperenhanced area on DE-MRI), whatever their inclusion in the cardiac emergency department. Along with MRI, clinical characteristics are also provided. The obtained results issued from several works show that the automatic classification of an exam is a reachable task (the best method providing an accuracy of 0.92), and the automatic segmentation of the myocardium is possible. However, the segmentation of the diseased area needs to be improved, mainly due to the small size of these areas and the lack of contrast with the surrounding structures. △ Less

Submitted 10 August, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: Submitted to Medical Image Analysis

arXiv:2107.03165 [pdf, other]

Improving Speech Recognition Accuracy of Local POI Using Geographical Models

Authors: Songjun Cao, Yike Zhang, Xiaobing Feng, Long Ma

Abstract: Nowadays voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI has remained to be a challenge due to multi-dialect and massive POI. This paper improves speech recognition accuracy for local POI from two aspects. Firstly, a geographic acoustic model (Geo-AM) is proposed. The Geo-AM deals with multi-dialect problem using dialect-specifi… ▽ More Nowadays voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI has remained to be a challenge due to multi-dialect and massive POI. This paper improves speech recognition accuracy for local POI from two aspects. Firstly, a geographic acoustic model (Geo-AM) is proposed. The Geo-AM deals with multi-dialect problem using dialect-specific input feature and dialect-specific top layer. Secondly, a group of geo-specific language models (Geo-LMs) are integrated into our speech recognition system to improve recognition accuracy of long tail and homophone POI. During decoding, specific language models are selected on demand according to users' geographic location. Experiments show that the proposed Geo-AM achieves 6.5%$\sim$10.1% relative character error rate (CER) reduction on an accent testset and the proposed Geo-AM and Geo-LM totally achieve over 18.7% relative CER reduction on Tencent Map task. △ Less

Submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted by SLT 2021

arXiv:2104.02474 [pdf]

doi 10.1364/OE.430281

All-Optical Image Identification with Programmable Matrix Transformation

Authors: Shikang Li, Baohua Ni, Xue Feng, Kaiyu Cui, Fang Liu, Wei Zhang, Yidong Huang

Abstract: An optical neural network is proposed and demonstrated with programmable matrix transformation and nonlinear activation function of photodetection (square-law detection). Based on discrete phase-coherent spatial modes, the dimensionality of programmable optical matrix operations is 30~37, which is implemented by spatial light modulators. With this architecture, all-optical classification tasks of… ▽ More An optical neural network is proposed and demonstrated with programmable matrix transformation and nonlinear activation function of photodetection (square-law detection). Based on discrete phase-coherent spatial modes, the dimensionality of programmable optical matrix operations is 30~37, which is implemented by spatial light modulators. With this architecture, all-optical classification tasks of handwritten digits, objects and depth images are performed on the same platform with high accuracy. Due to the parallel nature of matrix multiplication, the processing speed of our proposed architecture is potentially as high as7.4T~74T FLOPs per second (with 10~100GHz detector) △ Less

Submitted 31 March, 2021; originally announced April 2021.

Journal ref: optics express 2021

arXiv:2010.01291 [pdf, other]

Unsupervised Shadow Removal Using Target Consistency Generative Adversarial Network

Authors: Chao Tan, Xin Feng

Abstract: Unsupervised shadow removal aims to learn a non-linear function to map the original image from shadow domain to non-shadow domain in the absence of paired shadow and non-shadow data. In this paper, we develop a simple yet efficient target-consistency generative adversarial network (TC-GAN) for the shadow removal task in the unsupervised manner. Compared with the bidirectional map** in cycle-cons… ▽ More Unsupervised shadow removal aims to learn a non-linear function to map the original image from shadow domain to non-shadow domain in the absence of paired shadow and non-shadow data. In this paper, we develop a simple yet efficient target-consistency generative adversarial network (TC-GAN) for the shadow removal task in the unsupervised manner. Compared with the bidirectional map** in cycle-consistency GAN based methods for shadow removal, TC-GAN tries to learn a one-sided map** to cast shadow images into shadow-free ones. With the proposed target-consistency constraint, the correlations between shadow images and the output shadow-free image are strictly confined. Extensive comparison experiments results show that TC-GAN outperforms the state-of-the-art unsupervised shadow removal methods by 14.9% in terms of FID and 31.5% in terms of KID. It is rather remarkable that TC-GAN achieves comparable performance with supervised shadow removal methods. △ Less

Submitted 30 May, 2021; v1 submitted 3 October, 2020; originally announced October 2020.

arXiv:2009.00454 [pdf]

A Digital Twin for Reconfigurable Intelligent Surface Assisted Wireless Communication

Authors: Baoling Sheen, ** Yang, Xianglong Feng, Md Moin Uddin Chowdhury

Abstract: Reconfigurable Intelligent Surface (RIS) has emerged as one of the key technologies for 6G in recent years, which comprise a large number of low-cost passive elements that can smartly interact with the im**ing electromagnetic waves for performance enhancement. However, optimally configuring massive number of RIS elements remains a challenge. In this paper, we present a novel digital-twin framewo… ▽ More Reconfigurable Intelligent Surface (RIS) has emerged as one of the key technologies for 6G in recent years, which comprise a large number of low-cost passive elements that can smartly interact with the im**ing electromagnetic waves for performance enhancement. However, optimally configuring massive number of RIS elements remains a challenge. In this paper, we present a novel digital-twin framework for RIS-assisted wireless networks which we name it Environment-Twin (Env-Twin). The goal of the Env-Twin framework is to enable automation of optimal control at various granularities. In this paper, we present one example of the Env-Twin models to learn the map** function between the RIS configuration with measured attributes for the receiver location, and the corresponding achievable rate in an RIS-assisted wireless network without involving explicit channel estimation or beam training overhead. Once learned, our Env-Twin model can be used to predict optimal RIS configuration for any new receiver locations in the same wireless network. We leveraged deep learning (DL) techniques to build our model and studied its performance and robustness. Simulation results demonstrate that the proposed Env-Twin model can recommend near-optimal RIS configurations for test receiver locations which achieved close to an upper bound performance that assumes perfect channel knowledge. Our Env-Twin model was trained using less than 2% of the total receiver locations. This promising result represents great potential of the proposed Env-Twin framework for develo** a practical RIS solution where the panel can automatically configure itself without requesting channel state information (CSI) from the wireless network infrastructure. △ Less

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2008.09352 [pdf, other]

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Authors: Zhang Li, Jiehua Zhang, Tao Tan, Xichao Teng, Xiaoliang Sun, Yang Li, Lihong Liu, Yang Xiao, Byungjae Lee, Yilong Li, Qianni Zhang, Shujiao Sun, Yushan Zheng, Junyu Yan, Ni Li, Yiyu Hong, Junsu Ko, Hyun Jung, Yanling Liu, Yu-cheng Chen, Ching-wei Wang, Vladimir Yurovskiy, Pavel Maevskikh, Vahid Khanagha, Yi Jiang , et al. (8 additional authors not shown)

Abstract: Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection)… ▽ More Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and categorized into two groups: multi-model method and single model method. In general, multi-model methods were significantly better ($\textit{p}$<$0.01$) than single model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI. △ Less

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2007.02096 [pdf]

doi 10.1109/TMI.2021.3055428

Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, ** Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys , et al. (8 additional authors not shown)

Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site i… ▽ More To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers on the multi-site issue. △ Less

Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

arXiv:2005.02689 [pdf]

doi 10.1364/OPTICA.440013

Dynamic brain spectrum acquired by a real-time ultra-spectral imaging chip with reconfigurable metasurfaces

Authors: Jian Xiong, Xusheng Cai, Kaiyu Cui, Yidong Huang, Jiawei Yang, Hongbo Zhu, Wenzheng Li, Bo Hong, Shijie Rao, Zekun Zheng, Sheng Xu, Yuhan He, Fang Liu, Xue Feng, Wei Zhang

Abstract: Spectral imaging paves way for various fields and particular in biomedical research. However, spectral imaging mainly depending on spatial or temporal scanning, cannot achieve high temporal, spatial and spectral resolution simultaneously. In this study, we demonstrated a silicon real-time ultra-spectral imaging chip based on reconfigurable metasurfaces, comprising of 155,216 (356$\times$436) image… ▽ More Spectral imaging paves way for various fields and particular in biomedical research. However, spectral imaging mainly depending on spatial or temporal scanning, cannot achieve high temporal, spatial and spectral resolution simultaneously. In this study, we demonstrated a silicon real-time ultra-spectral imaging chip based on reconfigurable metasurfaces, comprising of 155,216 (356$\times$436) image-adaptive micro-spectrometers with ultra-high center-wavelength accuracy of 0.04 nm and spectral resolution of 0.8 nm. It is employed for imaging brain hemodynamics, and the dynamic spectral absorption properties of deoxyhemoglobin and oxyhemoglobin in a rat barrel cortex were obtained, which enlighten the spectroscopy in vivo studies and other real-time applications. △ Less

Submitted 28 March, 2023; v1 submitted 6 May, 2020; originally announced May 2020.

Journal ref: Optica 9, 461-468 (2022)

arXiv:2001.05551 [pdf, other]

Substituting Gadolinium in Brain MRI Using DeepContrast

Authors: Haoran Sun, Xueqing Liu, Xinyang Feng, Chen Liu, Nanyan Zhu, Sabrina J. Gjerswold-Selleck, Hong-Jian Wei, Pavan S. Upadhyayula, Angeliki Mela, Cheng-Chia Wu, Peter D. Canoll, Andrew F. Laine, J. Thomas Vaughan, Scott A. Small, Jia Guo

Abstract: Cerebral blood volume (CBV) is a hemodynamic correlate of oxygen metabolism and reflects brain activity and function. High-resolution CBV maps can be generated using the steady-state gadolinium-enhanced MRI technique. Such a technique requires an intravenous injection of exogenous gadolinium based contrast agent (GBCA) and recent studies suggest that the GBCA can accumulate in the brain after freq… ▽ More Cerebral blood volume (CBV) is a hemodynamic correlate of oxygen metabolism and reflects brain activity and function. High-resolution CBV maps can be generated using the steady-state gadolinium-enhanced MRI technique. Such a technique requires an intravenous injection of exogenous gadolinium based contrast agent (GBCA) and recent studies suggest that the GBCA can accumulate in the brain after frequent use. We hypothesize that endogenous sources of contrast might exist within the most conventional and commonly acquired structural MRI, potentially obviating the need for exogenous contrast. Here, we test this hypothesis by develo** and optimizing a deep learning algorithm, which we call DeepContrast, in mice. We find that DeepContrast performs equally well as exogenous GBCA in map** CBV of the normal brain tissue and enhancing glioblastoma. Together, these studies validate our hypothesis that a deep learning approach can potentially replace the need for GBCAs in brain MRI. △ Less

Submitted 15 January, 2020; originally announced January 2020.

Journal ref: The IEEE International Symposium on Biomedical Imaging (ISBI) 2020

arXiv:1912.08011 [pdf]

Application of Word2vec in Phoneme Recognition

Authors: Xin Feng, Lei Wang

Abstract: In this paper, we present how to hybridize a Word2vec model and an attention-based end-to-end speech recognition model. We build a phoneme recognition system based on Listen, Attend and Spell model. And the phoneme recognition model uses a word2vec model to initialize the embedding matrix for the improvement of the performance, which can increase the distance among the phoneme vectors. At the same… ▽ More In this paper, we present how to hybridize a Word2vec model and an attention-based end-to-end speech recognition model. We build a phoneme recognition system based on Listen, Attend and Spell model. And the phoneme recognition model uses a word2vec model to initialize the embedding matrix for the improvement of the performance, which can increase the distance among the phoneme vectors. At the same time, in order to solve the problem of overfitting in the 61 phoneme recognition model on TIMIT dataset, we propose a new training method. A 61-39 phoneme map** comparison table is used to inverse map the phonemes of the dataset to generate more 61 phoneme training data. At the end of training, replace the dataset with a standard dataset for corrective training. Our model can achieve the best result under the TIMIT dataset which is 16.5% PER (Phoneme Error Rate). △ Less

Submitted 19 December, 2019; v1 submitted 17 December, 2019; originally announced December 2019.

arXiv:1911.09982 [pdf]

HybridNetSeg: A Compact Hybrid Network for Retinal Vessel Segmentation

Authors: Ling Luo, Dingyu Xue, Xinglong Feng

Abstract: A large number of retinal vessel analysis methods based on image segmentation have emerged in recent years. However, existing methods depend on cumbersome backbones, such as VGG16 and ResNet-50, benefiting from their powerful feature extraction capabilities but suffering from high computational costs. In this paper, we propose a novel neural network (HybridNetSeg) dedicated to solving this drawbac… ▽ More A large number of retinal vessel analysis methods based on image segmentation have emerged in recent years. However, existing methods depend on cumbersome backbones, such as VGG16 and ResNet-50, benefiting from their powerful feature extraction capabilities but suffering from high computational costs. In this paper, we propose a novel neural network (HybridNetSeg) dedicated to solving this drawback while further improving overall performance. Considering deformable convolution can extract complex and variable structural information, and larger kernel in mixed depthwise convolution makes contribution to higher accuracy. We have integrated these two modules and propose a Hybrid Convolution Block (HCB) using the idea of heuristic learning. Inspired by the U-Net, we use HCB to replace a part of the common convolution of the U-Net encoder, drastically reducing the parameter count to 0.71M while accelerating the inference process. Not only that, we also propose a multi-scale mixed loss mechanism. Extensive experiments on three major benchmark datasets demonstrate the effectiveness of our proposed method △ Less

Submitted 22 November, 2019; originally announced November 2019.

Comments: 16 pages, 3 figures

arXiv:1910.08375 [pdf, other]

Detecting intracranial aneurysm rupture from 3D surfaces using a novel GraphNet approach

Authors: Z. Ma, L. Song, X. Feng, G. Yang, W. Zhu, J. Liu, Y. Zhang, X. Yang, Y. Yin

Abstract: Intracranial aneurysm (IA) is a life-threatening blood spot in human's brain if it ruptures and causes cerebral hemorrhage. It is challenging to detect whether an IA has ruptured from medical images. In this paper, we propose a novel graph based neural network named GraphNet to detect IA rupture from 3D surface data. GraphNet is based on graph convolution network (GCN) and is designed for graph-le… ▽ More Intracranial aneurysm (IA) is a life-threatening blood spot in human's brain if it ruptures and causes cerebral hemorrhage. It is challenging to detect whether an IA has ruptured from medical images. In this paper, we propose a novel graph based neural network named GraphNet to detect IA rupture from 3D surface data. GraphNet is based on graph convolution network (GCN) and is designed for graph-level classification and node-level segmentation. The network uses GCN blocks to extract surface local features and pools to global features. 1250 patient data including 385 ruptured and 865 unruptured IAs were collected from clinic for experiments. The performance on randomly selected 234 test patient data was reported. The experiment with the proposed GraphNet achieved accuracy of 0.82, area-under-curve (AUC) of receiver operating characteristic (ROC) curve 0.82 in the classification task, significantly outperforming the baseline approach without using graph based networks. The segmentation output of the model achieved mean graph-node-based dice coefficient (DSC) score 0.88. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: Submitted to ISBI 2020

arXiv:1910.04919 [pdf]

From Species to Cultivar: Soybean Cultivar Recognition using Multiscale Sliding Chord Matching of Leaf Images

Authors: Bin Wang, Yongsheng Gao, Xiaohan Yu, Xiaohui Yuan, Shengwu Xiong, Xianzhong Feng

Abstract: Leaf image recognition techniques have been actively researched for plant species identification. However it remains unclear whether leaf patterns can provide sufficient information for cultivar recognition. This paper reports the first attempt on soybean cultivar recognition from plant leaves which is not only a challenging research problem but also important for soybean cultivar evaluation, sele… ▽ More Leaf image recognition techniques have been actively researched for plant species identification. However it remains unclear whether leaf patterns can provide sufficient information for cultivar recognition. This paper reports the first attempt on soybean cultivar recognition from plant leaves which is not only a challenging research problem but also important for soybean cultivar evaluation, selection and production in agriculture. In this paper, we propose a novel multiscale sliding chord matching (MSCM) approach to extract leaf patterns that are distinctive for soybean cultivar identification. A chord is defined to slide along the contour for measuring the synchronised patterns of exterior shape and interior appearance of soybean leaf images. A multiscale sliding chord strategy is developed to extract features in a coarse-to-fine hierarchical order. A joint description that integrates the leaf descriptors from different parts of a soybean plant is proposed for further enhancing the discriminative power of cultivar description. We built a cultivar leaf image database, SoyCultivar, consisting of 1200 sample leaf images from 200 soybean cultivars for performance evaluation. Encouraging experimental results of the proposed method in comparison to the state-of-the-art leaf species recognition methods demonstrate the availability of cultivar information in soybean leaves and effectiveness of the proposed MSCM for soybean cultivar identification, which may advance the research in leaf recognition from species to cultivar. △ Less

Submitted 10 October, 2019; originally announced October 2019.

Comments: 33 pages, 8 figures

arXiv:1907.00943 [pdf, other]

Estimating brain age based on a healthy population with deep learning and structural MRI

Authors: Xinyang Feng, Zachary C. Lipton, Jie Yang, Scott A. Small, Frank A. Provenzano

Abstract: Numerous studies have established that estimated brain age, as derived from statistical models trained on healthy populations, constitutes a valuable biomarker that is predictive of cognitive decline and various neurological diseases. In this work, we curate a large-scale heterogeneous dataset (N = 10,158, age range 18 - 97) of structural brain MRIs in a healthy population from multiple publicly-a… ▽ More Numerous studies have established that estimated brain age, as derived from statistical models trained on healthy populations, constitutes a valuable biomarker that is predictive of cognitive decline and various neurological diseases. In this work, we curate a large-scale heterogeneous dataset (N = 10,158, age range 18 - 97) of structural brain MRIs in a healthy population from multiple publicly-available sources, upon which we train a deep learning model for brain age estimation. The availability of the large-scale dataset enables a more uniform age distribution across adult life-span for effective age estimation with no bias toward certain age groups. We demonstrate that the age estimation accuracy, evaluated with mean absolute error (MAE) and correlation coefficient (r), outperforms previously reported methods in both a hold-out test set reflective of the custom population (MAE = 4.06 years, r = 0.970) and an independent life-span evaluation dataset (MAE = 4.21 years, r = 0.960) on which a previous study has evaluated. We further demonstrate the utility of the estimated age in life-span aging analysis of cognitive functions. Furthermore, we conduct extensive ablation tests and employ feature-attribution techniques to analyze which regions contribute the most predictive value, demonstrating the prominence of the frontal lobe as well as pattern shift across life-span. In summary, we achieve superior age estimation performance confirming the efficacy of deep learning and the added utility of training with data both in larger number and more uniformly distributed than in previous studies. We demonstrate the regional contribution to our brain age predictions through multiple routes and confirm the association of divergence between estimated and chronological brain age with neuropsychological measures. △ Less

Submitted 1 July, 2019; originally announced July 2019.

Comments: 32 pages, 9 figures, 6 tables

arXiv:1901.10088 [pdf, other]

doi 10.1103/PhysRevA.101.042327

Subspace Stabilization Analysis for Non-Markovian Open Quantum Systems

Authors: Shikun Zhang, Kun Liu, Daoyi Dong, Xiaoxue Feng, Feng Pan

Abstract: Studied in this article is non-Markovian open quantum systems parametrized by Hamiltonian H, coupling operator L, and memory kernel function γ, which is a proper candidate for describing the dynamics of various solid-state quantum information processing devices. We look into the subspace stabilization problem of the system from the perspective of dynamical systems and control. The problem translat… ▽ More Studied in this article is non-Markovian open quantum systems parametrized by Hamiltonian H, coupling operator L, and memory kernel function γ, which is a proper candidate for describing the dynamics of various solid-state quantum information processing devices. We look into the subspace stabilization problem of the system from the perspective of dynamical systems and control. The problem translates itself into finding analytic conditions that characterize invariant and attractive subspaces. Necessary and sufficient conditions are found for subspace invariance based on algebraic computations, and sufficient conditions are derived for subspace attractivity by applying a double integral Lyapunov functional. Mathematical proof is given for those conditions and a numerical example is provided to illustrate the theoretical result. △ Less

Submitted 28 January, 2019; originally announced January 2019.

Comments: 7 pages, 1 figure

Journal ref: Phys. Rev. A 101, 042327 (2020)

Showing 1–27 of 27 results for author: Feng, X