-
Determined Multichannel Blind Source Separation with Clustered Source Model
Authors:
Jianyu Wang,
Shanzheng Guan
Abstract:
The independent low-rank matrix analysis (ILRMA) method stands out as a prominent technique for multichannel blind audio source separation. It leverages nonnegative matrix factorization (NMF) and nonnegative canonical polyadic decomposition (NCPD) to model source parameters. While it effectively captures the low-rank structure of sources, the NMF model overlooks inter-channel dependencies. On the other hand, NCPD preserves intrinsic structure but lacks interpretable latent factors, making it challenging to incorporate prior information as constraints. To address these limitations, we introduce a clustered source model based on nonnegative block-term decomposition (NBTD). This model defines blocks as outer products of vectors (clusters) and matrices (for spectral structure modeling), offering interpretable latent vectors. Moreover, it enables straightforward integration of orthogonality constraints to ensure independence among source images. Experimental results demonstrate that our proposed method outperforms ILRMA and its extensions in anechoic conditions and surpasses the original ILRMA in simulated reverberant environments.
Submitted 5 May, 2024;
originally announced May 2024.
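ILRMA's baseline NMF source model factorizes a source's power spectrogram into a low-rank product. The proposed clustered (NBTD) model is meant to improve on that baseline. Below is a minimal sketch of the baseline only, not the authors' NBTD method. It fits a nonnegative factorization under the Itakura-Saito divergence using square-root multiplicative (majorization-minimization) updates. The names `is_nmf` and `is_divergence`, the random initialization, and the iteration count are illustrative assumptions.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all time-frequency bins."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def is_nmf(V, rank, n_iter=100, eps=1e-12, seed=0):
    """Approximate a strictly positive power spectrogram V (freq x frames) by W @ H.

    The square-root multiplicative updates are the majorization-minimization
    form for the IS divergence, so the cost does not increase between iterations.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.uniform(0.1, 1.0, size=(n_freq, rank))    # spectral bases
    H = rng.uniform(0.1, 1.0, size=(rank, n_frames))  # time-varying activations
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
    return W, H
```

In an ILRMA-style separator, `V` would be the power spectrogram of one current source estimate, and `W @ H` would serve as its time-frequency variance model.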
-
Multichannel blind speech source separation with a disjoint constraint source model
Authors:
Jianyu Wang,
Shanzheng Guan
Abstract:
Multichannel convolutive blind speech source separation refers to the problem of separating different speech sources from the observed multichannel mixtures without much a priori information about the mixing system. Multichannel nonnegative matrix factorization (MNMF) has proven to be one of the most powerful separation frameworks, and representative algorithms such as MNMF and independent low-rank matrix analysis (ILRMA) have demonstrated great performance. However, this framework does not fully take into account the sparseness of speech source signals. Speech signals are well known to be sparse, and this work exploits that property to improve separation performance. Specifically, we utilize the Bingham and Laplace distributions to formulate a disjoint constraint regularizer, which is subsequently incorporated into both MNMF and ILRMA. We then derive majorization-minimization rules for updating the parameters of the source model, resulting in two enhanced algorithms: s-MNMF and s-ILRMA. Comprehensive simulations are conducted, and the results unequivocally demonstrate the efficacy of the proposed methods.
Submitted 3 January, 2024;
originally announced January 2024.
-
Independent low-rank matrix analysis based on the Sinkhorn divergence source model for blind source separation
Authors:
Jianyu Wang,
Shanzheng Guan,
**gdong Chen,
Jacob Benesty
Abstract:
The so-called independent low-rank matrix analysis (ILRMA) has demonstrated great potential for dealing with the problem of determined blind source separation (BSS) for audio and speech signals. This method assumes that the spectra from different frequency bands are independent and that the spectral coefficients in any frequency band are Gaussian distributed. The Itakura-Saito divergence is then employed to estimate the source-model parameters. In reality, however, the spectral coefficients from different frequency bands may be dependent, which the existing ILRMA algorithm does not consider. This paper presents an improved version of ILRMA that considers the dependency between the spectral coefficients of different frequency bands. The Sinkhorn divergence is then exploited to optimize the source-model parameters. Using this cross-band information improves BSS performance. However, the number of parameters to be estimated also increases significantly, as does the computational complexity. To reduce the algorithm complexity, we apply the Kronecker product to decompose the modeling matrix into the product of a number of matrices of much smaller dimensionality. An efficient algorithm is then developed to implement the Sinkhorn divergence-based BSS algorithm, reducing the complexity by an order of magnitude.
Submitted 3 January, 2024;
originally announced January 2024.
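The complexity reduction described above relies on Kronecker structure. The sketch below shows the generic identity involved, assuming column-stacking `vec`: a product with A ⊗ B can be computed as B X A^T without ever forming the large matrix. This is a minimal illustration, not the authors' Sinkhorn-divergence algorithm, and `kron_matvec` is a hypothetical helper.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without building the Kronecker product.

    Uses (A kron B) vec(X) = vec(B X A^T), where vec stacks columns
    (Fortran order) and x = vec(X) with X of shape (B.shape[1], A.shape[1]).
    """
    q, n = B.shape[1], A.shape[1]
    X = x.reshape((q, n), order="F")
    return (B @ X @ A.T).reshape(-1, order="F")
```

Take A of size m x n and B of size p x q. Forming the full product costs O(mnpq) per multiply, while the factored form costs O(pqn + pnm). That gap is why decomposing a large modeling matrix into small Kronecker factors pays off.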
-
SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement
Authors:
Ding-Jiun Huang,
Yu-Ting Kao,
Tieh-Hung Chuang,
Ya-Chun Tsai,
**g-Kai Lou,
Shuen-Huei Guan
Abstract:
In recent years, several video quality assessment (VQA) methods have been developed that achieve high performance. However, these methods were not specifically trained for enhanced videos, which limits their ability to predict video quality accurately in line with human subjective perception. To address this issue, we propose a stack-based framework for VQA that outperforms existing state-of-the-art methods on VDPVE, a dataset consisting of enhanced videos. In addition to proposing the VQA framework for enhanced videos, we also investigate its application to professionally generated content (PGC). To address copyright issues with premium content, we create the PGCVQ dataset, which consists of videos from YouTube. We evaluate our proposed approach and state-of-the-art methods on PGCVQ and provide new insights into the results. Our experiments demonstrate that existing VQA algorithms can be applied to PGC videos, and we find that VQA performance for PGC videos can be improved by considering the plot of a play, which highlights the importance of video semantic understanding.
Submitted 15 May, 2023;
originally announced May 2023.
-
CaraNet: Context Axial Reverse Attention Network for Segmentation of Small Medical Objects
Authors:
Ange Lou,
Shuyue Guan,
Murray Loew
Abstract:
Segmenting medical images accurately and reliably is important for disease diagnosis and treatment. It is a challenging task because of the wide variety of object sizes, shapes, and scanning modalities. Recently, many convolutional neural networks (CNNs) have been designed for segmentation tasks and have achieved great success. Few studies, however, have fully considered the sizes of objects, and thus most perform poorly at segmenting small objects. This can have a significant impact on the early detection of diseases. This paper proposes a Context Axial Reverse Attention Network (CaraNet) to improve segmentation performance on small objects compared with several recent state-of-the-art models. CaraNet applies an axial reverse attention (ARA) module and a channel-wise feature pyramid (CFP) module to extract feature information from small medical objects. We evaluate our model using six different metrics. We test CaraNet on brain tumor (BraTS 2018) and polyp (Kvasir-SEG, CVC-ColonDB, CVC-ClinicDB, CVC-300, and ETIS-LaribPolypDB) segmentation datasets. CaraNet achieves the top-ranked mean Dice segmentation accuracy, and the results show a distinct advantage of CaraNet in the segmentation of small medical objects.
Submitted 30 January, 2023;
originally announced January 2023.
-
Informing selection of performance metrics for medical image segmentation evaluation using configurable synthetic errors
Authors:
Shuyue Guan,
Ravi K. Samala,
Weijie Chen
Abstract:
Machine learning-based segmentation in medical imaging is widely used in clinical applications ranging from diagnostics to radiotherapy treatment planning. Segmented medical images with ground truth are useful for investigating the properties of different segmentation performance metrics to inform metric selection. Regular geometrical shapes are often used to synthesize segmentation errors and illustrate the properties of performance metrics, but they lack the complexity of anatomical variations in real images. In this study, we present a tool that emulates segmentations by adjusting the reference (truth) masks of anatomical objects extracted from real medical images. Our tool is designed to modify the defined truth contours and emulate different types of segmentation errors with a set of user-configurable parameters. We defined the ground truth objects from 230 patient images in the Glioma Image Segmentation for Radiotherapy (GLIS-RT) database. For each object, we used our segmentation synthesis tool to synthesize 10 versions of segmentation (i.e., 10 simulated segmentors or algorithms), where each version has a pre-defined combination of segmentation errors. We then applied 20 performance metrics to evaluate all synthetic segmentations. We demonstrated the properties of these metrics, including their ability to capture specific types of segmentation errors. By analyzing the intrinsic properties of these metrics and categorizing the segmentation errors, we are working toward the goal of developing a decision-tree tool for assisting in the selection of segmentation performance metrics.
Submitted 30 December, 2022;
originally announced December 2022.
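One simple way to emulate boundary errors on a real reference mask is morphological dilation (over-segmentation) and erosion (under-segmentation). The NumPy sketch below applies both with a 4-connected cross and scores the results with Dice and Jaccard. The operations, helper names, and "both masks empty scores 1.0" convention are assumptions for illustration, not the configurable tool described in the paper.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Grow a binary mask by one pixel per iteration (4-connected cross)."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(m, 1)  # outside the image counts as background
        m = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return m

def erode(mask, iterations=1):
    """Shrink a binary mask by one pixel per iteration (4-connected cross)."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(m, 1)
        m = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m

def dice(a, b):
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

def jaccard(a, b):
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union
```

For any pair of masks, J = D / (2 - D). Dice and Jaccard therefore always rank segmentations identically, which is a small example of the metric redundancy that such synthetic experiments can expose.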
-
Graph Neural Network and Koopman Models for Learning Networked Dynamics: A Comparative Study on Power Grid Transients Prediction
Authors:
Sai Pushpak Nandanoori,
Sheng Guan,
Soumya Kundu,
Seemita Pal,
Khushbu Agarwal,
Yinghui Wu,
Sutanay Choudhury
Abstract:
Continuous monitoring of the spatio-temporal dynamic behavior of critical infrastructure networks, such as power systems, is a challenging but important task. In particular, accurate and timely prediction of the (electro-mechanical) transient dynamic trajectories of the power grid is necessary for early detection of any instability and prevention of catastrophic failures. Existing approaches for predicting dynamic trajectories either rely on the availability of accurate physical models of the system, use computationally expensive time-domain simulations, or are applicable only to local prediction problems (e.g., a single generator). In this paper, we report the application of two broad classes of data-driven learning models, along with their algorithmic implementation and performance evaluation, in predicting transient trajectories in power networks using only streaming measurements and the network topology as input. One class of models is based on Koopman operator theory, which allows the nonlinear dynamic behavior to be captured via an infinite-dimensional linear operator. The other class of models is based on graph convolutional neural networks, which are adept at capturing the inherent spatio-temporal correlations within the power network. Transient dynamic datasets for training and testing the models are synthesized by simulating a wide variety of load change events in the IEEE 68-bus system, categorized by the load change magnitudes, as well as by the degree of connectivity and the distance to the nearest generator nodes. The results confirm that the proposed predictive models can successfully predict the post-disturbance transient evolution of the system with a high level of accuracy.
Submitted 16 February, 2022;
originally announced February 2022.
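Koopman-based models approximate nonlinear dynamics with a linear operator acting on (possibly lifted) measurements. Dynamic mode decomposition (DMD) is one common finite-dimensional way to estimate such an operator from snapshot data. The sketch below is a swapped-in illustration under that assumption, not the models evaluated in the paper. It uses the raw state as the only observable, and `fit_dmd` and `rollout` are hypothetical names.

```python
import numpy as np

def fit_dmd(snapshots, rank=None):
    """Least-squares linear operator A with x_{k+1} ~= A x_k.

    snapshots: (n_states, n_times) array of consecutive measurements.
    An optional rank truncates the SVD of the input snapshots.
    """
    X0, X1 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    if rank is not None:
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    return X1 @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T  # X1 @ pinv(X0)

def rollout(A, x0, steps):
    """Predict a trajectory by repeatedly applying the learned linear operator."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        states.append(A @ states[-1])
    return np.stack(states, axis=1)
```

For a genuinely nonlinear grid, one would first map each state through a dictionary of nonlinear observables (extended DMD). The linear fit itself stays the same.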
-
A Sneak Attack on Segmentation of Medical Images Using Deep Neural Network Classifiers
Authors:
Shuyue Guan,
Murray Loew
Abstract:
Instead of using current deep-learning segmentation models (such as UNet and its variants), we approach the segmentation problem using trained Convolutional Neural Network (CNN) classifiers, which automatically extract important features from images for classification. Those extracted features can be visualized and formed into heatmaps using Gradient-weighted Class Activation Mapping (Grad-CAM). This study tested whether the heatmaps could be used to segment the classified targets. We also proposed an evaluation method for the heatmaps, namely re-training the CNN classifier using images filtered by heatmaps and examining its performance. We used the mean Dice coefficient to evaluate segmentation results. Results from our experiments show that heatmaps can locate and segment partial tumor areas. However, using only the heatmaps from CNN classifiers may not be an optimal approach for segmentation. We have verified that the predictions of CNN classifiers mainly depend on tumor areas, and that dark regions in Grad-CAM's heatmaps also contribute to classification.
Submitted 27 January, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
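Two steps from the study can be sketched here: turning a heatmap into a segmentation mask, and producing a heatmap-filtered image for re-training. The sketch assumes a Grad-CAM heatmap has already been computed at a coarse resolution that divides the image size evenly. The min-max normalization, nearest-neighbor upsampling, 0.5 threshold, and helper names are illustrative assumptions, not the study's settings.

```python
import numpy as np

def heatmap_to_mask(heatmap, image_shape, threshold=0.5):
    """Normalize a coarse heatmap to [0, 1], upsample it, and threshold it into a mask."""
    h = np.asarray(heatmap, dtype=float)
    h = h - h.min()
    if h.max() > 0:
        h = h / h.max()
    fy, fx = image_shape[0] // h.shape[0], image_shape[1] // h.shape[1]
    upsampled = np.kron(h, np.ones((fy, fx)))  # nearest-neighbor upsampling
    return upsampled >= threshold, upsampled

def filter_image(image, upsampled_heatmap):
    """Weight each pixel by the heatmap, keeping regions the classifier attended to."""
    return image * upsampled_heatmap
```

The resulting mask can be scored against ground truth with the Dice coefficient, and the filtered images can be fed back into classifier training.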
-
Fourier Neural Operator Networks: A Fast and General Solver for the Photoacoustic Wave Equation
Authors:
Steven Guan,
Ko-Tsung Hsu,
Parag V. Chitnis
Abstract:
Simulation tools for photoacoustic wave propagation have played a key role in advancing photoacoustic imaging by providing quantitative and qualitative insights into parameters affecting image quality. Classical methods for numerically solving the photoacoustic wave equation rely on a fine discretization of space and can become computationally expensive for large computational grids. In this work, we apply Fourier Neural Operator (FNO) networks as a fast data-driven deep learning method for solving the 2D photoacoustic wave equation in a homogeneous medium. Comparisons between the FNO network and the pseudo-spectral time-domain approach demonstrated that the FNO network generated comparable simulations with small errors and was several orders of magnitude faster. Moreover, the FNO network was generalizable and could generate simulations not observed in the training data.
Submitted 20 August, 2021;
originally announced August 2021.
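An FNO replaces a fine spatial discretization with learned operators acting in Fourier space. Below is a rough sketch of one Fourier layer. The input is transformed with an FFT, and only the lowest `modes` frequencies are multiplied by learned complex weights. The result is transformed back and added to a pointwise linear term. Several simplifications are assumptions: the paper works in 2D while this version is 1D NumPy, the activation is ReLU, and the names `spectral_conv1d` and `fourier_layer` are hypothetical.

```python
import numpy as np

def spectral_conv1d(u, weights, modes):
    """u: (c_in, n) real signal; weights: (c_in, c_out, modes) complex."""
    u_hat = np.fft.rfft(u, axis=-1)  # (c_in, n // 2 + 1)
    out_hat = np.zeros((weights.shape[1], u_hat.shape[-1]), dtype=complex)
    # Mix channels mode by mode; frequencies above `modes` are discarded.
    out_hat[:, :modes] = np.einsum("im,iom->om", u_hat[:, :modes], weights)
    return np.fft.irfft(out_hat, n=u.shape[-1], axis=-1)

def fourier_layer(u, weights, w_local, modes):
    """Spectral convolution plus a pointwise linear map, followed by ReLU."""
    return np.maximum(0.0, spectral_conv1d(u, weights, modes) + w_local @ u)
```

The learned weights live on a fixed set of Fourier modes rather than on grid points. This is one reason FNOs are often described as resolution-independent.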
-
CaraNet: Context Axial Reverse Attention Network for Segmentation of Small Medical Objects
Authors:
Ange Lou,
Shuyue Guan,
Hanseok Ko,
Murray Loew
Abstract:
Segmenting medical images accurately and reliably is important for disease diagnosis and treatment. It is a challenging task because of the wide variety of object sizes, shapes, and scanning modalities. Recently, many convolutional neural networks (CNNs) have been designed for segmentation tasks and have achieved great success. Few studies, however, have fully considered the sizes of objects, and thus most perform poorly at segmenting small objects. This can have a significant impact on the early detection of diseases. This paper proposes a Context Axial Reverse Attention Network (CaraNet) to improve segmentation performance on small objects compared with several recent state-of-the-art models. We test CaraNet on brain tumor (BraTS 2018) and polyp (Kvasir-SEG, CVC-ColonDB, CVC-ClinicDB, CVC-300, and ETIS-LaribPolypDB) segmentation datasets. CaraNet achieves the top-ranked mean Dice segmentation accuracy, and the results show a distinct advantage of CaraNet in the segmentation of small medical objects.
Submitted 13 January, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.
-
Distributed adaptive algorithm based on the asymmetric cost of error functions
Authors:
Sihai Guan,
Qing Cheng,
Yong Zhao
Abstract:
In this paper, a family of novel diffusion adaptive estimation algorithms is proposed from the asymmetric cost function perspective. The algorithms combine the diffusion strategy with the linear-linear cost (LLC), quadratic-quadratic cost (QQC), and linear-exponential cost (LEC) at all distributed network nodes, and are named diffusion LLCLMS (DLLCLMS), diffusion QQCLMS (DQQCLMS), and diffusion LECLMS (DLECLMS), respectively. The stability of the mean estimation error and the computational complexity of these three diffusion algorithms are then analyzed theoretically. Finally, several simulation experiments are designed to verify the superiority of the three proposed diffusion algorithms. The simulation results show that the DLLCLMS, DQQCLMS, and DLECLMS algorithms are more robust to the input signal and impulsive noise than the DSELMS, DRVSSLMS, and DLLAD algorithms. In brief, theoretical analysis and experimental results show that the proposed DLLCLMS, DQQCLMS, and DLECLMS algorithms have superior performance when estimating an unknown linear system under changing impulsive noise environments and different types of input signals.
Submitted 7 July, 2021;
originally announced July 2021.
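The diffusion strategy mentioned above can be illustrated with adapt-then-combine (ATC) diffusion LMS. Each node takes a local stochastic-gradient step, then averages its neighbors' intermediate estimates. This minimal sketch uses the ordinary LMS error term. The LLC, QQC, and LEC costs would change how the error enters the update, but their exact forms are not given here. The `error_fn` hook is therefore an assumption; passing `np.sign` yields a sign-error variant.

```python
import numpy as np

def atc_diffusion_lms(U, D, A, mu=0.05, error_fn=lambda e: e):
    """Adapt-then-combine diffusion LMS.

    U: (N, T, L) regressors per node and time, D: (N, T) desired signals,
    A: (N, N) combination matrix whose columns sum to 1, where A[l, k] is
    the weight node k gives to neighbor l's intermediate estimate.
    """
    N, T, L = U.shape
    W = np.zeros((N, L))
    for t in range(T):
        psi = np.empty_like(W)
        for k in range(N):                      # adapt: local gradient step
            e = D[k, t] - U[k, t] @ W[k]
            psi[k] = W[k] + mu * error_fn(e) * U[k, t]
        W = A.T @ psi                           # combine: w_k = sum_l A[l, k] psi_l
    return W
```

Under noise-free data and a connected network, every node converges to the common unknown vector. The asymmetric costs matter when the noise is impulsive or skewed.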
-
Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Authors:
Chengdong Liang,
Junqi Chen,
Shanzheng Guan,
Xiao-Lei Zhang
Abstract:
Recently, ad-hoc microphone arrays have been widely studied. Unlike in traditional microphone array settings, the spatial arrangement and number of microphones in an ad-hoc microphone array are not known in advance, which hinders the adaptation of traditional speaker verification technologies to ad-hoc microphone arrays. To overcome this weakness, in this paper, we propose attention-based multi-channel speaker verification with ad-hoc microphone arrays. Specifically, we add an inter-channel processing layer and a global fusion layer after the pooling layer of a single-channel speaker verification system. The inter-channel processing layer applies a so-called residual self-attention along the channel dimension to allocate weights to different microphones. The global fusion layer integrates all channels in a way that is independent of the number of input channels. We further replace the softmax operator in the residual self-attention with sparsemax, which forces the channel weights of very noisy channels to zero. Experimental results with ad-hoc microphone arrays of over 30 channels demonstrate the effectiveness of the proposed methods. For example, multi-channel speaker verification with sparsemax achieves an equal error rate (EER) more than 20% lower than that of the oracle one-best system on semi-real data sets, and more than 30% lower on simulated data sets, in test scenarios with both matched and mismatched channel numbers.
Submitted 30 June, 2021;
originally announced July 2021.
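Sparsemax projects a score vector onto the probability simplex. Unlike softmax, it can therefore assign exactly zero weight, which is how very noisy channels get dropped. A minimal NumPy version of the standard sort-based algorithm follows. Treating its input as per-channel attention scores is an illustrative assumption, not the paper's code.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                  # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum          # true for a prefix of sorted entries
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max      # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

For scores `[2.0, 1.0, -1.0]`, softmax gives every channel a positive weight. Sparsemax returns `[1.0, 0.0, 0.0]`, removing the two weaker channels entirely.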
-
CFPNet-M: A Light-Weight Encoder-Decoder Based Network for Multimodal Biomedical Image Real-Time Segmentation
Authors:
Ange Lou,
Shuyue Guan,
Murray Loew
Abstract:
Currently, deep learning techniques are proving instrumental in identifying, classifying, and quantifying patterns in medical images. Segmentation is one of the important applications in medical image analysis. In this regard, U-Net is the predominant approach to medical image segmentation tasks. However, we found that U-Net-based models have limitations in several respects. Examples include the millions of parameters in U-Net, which consume considerable computational resources and memory, the lack of global information, and the failure to segment some difficult objects. Therefore, we applied two modifications to improve the U-Net model: 1) we designed and added the dilated channel-wise CNN module, and 2) we simplified the U-shaped network. Based on these two modifications, we proposed a novel light-weight architecture, the Channel-wise Feature Pyramid Network for Medicine (CFPNet-M). To evaluate our method, we selected five datasets with different modalities: thermography, electron microscopy, endoscopy, dermoscopy, and digital retinal images. We compared its performance with several models having different parameter scales. This paper also involves our previous studies of DC-UNet and some commonly used light-weight neural networks. We applied the Tanimoto similarity instead of the Jaccard index for gray-level image measurements. By comparison, CFPNet-M achieves comparable segmentation results on all five medical datasets with only 0.65 million parameters, which is about 2% of U-Net, and 8.8 MB of memory. Meanwhile, the inference speed can reach 80 FPS on a single RTX 2070Ti GPU with an input size of 256 by 192 pixels.
Submitted 30 May, 2021; v1 submitted 9 May, 2021;
originally announced May 2021.
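For gray-level images, the Tanimoto coefficient T(a, b) = a·b / (|a|² + |b|² - a·b) generalizes the Jaccard index, and on binary masks the two are identical. That is why it can stand in for Jaccard in the gray-level measurements described above. A minimal sketch follows. The name `tanimoto` and the convention that two all-zero images score 1.0 are assumptions.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two images (flattened to vectors)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    dot = a @ b
    denom = a @ a + b @ b - dot
    return 1.0 if denom == 0 else dot / denom  # both images all-zero: perfect match
```

On binary inputs, a·b counts the intersection and |a|² + |b|² - a·b counts the union, which recovers the Jaccard index exactly.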
-
Dense Dilated UNet: Deep Learning for 3D Photoacoustic Tomography Image Reconstruction
Authors:
Steven Guan,
Ko-Tsung Hsu,
Matthias Eyassu,
Parag V. Chitnis
Abstract:
In photoacoustic tomography (PAT), the acoustic pressure waves produced by optical excitation are measured by an array of detectors and used to reconstruct an image. Sparse spatial sampling and limited-view detection are two common challenges in PAT. Reconstructing from incomplete data using standard methods results in severe streaking artifacts and blurring. We propose a modified convolutional neural network (CNN) architecture termed Dense Dilation UNet (DD-UNet) for correcting artifacts in 3D PAT. The DD-UNet leverages the benefits of dense connectivity and dilated convolutions to improve CNN performance. We compare the proposed CNN with the Fully Dense UNet (FD-UNet) in terms of image quality, as measured by the multiscale structural similarity index metric. Results demonstrate that the DD-UNet consistently outperforms the FD-UNet and more reliably reconstructs smaller image features.
Submitted 7 April, 2021;
originally announced April 2021.
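Dilated convolutions enlarge the receptive field without adding parameters. For stride-1 layers, each layer with kernel size k and dilation d adds (k - 1)·d pixels along each axis. The sketch below computes this. The dilation rates in the example are hypothetical, since the abstract does not list DD-UNet's.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (per axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * d
    return rf
```

Three 3x3 layers see a 7-pixel span without dilation. With dilation rates 1, 2, and 4, the same three layers and weights cover 15 pixels, which helps a network capture context around small features.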
-
Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays
Authors:
Shanzheng Guan,
Shupei Liu,
Junqi Chen,
Wenbo Zhu,
Shengqiang Li,
Xu Tan,
Ziye Yang,
Menglong Xu,
Yijiang Chen,
Jianyu Wang,
Xiao-Lei Zhang
Abstract:
Recently, there has been a research trend toward ad-hoc microphone arrays. However, most research has been conducted on simulated data. Although some data sets were collected with a small number of distributed devices, they were not synchronized, which hinders fundamental theoretical research on ad-hoc microphone arrays. To address this issue, this paper presents a synchronized speech corpus, named Libri-adhoc40. It collects replayed Librispeech data from loudspeakers using ad-hoc microphone arrays of 40 strongly synchronized distributed nodes in a real office environment. In addition, to provide the evaluation target for speech front-end processing and other applications, we also recorded the replayed speech in an anechoic chamber. We trained several multi-device speech recognition systems on both the Libri-adhoc40 dataset and a simulated dataset. Experimental results demonstrate the validity of the proposed corpus, which can be used as a benchmark to reflect the trends and differences of models with different ad-hoc microphone arrays. The dataset is available online at https://github.com/ISmallFish/Libri-adhoc40.
Submitted 6 April, 2021; v1 submitted 28 March, 2021;
originally announced March 2021.
-
Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation
Authors:
Jianyu Wang,
Shanzheng Guan,
Shupei Liu,
Xiao-Lei Zhang
Abstract:
Multichannel blind audio source separation aims to recover the latent sources from their multichannel mixtures without supervised information. One state-of-the-art blind audio source separation method, named independent low-rank matrix analysis (ILRMA), unifies independent vector analysis (IVA) and nonnegative matrix factorization (NMF). However, the spectral matrix produced by NMF may not yield a compact spectral basis, nor does it guarantee the identifiability of each source. To address this problem, we propose to enhance the identifiability of the source model with a minimum-volume prior distribution. We further regularize multichannel NMF (MNMF) and ILRMA, respectively, with the minimum-volume regularizer. The proposed methods maximize the posterior distribution of the separated sources, which ensures stable convergence. Experimental results demonstrate the effectiveness of the proposed methods compared with auxiliary independent vector analysis, MNMF, ILRMA, and its extensions.
Submitted 29 March, 2021; v1 submitted 16 January, 2021;
originally announced January 2021.
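A common form of minimum-volume regularizer for NMF is log det(W^T W + δI). It shrinks as the basis vectors in W become more nearly collinear, which pushes the bases toward a compact spectral description. The sketch below evaluates that penalty. Whether the paper uses exactly this form, the value δ = 1, and the name `min_volume_penalty` are assumptions.

```python
import numpy as np

def min_volume_penalty(W, delta=1.0):
    """log det(W^T W + delta * I); smaller when the columns of W span less volume."""
    r = W.shape[1]
    # slogdet is numerically safer than log(det(...)); the matrix is positive definite.
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return float(logdet)
```

In practice, the scale of W is usually fixed (for example by normalizing its columns) so the penalty cannot be lowered simply by shrinking W. A weighted version of the penalty is then added to the NMF fitting cost.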
-
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
Authors:
Ziye Yang,
Shanzheng Guan,
Xiao-Lei Zhang
Abstract:
Recently, research on ad-hoc microphone arrays with deep learning has drawn much attention, especially in speech enhancement and separation. An ad-hoc microphone array may cover such a large area that multiple speakers are located far apart and talk independently. Target-dependent speech separation, which aims to extract a target speaker from mixed speech, is therefore important for extracting and tracking a specific speaker in the ad-hoc array. However, this technique has not yet been explored. In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is, to our knowledge, the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. The algorithm contains three components. First, we propose a supervised channel selection framework based on speaker extraction, where the estimated utterance-level SNRs of the target speech are used as the basis for channel selection. Second, we apply the selected channels to a deep learning-based MVDR algorithm, where a single-channel speaker extraction algorithm is applied to each selected channel to estimate the mask of the target speech. We conducted an extensive experiment on a WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the proposed method.
Submitted 1 December, 2020;
originally announced December 2020.
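The deep learning-based MVDR step can be sketched in its common mask-based form. A target-speech mask weights the spatial covariance estimates. In the paper the mask would come from single-channel speaker extraction; here it is simply given. The steering vector is taken as the principal eigenvector of the speech covariance, and w = Φ_n^{-1} d / (d^H Φ_n^{-1} d). The per-frequency layout and the name `mask_mvdr` are assumptions, not the authors' implementation.

```python
import numpy as np

def mask_mvdr(Y, speech_mask, eps=1e-10):
    """Mask-based MVDR weights for one frequency bin.

    Y: (M, T) complex STFT coefficients from M selected channels.
    speech_mask: (T,) values in [0, 1] marking target-speech dominance.
    Returns the beamformer w and the estimated steering vector d.
    """
    m = np.asarray(speech_mask, dtype=float)
    phi_s = (m * Y) @ Y.conj().T / (m.sum() + eps)              # speech covariance
    phi_n = ((1 - m) * Y) @ Y.conj().T / ((1 - m).sum() + eps)  # noise covariance
    _, eigvecs = np.linalg.eigh(phi_s)
    d = eigvecs[:, -1]                       # principal eigenvector as steering vector
    num = np.linalg.solve(phi_n, d)          # phi_n^{-1} d
    w = num / (d.conj() @ num)               # distortionless: w^H d = 1
    return w, d
```

The enhanced signal for this bin is `w.conj() @ Y`. The constraint w^H d = 1 passes the target direction unchanged while minimizing the output noise power.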
-
Segmentation of Infrared Breast Images Using MultiResUnet Neural Network
Authors:
Ange Lou,
Shuyue Guan,
Nada Kamona,
Murray Loew
Abstract:
Breast cancer is the second leading cause of cancer death among women in the U.S. Early detection is key to higher survival rates for breast cancer patients. We are investigating infrared (IR) thermography as a noninvasive adjunct to mammography for breast cancer screening. IR imaging is radiation-free, pain-free, and non-contact. Automatic segmentation of the breast area from the acquired full-size breast IR images will help limit the area searched for tumors and reduce the time and effort of manual segmentation. Autoencoder-like convolutional and deconvolutional neural networks (C-DCNN) have been applied to automatically segment the breast area in IR images in previous studies. In this study, we applied a state-of-the-art deep-learning segmentation model, MultiResUnet, which consists of an encoder that captures features and a decoder for precise localization. We used it to segment the breast area in a set of breast IR images collected in our pilot study by imaging breast cancer patients and normal volunteers with a thermal infrared camera (N2 Imager). The database contains 450 images acquired from 14 patients and 16 volunteers. We used a thresholding method to remove interference in the raw images, remapped them from the original 16-bit depth to 8-bit, and then cropped and segmented the 8-bit images manually. Experiments using leave-one-out cross-validation (LOOCV), with comparison against the ground-truth images by Tanimoto similarity, show that the average accuracy of MultiResUnet is 91.47%, about 2% higher than that of the autoencoder. MultiResUnet therefore segments breast IR images better than our previous model.
Submitted 31 October, 2020;
originally announced November 2020.
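The abstract describes thresholding the raw images and remapping them from 16-bit to 8-bit but does not specify the threshold rule or the mapping curve. The sketch below assumes a single fixed intensity threshold and a linear stretch of the remaining range onto 0..255; the function name `threshold_and_remap` is hypothetical.

```python
import numpy as np


def threshold_and_remap(raw16, threshold):
    """Suppress low-intensity pixels and linearly remap a 16-bit image to 8-bit.

    Pixels at or below `threshold` map to 0; the brightest pixel maps to 255.
    """
    img = raw16.astype(np.float64)
    lo = float(threshold)
    hi = float(img.max())
    if hi <= lo:  # nothing above the threshold
        return np.zeros(raw16.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0  # below-threshold pixels become negative...
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)  # ...and are clipped to 0
```

A linear stretch keeps the relative temperature ordering of the pixels; the authors may have used a different remapping.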
-
DC-UNet: Rethinking the U-Net Architecture with Dual Channel Efficient CNN for Medical Images Segmentation
Authors:
Ange Lou,
Shuyue Guan,
Murray Loew
Abstract:
Recently, deep learning has become much more popular in computer vision. Convolutional neural networks (CNNs) have brought a breakthrough in image segmentation, especially for medical images. In this regard, U-Net is the predominant approach to medical image segmentation. U-Net performs well not only in segmenting multimodal medical images in general but also in some of their difficult cases. However, we found that the classical U-Net architecture has limitations in several respects. We therefore applied two modifications to improve on the state-of-the-art U-Net model: 1) we designed an efficient CNN architecture to replace the encoder and decoder, and 2) we applied a residual module to replace the skip connections between the encoder and decoder. Following these modifications, we designed a novel architecture, DC-UNet, as a potential successor to the U-Net architecture: we created a new, effective CNN architecture and built DC-UNet on top of it. We evaluated our model on three datasets with difficult cases and obtained relative performance improvements of 2.90%, 1.49%, and 11.42%, respectively, over the classical U-Net. In addition, we used the Tanimoto similarity instead of the Jaccard similarity for gray-to-gray image comparisons.
Submitted 30 May, 2020;
originally announced June 2020.
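The switch from Jaccard to Tanimoto similarity mentioned above can be illustrated with the standard definitions: Jaccard is |A ∩ B| / |A ∪ B| for binary masks, and Tanimoto is A·B / (|A|² + |B|² − A·B) for real-valued vectors. The two coincide on binary masks, while Tanimoto remains defined for gray-level images. In this sketch images are flattened to plain lists, and how the authors normalize gray levels is not stated in the abstract.

```python
def jaccard(a, b):
    """Jaccard index of two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def tanimoto(a, b):
    """Tanimoto coefficient A.B / (|A|^2 + |B|^2 - A.B) for real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 1.0


# On binary masks both give 2 / 4 = 0.5; on gray values only Tanimoto applies.
print(jaccard([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), tanimoto([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```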
-
A Novel Measure to Evaluate Generative Adversarial Networks Based on Direct Analysis of Generated Images
Authors:
Shuyue Guan,
Murray Loew
Abstract:
The Generative Adversarial Network (GAN) is a state-of-the-art technique in the field of deep learning. A number of recent papers address the theory and applications of GANs in various fields of image processing. Fewer studies, however, have directly evaluated GAN outputs. Those that have done so focused on classification performance, e.g., the Inception Score (IS), and on statistical metrics, e.g., the Fréchet Inception Distance (FID). Here, we consider a fundamental way to evaluate GANs by directly analyzing the images they generate, instead of using them as inputs to other classifiers. We characterize the performance of a GAN as an image generator according to three aspects: 1) Creativity: generated images should not duplicate the real images. 2) Inheritance: generated images should have the same style as the real images, retaining their key features. 3) Diversity: generated images should differ from each other; a GAN should not repeatedly generate a few distinct images. Based on these three aspects of an ideal GAN, we designed the Likeness Score (LS) to evaluate GAN performance and applied it to several typical GANs. We compared the proposed measure with two commonly used GAN evaluation methods, IS and FID, and with four additional measures. Furthermore, we discuss how these evaluations can deepen our understanding of GANs and help improve their performance.
Submitted 7 April, 2021; v1 submitted 27 February, 2020;
originally announced February 2020.
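The abstract does not give the Likeness Score formula, so the sketch below does not reproduce LS. It is a hypothetical, much simpler illustration of how the three aspects can be made concrete with distances between image feature vectors: creativity as the fraction of generated samples that are not near-duplicates of any real sample, inheritance as the mean distance to the nearest real sample, and diversity as the fraction of generated pairs that are not near-duplicates. The Euclidean distance, the tolerance `tol`, and all function names are assumptions.

```python
import math


def creativity_proxy(generated, real, tol):
    """Fraction of generated samples that are not near-duplicates of any real sample."""
    novel = sum(1 for g in generated if min(math.dist(g, r) for r in real) > tol)
    return novel / len(generated)


def inheritance_proxy(generated, real):
    """Mean distance from each generated sample to its nearest real sample (lower = closer)."""
    return sum(min(math.dist(g, r) for r in real) for g in generated) / len(generated)


def diversity_proxy(generated, tol):
    """Fraction of generated pairs that are farther apart than tol."""
    n = len(generated)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 1.0
    distinct = sum(1 for i, j in pairs if math.dist(generated[i], generated[j]) > tol)
    return distinct / len(pairs)
```

Creativity and inheritance pull in opposite directions (copying the training set maximizes inheritance but minimizes creativity), which is why a single-aspect measure is not enough.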
-
Limited View and Sparse Photoacoustic Tomography for Neuroimaging with Deep Learning
Authors:
Steven Guan,
Amir A. Khan,
Siddhartha Sikdar,
Parag V. Chitnis
Abstract:
Photoacoustic tomography (PAT) is a nonionizing imaging modality capable of acquiring high-contrast, high-resolution images of optical absorption at depths greater than traditional optical imaging techniques. Practical considerations with instrumentation and geometry limit the number of available acoustic sensors and their view of the imaging target, which results in significant image reconstruction artifacts that degrade image quality. Iterative reconstruction methods can reduce these artifacts but are computationally expensive. In this work, we propose a novel deep learning approach termed pixelwise deep learning (PixelDL) that first employs pixelwise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to directly reconstruct an image. Simulated photoacoustic data from a synthetic vasculature phantom and from mouse-brain vasculature were used for training and testing, respectively. Results demonstrated that PixelDL achieved performance comparable to iterative methods and outperformed other CNN-based approaches for correcting artifacts. PixelDL is a computationally efficient approach that enables real-time PAT rendering and improved image quality, quantification, and interpretation.
Submitted 27 June, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
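The first stage of PixelDL, as described above, maps the raw sensor signals onto the image grid using the physics of wave propagation. A minimal sketch under simple assumptions (2-D geometry, point-like sensors, a homogeneous speed of sound, linear interpolation in time) is shown below: for every pixel and sensor it samples that sensor's signal at the acoustic time of flight. Summing the result over the sensor axis would give a basic delay-and-sum image, whereas PixelDL passes a physics-based pixelwise representation to a CNN. The function name and the 1500 m/s default are assumptions, not details from the paper.

```python
import numpy as np


def pixelwise_interpolation(sensor_data, sensor_xy, grid_x, grid_y, fs, c=1500.0):
    """Map time-domain sensor signals onto the image grid, one channel per sensor.

    sensor_data:    (K, N) array, K sensors, N time samples at rate fs (Hz)
    sensor_xy:      (K, 2) sensor positions in metres
    grid_x, grid_y: 1-D pixel-centre coordinates in metres
    c:              assumed (homogeneous) speed of sound in m/s
    Returns a (K, len(grid_y), len(grid_x)) array.
    """
    K, N = sensor_data.shape
    t = np.arange(N) / fs
    X, Y = np.meshgrid(grid_x, grid_y)  # (H, W) pixel coordinates
    out = np.empty((K, len(grid_y), len(grid_x)))
    for k in range(K):
        # Time of flight from every pixel to sensor k
        tof = np.hypot(X - sensor_xy[k, 0], Y - sensor_xy[k, 1]) / c
        # Linear interpolation in time; pixels outside the recording window get 0
        out[k] = np.interp(tof, t, sensor_data[k], left=0.0, right=0.0)
    return out
```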
-
Diffusion probabilistic LMS algorithm
Authors:
Sihai Guan,
Chun Meng,
Bharat Biswal
Abstract:
In this paper, a novel diffusion estimation algorithm is proposed from a probabilistic perspective by combining the diffusion strategy with the probabilistic least-mean-squares (PLMS) algorithm at all agents. Instead of minimizing the estimation error, the proposed diffusion probabilistic LMS (DPLMS) algorithm is derived by approximating the posterior distribution with an isotropic Gaussian distribution. The stability in the mean and the computational complexity are analyzed theoretically. Simulation results indicate that DPLMS is more robust to the input signal and to impulsive interference than the DSE-LMS, DRVSSLMS, and DLLAD algorithms. These results suggest that DPLMS can better identify unknown coefficients in complex, changeable impulsive-interference environments.
Submitted 21 August, 2019;
originally announced August 2019.
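The DPLMS update itself (derived from a Gaussian approximation of the posterior) is not given in the abstract, so the sketch below swaps in the standard adapt-then-combine (ATC) diffusion LMS to illustrate the diffusion strategy that DPLMS builds on: each agent runs a local LMS step on its own data and then averages its neighbours' intermediate estimates. DPLMS differs in the local step, where a PLMS update replaces plain LMS. The data model (white Gaussian regressors, Gaussian rather than impulsive noise) and all names are assumptions.

```python
import random


def atc_diffusion_lms(w_true, neighbors, combine, mu, n_iter, noise_std, seed=0):
    """Adapt-then-combine diffusion LMS over a network of agents.

    w_true:    true coefficient vector observed by all agents
    neighbors: neighbors[k] lists the agents (including k) that agent k listens to
    combine:   combine[k][l] is the weight agent k gives to agent l (rows sum to 1)
    Returns the list of per-agent estimates after n_iter iterations.
    """
    rng = random.Random(seed)
    M, K = len(w_true), len(neighbors)
    w = [[0.0] * M for _ in range(K)]
    for _ in range(n_iter):
        psi = []
        for k in range(K):  # adaptation step: local LMS update at each agent
            u = [rng.gauss(0, 1) for _ in range(M)]
            d = sum(a * b for a, b in zip(u, w_true)) + rng.gauss(0, noise_std)
            e = d - sum(a * b for a, b in zip(u, w[k]))
            psi.append([wi + mu * e * ui for wi, ui in zip(w[k], u)])
        # combination step: each agent averages its neighbours' intermediate estimates
        w = [[sum(combine[k][l] * psi[l][m] for l in neighbors[k]) for m in range(M)]
             for k in range(K)]
    return w
```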
-
Optimal step-size of least mean absolute fourth algorithm in low SNR
Authors:
Sihai Guan,
Chun Meng,
Bharat Biswal
Abstract:
There is a need to improve the capability of adaptive filtering algorithms against Gaussian and various types of non-Gaussian noise, time-varying systems, and systems with low SNR. In this paper, we propose an optimized least mean absolute fourth (OPLMF) algorithm, designed especially for time-varying unknown systems with a low signal-to-noise ratio (SNR). The optimal step size of OPLMF is obtained by minimizing the mean-square deviation (MSD) at each time instant. In addition, the mean convergence and steady-state error of OPLMF are derived, and its theoretical computational complexity is analyzed. System identification simulations illustrate the principle and efficiency of the OPLMF algorithm, whose performance is analyzed mathematically and validated experimentally. Simulation results demonstrate that the proposed OPLMF is superior to the normalized LMF (NLMF) and the variable step-size LMF using quotient form (VSSLMFQ) algorithms.
Submitted 21 August, 2019;
originally announced August 2019.
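The abstract does not give OPLMF's optimal step-size formula, so the sketch below shows only the baseline it optimizes: the standard least mean fourth (LMF) recursion w(n+1) = w(n) + μ e³(n) u(n) with a fixed step size, together with the mean-square deviation MSD(n) = ||w_o − w(n)||² that OPLMF minimizes at each time instant. An MSD-optimal variant would replace the constant `mu` with a time-varying step size. The white Gaussian input, Gaussian noise level, and function name are assumptions.

```python
import random


def lmf_identify(w_true, mu, n_iter, noise_std, seed=0):
    """System identification with the least mean fourth (LMF) algorithm.

    Uses a fixed step size mu; returns (final estimate, MSD learning curve),
    where MSD(n) = ||w_true - w(n)||^2.
    """
    rng = random.Random(seed)
    M = len(w_true)
    w = [0.0] * M
    msd = []
    for _ in range(n_iter):
        u = [rng.gauss(0, 1) for _ in range(M)]
        d = sum(a * b for a, b in zip(u, w_true)) + rng.gauss(0, noise_std)
        e = d - sum(a * b for a, b in zip(u, w))
        # LMF: stochastic-gradient step on e^4 (constant factor 4 absorbed into mu)
        w = [wi + mu * e ** 3 * ui for wi, ui in zip(w, u)]
        msd.append(sum((a - b) ** 2 for a, b in zip(w_true, w)))
    return w, msd
```

Because the effective step size of LMF scales with e²(n), convergence slows sharply once the error approaches the noise level, which is one motivation for optimizing the step size over time.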
-
Convex Combination of Overlap-Save Frequency-Domain Adaptive Filters
Authors:
Sihai Guan,
Zhi Li
Abstract:
To decrease the steady-state error, reduce the computational complexity, and improve the ability to identify large unknown systems, a convex combination of overlap-save frequency-domain adaptive filters (COSFDAF) algorithm is proposed. Most of the available literature on convex combinations of adaptive filters focuses on the time domain. Those algorithms achieve better convergence speed and steady-state error, but their major drawback is computational complexity. To address this problem, and motivated by frequency-domain adaptive filters (FDAF) and convex optimization, this paper presents an adaptive filter algorithm that combines two FDAFs using convex combination principles and derives a formula to update the mixing parameter. The computational complexity of COSFDAF is analyzed theoretically. Simulation results show that, whether the input signal is correlated (e.g., colored noise) or uncorrelated (e.g., white noise), the proposed algorithm identifies the unknown coefficients better than a single overlap-save FDAF or the convex combination of two time-domain adaptive filters.
Submitted 3 May, 2018;
originally announced May 2018.
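COSFDAF performs the combination with overlap-save FDAF components, and the abstract does not give its update formulas. For orientation, the sketch below shows the well-known time-domain scheme the abstract compares against: a fast and a slow LMS filter mixed as y = λ y₁ + (1 − λ) y₂ with λ = sigmoid(a), where a is adapted by stochastic gradient on the overall squared error and clipped to [−4, 4], in the style of Arenas-García et al. Step sizes, the clipping range, and the function name are illustrative assumptions.

```python
import math
import random


def convex_combination_lms(w_true, mu_fast, mu_slow, mu_a, n_iter, noise_std, seed=0):
    """Convex combination of a fast and a slow LMS filter (time-domain scheme).

    y = lam * y1 + (1 - lam) * y2 with lam = sigmoid(a).
    Returns (equivalent combined weight vector, final lam).
    """
    rng = random.Random(seed)
    M = len(w_true)
    w1, w2, a = [0.0] * M, [0.0] * M, 0.0
    for _ in range(n_iter):
        u = [rng.gauss(0, 1) for _ in range(M)]
        d = sum(p * q for p, q in zip(u, w_true)) + rng.gauss(0, noise_std)
        y1 = sum(p * q for p, q in zip(u, w1))
        y2 = sum(p * q for p, q in zip(u, w2))
        lam = 1.0 / (1.0 + math.exp(-a))
        e = d - (lam * y1 + (1 - lam) * y2)  # error of the combined output
        # Each component filter adapts independently on its own error
        e1, e2 = d - y1, d - y2
        w1 = [w + mu_fast * e1 * x for w, x in zip(w1, u)]
        w2 = [w + mu_slow * e2 * x for w, x in zip(w2, u)]
        # Mixing-parameter update: gradient of e^2 w.r.t. a, then clip to [-4, 4]
        a = max(-4.0, min(4.0, a + mu_a * e * (y1 - y2) * lam * (1 - lam)))
    lam = 1.0 / (1.0 + math.exp(-a))
    return [lam * p + (1 - lam) * q for p, q in zip(w1, w2)], lam
```

Clipping `a` keeps λ away from 0 and 1, so the mixing parameter can still move when the relative performance of the two filters changes.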
-
Noise constrained least mean absolute third algorithm
Authors:
Sihai Guan,
Zhi Li
Abstract:
The learning speed of an adaptive algorithm can be improved by properly constraining its cost function. Moreover, the stability of the noise-constrained least mean fourth (NCLMF) algorithm is more complicated: it depends solely on the input power of the adaptive filter, and with unbounded regressors the NCLMF algorithm is not mean-square stable even for a small step size. Therefore, this paper investigates a noise-variance-constrained least mean absolute third (LMAT) algorithm. The noise-constrained LMAT (NCLMAT) algorithm is obtained by constraining the cost function of the standard LMAT algorithm to the third-order moment of the additive noise. It can cope with a variety of non-Gaussian noise distributions, such as Rayleigh and binary noise. NCLMAT is a variable step-size LMAT algorithm whose step-size rule arises naturally from the constraint. The main aim of this work is to derive the NCLMAT adaptive algorithm for the first time, analyze its convergence behavior, mean-square error (MSE), and mean-square deviation (MSD), and assess its performance in different noise environments. Finally, experimental results in system identification applications illustrate the principle and efficiency of the NCLMAT algorithm.
Submitted 3 May, 2018;
originally announced May 2018.
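The NCLMAT step-size rule is not spelled out in the abstract, so the sketch below shows only the standard LMAT recursion that NCLMAT constrains: minimizing E|e|³ by stochastic gradient gives w(n+1) = w(n) + μ e(n)|e(n)| u(n), with the constant factor 3 absorbed into μ. It runs on a system-identification task with binary (±a) noise, one of the non-Gaussian cases the abstract mentions. In NCLMAT the fixed `mu` would become a variable step size driven by the noise-moment constraint; the input model, noise amplitude, and function name are assumptions.

```python
import random


def lmat_identify(w_true, mu, n_iter, noise_amp, seed=0):
    """System identification with the least mean absolute third (LMAT) algorithm.

    Update: w(n+1) = w(n) + mu * e(n) * |e(n)| * u(n).
    The additive noise is binary: +noise_amp or -noise_amp with equal probability.
    """
    rng = random.Random(seed)
    M = len(w_true)
    w = [0.0] * M
    for _ in range(n_iter):
        u = [rng.gauss(0, 1) for _ in range(M)]
        noise = noise_amp if rng.random() < 0.5 else -noise_amp
        d = sum(a * b for a, b in zip(u, w_true)) + noise
        e = d - sum(a * b for a, b in zip(u, w))
        w = [wi + mu * e * abs(e) * ui for wi, ui in zip(w, u)]
    return w
```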