Search | arXiv e-print repository

Transmission Benefits and Cost Allocation under Ambiguity

Abstract: Disputes over cost allocation can present a significant barrier to investment in shared infrastructure. While it may be desirable to allocate cost in a way that corresponds to expected benefits, investments in long-lived projects are made under conditions of substantial uncertainty. In the context of electricity transmission, uncertainty combined with the inherent complexity of power systems analysis prevents the calculation of an estimated distribution of benefits that is agreeable to all participants. To analyze aspects of the cost allocation problem, we construct a model for transmission and generation expansion planning under uncertainty, enabling the identification of transmission investments as well as the calculation of benefits for users of the network. Numerical tests confirm the potential for realized benefits at the participant level to differ significantly from ex ante estimates. Based on the model and numerical tests we discuss several issues, including 1) establishing a valid counterfactual against which to measure benefits, 2) allocating cost to new and incumbent generators vs. solely allocating to loads, 3) calculating benefits at the portfolio vs. the individual project level, 4) identifying losers in a surplus-enhancing transmission expansion, and 5) quantifying the divergence between cost allocation decisions made ex ante and benefits realized ex post.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 32 pages, 7 figures, 7 tables

arXiv:2402.03492 [pdf, other]

Beyond Strong labels: Weakly-supervised Learning Based on Gaussian Pseudo Labels for The Segmentation of Ellipse-like Vascular Structures in Non-contrast CTs

Authors: Qixiang Ma, Antoine Lucas, Huazhong Shu, Adrien Kaladji, Pascal Haigron

Abstract: Deep-learning-based automated segmentation of vascular structures in preoperative CT scans contributes to computer-assisted diagnosis and intervention procedures in vascular diseases. While CT angiography (CTA) is the common standard, non-contrast CT imaging is significant as a contrast-risk-free alternative, avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a weakly-supervised framework using ellipses' topology in slices, including 1) an efficient annotation process based on predefined standards, 2) ellipse-fitting processing, 3) the generation of 2D Gaussian heatmaps serving as pseudo labels, 4) a training process through a combination of voxel reconstruction loss and distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, particularly focusing on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (by 1.54% in Dice score on average), while reducing labeling time by around 82.0%. The efficiency in generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74% in Dice score on average) with a 66.3% reduction in labeling time, where the labeling time remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95% in Dice score for 2D models and a reduction of 11.65 voxel spacings in Hausdorff distance for the 3D model.

Submitted 10 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.
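
The abstract above mentions fitting ellipses and turning them into 2D Gaussian heatmaps that serve as pseudo labels. The following is only a minimal sketch of that general idea, not the authors' implementation: the function name, the parameterization of the ellipse, and the `sigma_scale` factor that maps semi-axes to standard deviations are all assumptions made for illustration.

```python
import math


def ellipse_gaussian_heatmap(height, width, cx, cy,
                             semi_major, semi_minor, angle_rad,
                             sigma_scale=0.5):
    """Build a height x width heatmap (list of rows) from ellipse parameters.

    Assumption: the Gaussian's standard deviations are the ellipse
    semi-axes multiplied by sigma_scale. The peak value is 1.0 at (cx, cy).
    """
    sigma_u = semi_major * sigma_scale
    sigma_v = semi_minor * sigma_scale
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    heatmap = []
    for row in range(height):
        values = []
        for col in range(width):
            dx, dy = col - cx, row - cy
            # Rotate the offset into the ellipse's own axes.
            u = cos_a * dx + sin_a * dy
            v = -sin_a * dx + cos_a * dy
            values.append(math.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2)))
        heatmap.append(values)
    return heatmap


# Usage: an axis-aligned ellipse centred in a 9 x 9 slice.
heat = ellipse_gaussian_heatmap(9, 9, cx=4, cy=4,
                                semi_major=3.0, semi_minor=1.5, angle_rad=0.0)
```

A soft label of this kind decays more slowly along the major axis than along the minor axis, which is one way to encode boundary ambiguity instead of a hard mask.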

arXiv:2309.00885 [pdf, other]

doi 10.1016/j.media.2023.102945

A Generic Fundus Image Enhancement Network Boosted by Frequency Self-supervised Representation Learning

Authors: Heng Li, Haofeng Liu, Huazhu Fu, Yanwu Xu, Hui Shu, Ke Niu, Yan Hu, Jiang Liu

Abstract: Fundus photography is prone to suffer from image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervised or extra data. Leveraging image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then, with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images while preserving retinal structures. Comprehensive experiments are implemented to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are respectively verified to be effective for image enhancement.

Submitted 2 September, 2023; originally announced September 2023.

Comments: Accepted by Medical Image Analysis in August, 2023

Journal ref: Medical Image Analysis, 2023, 90:102945

arXiv:2209.05913 [pdf, other]

doi 10.1109/TIP.2022.3207571

Dual-Scale Single Image Dehazing Via Neural Augmentation

Authors: Zhengguo Li, Chaobing Zheng, Haiyan Shu, Shiqian Wu

Abstract: Model-based single image dehazing algorithms restore haze-free images with sharp edges and rich details for real-world hazy images at the expense of low PSNR and SSIM values for synthetic hazy images. Data-driven ones restore haze-free images with high PSNR and SSIM values for synthetic hazy images but with low contrast, and even some remaining haze, for real-world hazy images. In this paper, a novel single image dehazing algorithm is introduced by combining model-based and data-driven approaches. Both the transmission map and the atmospheric light are first estimated by model-based methods, and then refined by dual-scale generative adversarial network (GAN) based approaches. The resultant algorithm forms a neural augmentation which converges very fast, while the corresponding data-driven approach might not converge. Haze-free images are restored by using the estimated transmission map and atmospheric light as well as Koschmieder's law. Experimental results indicate that the proposed algorithm can remove haze well from real-world and synthetic hazy images.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Single image dehazing, dual-scale, neural augmentation, haze line averaging, generative adversarial network. arXiv admin note: substantial text overlap with arXiv:2111.10943
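
The dehazing abstract above restores haze-free images from an estimated transmission map and atmospheric light using Koschmieder's law, commonly written as I = J·t + A·(1 − t), where I is the hazy intensity, J the haze-free intensity, t the transmission, and A the atmospheric light. The sketch below only illustrates inverting that formula on a flat list of pixel intensities; the function names and the `t_min` floor are assumptions, and the estimation and GAN refinement of t and A described in the abstract are not modeled.

```python
def add_haze(clear, transmission, airlight):
    """Forward model: I = J * t + A * (1 - t), applied per pixel."""
    return [j * t + airlight * (1.0 - t) for j, t in zip(clear, transmission)]


def remove_haze(hazy, transmission, airlight, t_min=0.1):
    """Inverse model: J = (I - A) / max(t, t_min) + A, applied per pixel.

    t_min is an assumed lower bound that avoids dividing by a
    near-zero transmission in very dense haze.
    """
    return [(i - airlight) / max(t, t_min) + airlight
            for i, t in zip(hazy, transmission)]


# Usage: synthesize haze on three pixels, then invert it.
clear_pixels = [0.2, 0.5, 0.9]
transmission_map = [0.8, 0.5, 0.3]
hazy_pixels = add_haze(clear_pixels, transmission_map, airlight=0.95)
restored = remove_haze(hazy_pixels, transmission_map, airlight=0.95)
```

When the transmission and atmospheric light are exact and above the floor, the inversion recovers the original intensities; in practice the quality of the result depends on how well those two quantities are estimated.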

arXiv:2206.04684 [pdf, other]

Structure-consistent Restoration Network for Cataract Fundus Image Enhancement

Authors: Heng Li, Haofeng Liu, Huazhu Fu, Hai Shu, Yitian Zhao, Xiaoling Luo, Yan Hu, Jiang Liu

Abstract: Fundus photography is a routine examination in clinics to diagnose and monitor ocular diseases. However, for cataract patients, the fundus image always suffers quality degradation caused by the clouding lens. The degradation prevents reliable diagnosis by ophthalmologists or computer-aided systems. To improve the certainty in clinical diagnosis, restoration algorithms have been proposed to enhance the quality of fundus images. Unfortunately, challenges remain in the deployment of these algorithms, such as collecting sufficient training data and preserving retinal structures. In this paper, to circumvent the strict deployment requirement, a structure-consistent restoration network (SCR-Net) for cataract fundus images is developed from synthesized data that shares an identical structure. A cataract simulation model is first designed to collect synthesized cataract sets (SCS) formed by cataract fundus images sharing identical structures. Then high-frequency components (HFCs) are extracted from the SCS to constrain structure consistency such that the structure preservation in SCR-Net is enforced. The experiments demonstrate the effectiveness of SCR-Net in the comparison with state-of-the-art methods and the follow-up clinical applications. The code is available at https://github.com/liamheng/ArcNet-Medical-Image-Enhancement.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.14833 [pdf, other]

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Authors: Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, Bin Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bin Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen

Abstract: To break the bottlenecks of the mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a cross-platform and high-performance execution environment, while facilitating daily task iteration. Specifically, the compute container is based on Mobile Neural Network (MNN), a tensor compute engine along with the data processing and model execution libraries, which are exposed through a refined Python thread-level virtual machine (VM) to support diverse ML tasks and concurrent task execution. The core of MNN is the novel mechanisms of operator decomposition and semi-auto search, sharply reducing the workload in manually optimizing hundreds of operators for tens of hardware backends and further quickly identifying the best backend with runtime optimization for a computation graph. The data pipeline introduces an on-device stream processing framework to enable processing user behavior data at source. The deployment platform releases ML tasks with an efficient push-then-pull method and supports multi-granularity deployment policies. We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability. Extensive micro-benchmarks also highlight the superior performance of MNN and the Python thread-level VM. Walle has been in large-scale production use in Alibaba, while MNN has been open source with a broad impact in the community.

Submitted 29 May, 2022; originally announced May 2022.

Comments: Accepted by OSDI 2022

arXiv:2205.04846 [pdf, other]

MNet: Rethinking 2D/3D Networks for Anisotropic Medical Image Segmentation

Authors: Zhangfu Dong, Yuting He, Xiaoming Qi, Yang Chen, Huazhong Shu, Jean-Louis Coatrieux, Guanyu Yang, Shuo Li

Abstract: The nature of thick-slice scanning causes severe inter-slice discontinuities in 3D medical images, and vanilla 2D/3D convolutional neural networks (CNNs) fail to represent sparse inter-slice information and dense intra-slice information in a balanced way, leading to severe underfitting to inter-slice features (for vanilla 2D CNNs) and overfitting to noise from long-range slices (for vanilla 3D CNNs). In this work, a novel mesh network (MNet) is proposed to balance the spatial representation across axes via learning. 1) Our MNet latently fuses plenty of representation processes by embedding multi-dimensional convolutions deeply into basic modules, making the selection of representation processes flexible, thus balancing representation for sparse inter-slice information and dense intra-slice information adaptively. 2) Our MNet latently fuses multi-dimensional features inside each basic module, simultaneously taking advantage of 2D (high segmentation accuracy of the easily recognized regions in 2D view) and 3D (high smoothness of 3D organ contour) representations, thus obtaining more accurate modeling for target regions. Comprehensive experiments are performed on four public datasets (CT & MR); the results consistently demonstrate that the proposed MNet outperforms the other methods. The code and datasets are available at: https://github.com/zfdong-code/MNet

Submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted by IJCAI 2022

arXiv:2111.00242 [pdf]

Self-Supervised Speech Denoising Using Only Noisy Audio Signals

Authors: Jiasong Wu, Qingchun Li, Guanyu Yang, Lei Li, Lotfi Senhadji, Huazhong Shu

Abstract: In traditional speech denoising tasks, clean audio signals are often used as the training target, but absolutely clean signals must be collected with expensive recording equipment or in studios with strict environments. To overcome this drawback, we propose an end-to-end self-supervised speech denoising training scheme using only noisy audio signals, named Only-Noisy Training (ONT), without extra training conditions. The proposed ONT strategy constructs training pairs only from each single noisy audio, and it contains two modules: a training audio pair generation module and a speech denoising module. The first module adopts a random audio sub-sampler on each noisy audio to generate training pairs. The sub-sampled pairs are then fed into a novel complex-valued speech denoising module. Experimental results show that the proposed method not only eliminates the high dependence on clean targets of traditional audio denoising tasks, but also achieves on-par or better performance than other training strategies. ONT is available at https://github.com/liqingchunnnn/Only-Noisy-Training

Submitted 19 January, 2023; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: 11 pages, 4 figures, 6 tables

arXiv:2109.12271 [pdf, other]

doi 10.1007/978-3-031-09002-8_1

BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation

Authors: Qiran Jia, Hai Shu

Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in automatically segmenting organs or lesions on 3D medical images. Recently, vision transformer networks have exhibited exceptional performance in 2D image classification tasks. Compared with CNNs, transformer networks have an appealing advantage of extracting long-range features due to their self-attention algorithm. Therefore, we propose a CNN-Transformer combined model, called BiTr-Unet, with specific modifications for brain tumor segmentation on multi-modal MRI scans. Our BiTr-Unet achieves good performance on the BraTS2021 validation dataset with median Dice score 0.9335, 0.9304 and 0.8899, and median Hausdorff distance 2.8284, 2.2361 and 1.4142 for the whole tumor, tumor core, and enhancing tumor, respectively. On the BraTS2021 testing dataset, the corresponding results are 0.9257, 0.9350 and 0.8874 for Dice score, and 3, 2.2361 and 1.4142 for Hausdorff distance. The code is publicly available at https://github.com/JustaTinyDot/BiTr-Unet.

Submitted 30 December, 2021; v1 submitted 25 September, 2021; originally announced September 2021.

Comments: Accepted by MICCAI BrainLes 2021

Journal ref: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (BrainLes 2021). LNCS 12963, pp. 3-14, 2022
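
Several abstracts in this listing, including the BiTr-Unet entry above, report results as Dice scores. For reference, the Dice coefficient of two binary masks P and T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it on flattened masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and challenge evaluations such as BraTS use their own tooling.

```python
def dice_score(pred, target):
    """Dice coefficient between two equally long binary masks (flattened)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Usage: one overlapping voxel, two predicted and one true foreground voxel.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])  # 2 * 1 / (2 + 1)
```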

arXiv:2106.04130 [pdf, other]

EnMcGAN: Adversarial Ensemble Learning for 3D Complete Renal Structures Segmentation

Authors: Yuting He, Rongjun Ge, Xiaoming Qi, Guanyu Yang, Yang Chen, Youyong Kong, Huazhong Shu, Jean-Louis Coatrieux, Shuo Li

Abstract: 3D complete renal structures (CRS) segmentation aims to segment the kidneys, tumors, renal arteries and veins in one inference. Once successful, it will provide preoperative plans and intraoperative guidance for laparoscopic partial nephrectomy (LPN), playing a key role in renal cancer treatment. However, no success has been reported in 3D CRS segmentation due to the complex shapes of renal structures, low contrast and large anatomical variation. In this study, we utilize adversarial ensemble learning and propose Ensemble Multi-condition GAN (EnMcGAN) for 3D CRS segmentation for the first time. Its contribution is three-fold. 1) Inspired by windowing, we propose the multi-windowing committee, which divides the CTA image into multiple narrow windows with different window centers and widths, enhancing the contrast for salient boundaries and soft tissues. It then builds an ensemble segmentation model on these narrow windows to fuse their segmentation strengths and improve overall segmentation quality. 2) We propose the multi-condition GAN, which equips the segmentation model with multiple discriminators to encourage the segmented structures to meet their real shape conditions, thus improving the shape feature extraction ability. 3) We propose the adversarial weighted ensemble module, which uses the trained discriminators to evaluate the quality of segmented structures and normalizes these evaluation scores into ensemble weights specific to the input image, thus enhancing the ensemble results. 122 patients are enrolled in this study and the mean Dice coefficient of the renal structures reaches 84.6%. Extensive experiments with promising results on renal structures reveal powerful segmentation accuracy and great clinical significance in renal cancer treatment.

Submitted 8 June, 2021; originally announced June 2021.

Journal ref: Information Processing in Medical Imaging (IPMI) 2021

arXiv:2011.02881 [pdf, other]

doi 10.1007/978-3-030-72084-1_39

A Two-Stage Cascade Model with Variational Autoencoders and Attention Gates for MRI Brain Tumor Segmentation

Authors: Chenggang Lyu, Hai Shu

Abstract: Automatic MRI brain tumor segmentation is of vital importance for the disease diagnosis, monitoring, and treatment planning. In this paper, we propose a two-stage encoder-decoder based model for brain tumor subregional segmentation. Variational autoencoder regularization is utilized in both stages to prevent the overfitting issue. The second-stage network adopts attention gates and is trained additionally using an expanded dataset formed by the first-stage outputs. On the BraTS 2020 validation dataset, the proposed method achieves the mean Dice score of 0.9041, 0.8350, and 0.7958, and Hausdorff distance (95%) of 4.953, 6.299, and 23.608 for the whole tumor, tumor core, and enhancing tumor, respectively. The corresponding results on the BraTS 2020 testing dataset are 0.8729, 0.8357, and 0.8205 for Dice score, and 11.4288, 19.9690, and 15.6711 for Hausdorff distance. The code is publicly available at https://github.com/shu-hai/two-stage-VAE-Attention-gate-BraTS2020.

Submitted 28 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

Journal ref: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (BrainLes 2020)

arXiv:2010.14841 [pdf, other]

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices

Authors: Yiwu Yao, Yuchao Li, Chengyu Wang, Tianhang Yu, Houjiang Chen, Xiaotang Jiang, Jun Yang, Jun Huang, Wei Lin, Hui Shu, Chengfei Lv

Abstract: The intensive computation of Automatic Speech Recognition (ASR) models obstructs them from being deployed on mobile devices. In this paper, we present a novel quantized Winograd optimization pipeline, which combines quantization and fast convolution to achieve efficient inference acceleration on mobile devices for ASR models. To avoid the information loss due to the combination of quantization and Winograd convolution, a Range-Scaled Quantization (RSQ) training method is proposed to expand the quantized numerical range and to distill knowledge from high-precision values. Moreover, an improved Conv1D equipped DFSMN (ConvDFSMN) model is designed for mobile deployment. We conduct extensive experiments on both ConvDFSMN and Wav2letter models. Results demonstrate the models can be effectively optimized with the proposed pipeline. In particular, Wav2letter achieves a 1.48× speedup with an approximate 0.07% WER decrease on ARMv7-based mobile devices.

Submitted 28 October, 2020; originally announced October 2020.
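
The entry above combines quantization with Winograd fast convolution for Conv1D layers. As background only, the sketch below shows the standard floating-point Winograd F(2,3) transform, which produces two outputs of a 3-tap 1D convolution from four inputs using four multiplications instead of six. It does not model the INT8 quantization or the Range-Scaled Quantization training from the abstract, and the function names are illustrative.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter g over four inputs d."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2.0
    u2 = (g0 - g1 + g2) / 2.0
    u3 = g2
    # Four element-wise multiplications on the transformed input.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_direct(d, g):
    """Reference sliding-window computation (correlation form, no padding)."""
    return [sum(d[i + k] * g[k] for k in range(len(g)))
            for i in range(len(d) - len(g) + 1)]


# Usage: both routes should agree up to floating-point rounding.
fast = winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
slow = conv1d_direct([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
```

The divisions by two in the filter transform are one reason that naive low-precision quantization can lose information in the Winograd domain, which is the kind of issue the abstract says its training method is designed to address.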

arXiv:2007.14177 [pdf]

Generative networks as inverse problems with fractional wavelet scattering networks

Authors: Jiasong Wu, Jing Zhang, Fuzhi Wu, Youyong Kong, Guanyu Yang, Lotfi Senhadji, Huazhong Shu

Abstract: Deep learning is a hot research topic in the field of machine learning methods and applications. Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) provide impressive image generation from Gaussian white noise, but both are difficult to train since they need to train the generator (or encoder) and the discriminator (or decoder) simultaneously, which easily causes unstable training. To solve or alleviate the difficulty of synchronous training in GANs and VAEs, researchers recently proposed Generative Scattering Networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain the features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate the image. The advantage of GSNs is that the parameters of ScatNets do not need to be learned; the disadvantage is that the expressive ability of ScatNets is slightly weaker than that of CNNs and that dimensionality reduction by Principal Component Analysis (PCA) easily leads to overfitting in the training of GSNs, which degrades generation quality at test time. To further improve the quality of generated images while keeping the advantages of GSNs, this paper proposes Generative Fractional Scattering Networks (GFRSNs), which use more expressive fractional wavelet scattering networks (FrScatNets) instead of ScatNets as the encoder to obtain the features (or FrScatNet embeddings) and use CNNs similar to those of GSNs as the decoder to generate the image. Additionally, this paper develops a new dimensionality reduction method named Feature-Map Fusion (FMF) instead of PCA to better keep the information of FrScatNets; the effect of image fusion on the quality of image generation is also discussed.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 27 pages, 13 figures, 6 tables

arXiv:2007.10629 [pdf]

CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language

Authors: Jiasong Wu, Xuan Li, Taotao Li, Fanman Meng, Youyong Kong, Guanyu Yang, Lotfi Senhadji, Huazhong Shu

Abstract: Previous audio-visual speech separation methods use the synchronization of the speaker's facial movement and speech in the video to supervise the speech separation in a self-supervised way. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network for learning the combination of three modalities, audio, face, and sign language information, for better solving the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which three modalities of audio, face, and sign language coexist. Experiment results show that the proposed model has better performance and robustness than the usual audio-visual system. Besides, sign language modality can also be used alone to supervise speech separation tasks, and the introduction of sign language is helpful for hearing-impaired people to learn and communicate. Last, our model is a general speech separation framework and can achieve very competitive separation performance on two open-source audio-visual datasets. The code is available at https://github.com/iveveive/SLNSpeech

Submitted 2 November, 2023; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: 13 pages, 6 figures, 5 tables

arXiv:2003.09279 [pdf, other]

Control Reconfiguration of Dynamical Systems for Improved Performance via Reverse- and Forward-engineering

Authors: Han Shu, Xuan Zhang, Na Li, Antonis Papachristodoulou

Abstract: This paper presents a control reconfiguration approach to improve the performance of two classes of dynamical systems. Motivated by recent research on re-engineering cyber-physical systems, we propose a three-step control retrofit procedure. First, we reverse-engineer a dynamical system to dig out an optimization problem it actually solves. Second, we forward-engineer the system by applying a corresponding faster algorithm to solve this optimization problem. Finally, by comparing the original and accelerated dynamics, we obtain the implementation of the redesigned part (the extra dynamics). As a result, the convergence rate/speed or transient behavior of the given system can be improved while the system control structure is maintained. Internet congestion control and distributed proportional-integral (PI) control, as applications in the two different classes of target systems, are used to show the effectiveness of the proposed approach.

Submitted 5 January, 2021; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: 20 pages, 3 figures

arXiv:2003.03519 [pdf, other]

Distilling portable Generative Adversarial Networks for Image Translation

Authors: Hanting Chen, Yunhe Wang, Han Shu, Changyuan Wen, Chunjing Xu, Boxin Shi, Chao Xu, Chang Xu

Abstract: Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks, but never deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator to measure the distances between real images and images generated by the student and teacher generators. An adversarial learning process is therefore established to optimize the student generator and the student discriminator. Qualitative and quantitative analyses from experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.

Submitted 7 March, 2020; originally announced March 2020.

Journal ref: AAAI 2020

arXiv:2002.11581 [pdf, other]

Automatically Searching for U-Net Image Translator Architecture

Authors: Han Shu, Yunhe Wang

Abstract: Image translators have been successfully applied to many important low-level image processing tasks. However, the classical network architecture of image translators, such as U-Net, is borrowed from other vision tasks like biomedical image segmentation. This straightforward adaptation may not be optimal and could cause redundancy in the network structure. In this paper, we propose an automatic architecture search method for image translators. Using an evolutionary algorithm, we search for a more efficient network architecture that costs fewer computational resources and achieves better performance than the original one. Extensive qualitative and quantitative experiments are conducted to demonstrate the effectiveness of the proposed method. Moreover, we transfer the searched network architecture to other datasets that were not involved in the architecture search procedure. The efficiency of the searched architecture on these datasets further demonstrates the generalization of the method.

Submitted 26 February, 2020; originally announced February 2020.

arXiv:1911.10145 [pdf]

Machine-learning-based Classification of Lower-grade gliomas and High-grade gliomas using Radiomic Features in Multi-parametric MRI

Authors: Ge Cui, Jiwoong Jeong, Bob Press, Yang Lei, Hui-Kuo Shu, Tian Liu, Walter Curran, Hui Mao, Xiaofeng Yang

Abstract: Objectives: Glioblastomas are the most aggressive brain and central nervous system (CNS) tumors, with poor prognosis in adults. The purpose of this study is to develop a machine-learning-based classification method using radiomic features of multi-parametric MRI to classify high-grade gliomas (HGG) and low-grade gliomas (LGG). Methods: Multi-parametric MRI of 80 patients with gliomas, 40 HGG and 40 LGG, from the MICCAI BRATs 2015 training database were used in this study. Each patient's T1, contrast-enhanced T1, T2, and Fluid Attenuated Inversion Recovery (FLAIR) MRIs as well as the tumor contours were provided in the database. Using the given contours, radiomic features from all four multi-parametric MRIs were extracted. A feature selection process using a two-sample t-test, the least absolute shrinkage and selection operator (LASSO), and a feature correlation threshold was applied to these features for various combinations of T1, contrast-enhanced T1, T2, and FLAIR MRIs separately. The selected features were then used to train, test, and cross-validate a random forest to differentiate HGG and LGG. Finally, the classification accuracy and area under the curve (AUC) were used to evaluate the classification method. Results: Optimized parameters showed that, on average, the overall accuracy of our classification method was 0.913, or 73 out of 80 correct classifications (36/40 for HGG and 37/40 for LGG), with an AUC of 0.956 based on the combination of FLAIR, T1, T1c and T2 MRIs. Conclusion: This study shows that radiomic features derived from multi-parametric MRI could be used to accurately classify high- and lower-grade gliomas. The radiomic features from multi-parametric MRI, in combination with even more advanced machine learning methods, may further elucidate the underlying tumor biology and response to therapy.

Submitted 22 November, 2019; originally announced November 2019.

Comments: 14 pages, 5 figures

arXiv:1911.09264 [pdf]

Air, bone and soft-tissue Segmentation on 3D brain MRI Using Semantic Classification Random Forest with Auto-Context Model

Authors: Xue Dong, Yang Lei, Sibo Tian, Yingzi Liu, Tonghe Wang, Tian Liu, Walter J. Curran, Hui Mao, Hui-Kuo Shu, Xiaofeng Yang

Abstract: As bone and air produce weak signals with conventional MR sequences, segmentation of these tissues is particularly difficult in MRI. We propose to integrate patch-based anatomical signatures and an auto-context model into a machine learning framework to iteratively segment MRI into air, bone and soft tissue. The proposed semantic classification random forest (SCRF) method consists of a training stage and a segmentation stage. During the training stage, patch-based anatomical features were extracted from registered MRI-CT training images, and the most informative features were identified to train a series of classification forests with an auto-context model. During the segmentation stage, we extracted the selected features from MRI and fed them into the well-trained forests for MRI segmentation. The DSC for air, bone and soft tissue obtained with the proposed SCRF were 0.976, 0.819 and 0.932, compared to 0.916, 0.673 and 0.830 with RF, and 0.942, 0.791 and 0.917 with U-Net. SCRF also demonstrated superior segmentation performance in sensitivity and specificity over RF and U-Net for all three structure types. The proposed segmentation technique could be a useful tool to segment bone, air and soft tissue, and has the potential to be applied to attenuation correction of PET/MRI systems, MRI-only radiation treatment planning and MR-guided focused ultrasound surgery.

Submitted 22 November, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: 18 pages, 8 figures

arXiv:1907.11837 [pdf, other]

Attribute Aware Pooling for Pedestrian Attribute Recognition

Authors: Kai Han, Yunhe Wang, Han Shu, Chuanjian Liu, Chun**g Xu, Chang Xu

Abstract: This paper extends the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. We tackle these challenges, which hamper the development of CNNs for multi-attribute classification, by fully exploiting the correlation between different attributes. A multi-branch architecture is adopted for focusing on attributes at different regions. Besides the prediction based on each branch itself, context information of each branch is employed for the decision as well. The attribute aware pooling is developed to integrate both kinds of information. Therefore, attributes which are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for pedestrian attribute recognition.

Submitted 26 July, 2019; originally announced July 2019.

Comments: Accepted by IJCAI 2019

arXiv:1907.10804 [pdf, other]

Co-Evolutionary Compression for Unpaired Image Translation

Authors: Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chun**g Xu, Qi Tian, Chang Xu

Abstract: Generative adversarial networks (GANs) have been successfully used for many computer vision tasks, especially image-to-image translation. However, generators in these networks have complicated architectures with a large number of parameters and high computational complexity. Existing methods are mainly designed for compressing and speeding up deep neural networks in the classification task, and cannot be directly applied to GANs for image translation, due to their different objectives and training procedures. To this end, we develop a novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously. In practice, generators for two image domains are encoded as two populations and synergistically optimized for investigating the most important convolution filters iteratively. Fitness of each individual is calculated using the number of parameters, a discriminator-aware regularization, and the cycle consistency. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method for obtaining compact and effective generators.

Submitted 24 July, 2019; originally announced July 2019.

Comments: Accepted by ICCV 2019

arXiv:1509.07951 [pdf]

Error Gradient-based Variable-Lp Norm Constraint LMS Algorithm for Sparse System Identification

Authors: Yong Feng, Fei Chen, Rui Zeng, Jiasong Wu, Huazhong Shu

Abstract: Sparse adaptive filtering has gained much attention due to its wide applicability in the field of signal processing. Among the main algorithm families, sparse norm constraint adaptive filters have developed rapidly in recent years. However, when applied to system identification, most prior work in sparse norm constraint adaptive filtering has difficulty adapting to the sparsity of the systems to be identified. To address this problem, we propose a novel variable p-norm constraint least mean square (LMS) algorithm, which serves as a variant of the conventional Lp-LMS algorithm established for sparse system identification. The parameter p is iteratively adjusted by the gradient descent method applied to the instantaneous square error. Numerical simulations show that this new approach achieves better performance than the traditional Lp-LMS and LMS algorithms in terms of steady-state error and convergence rate.

Submitted 26 September, 2015; originally announced September 2015.

Comments: Submitted to 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 5 pages, 2 tables, 2 figures, 15 equations, 15 references

arXiv:1503.01185 [pdf]

Gradient Compared Lp-LMS Algorithms for Sparse System Identification

Authors: Yong Feng, Jiasong Wu, Rui Zeng, Limin Luo, Huazhong Shu

Abstract: In this paper, we propose two novel p-norm penalty least mean square (Lp-LMS) algorithms as supplements to the conventional Lp-LMS algorithm recently established for sparse adaptive filtering. A gradient comparator is employed to selectively apply the zero attractor of the p-norm constraint to only those taps that have the same polarity as that of the gradient of the squared instantaneous error, which leads to the newly proposed gradient compared p-norm constraint LMS algorithm (LpGC-LMS). We explain, theoretically and experimentally, that the LpGC-LMS can achieve lower mean square error than the standard Lp-LMS algorithm. To further improve the performance of the filter, the LpNGC-LMS algorithm is derived using a new gradient comparator which takes the sign-smoothed version of the previous one. The performance of the LpNGC-LMS is superior to that of the LpGC-LMS in theory and in simulations. Moreover, these two comparators can be easily applied to other norm constraint LMS algorithms to derive new approaches for sparse adaptive filtering. The numerical simulation results show that the two proposed algorithms achieve better performance than the standard LMS algorithm and the Lp-LMS algorithm in terms of convergence rate and steady-state behavior in sparse system identification settings.

Submitted 10 March, 2015; v1 submitted 3 March, 2015; originally announced March 2015.

Comments: Submitted to 27th Chinese Control and Decision Conference (CCDC 2015), 5 pages, 4 tables, 5 figures, 7 equations, 11 references

Showing 1–23 of 23 results for author: Shu, H