-
Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data
Authors:
Shan Cong,
Zhoujie Fan,
Hongwei Liu,
Yinghan Zhang,
Xin Wang,
Haoran Luo,
Xiaohui Yao
Abstract:
Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of the brain. Furthermore, while striving to integrate complementary information between modalities, most studies overlook disparities in informativeness between modalities. Here, we propose TMM, a trusted multi-view multimodal graph attention framework for AD diagnosis, using extensive brain-wide transcriptomics and imaging data. First, we construct view-specific brain regional co-function networks (RRIs) from transcriptomics and multimodal radiomics data to incorporate interaction information from both biomolecular and imaging perspectives. Next, we apply graph attention (GAT) processing to each RRI network to produce graph embeddings and employ cross-modal attention to fuse the transcriptomics-derived embedding with each imaging-derived embedding. Finally, a novel true-false-harmonized class probability (TFCP) strategy is designed to assess and adaptively adjust the prediction confidence of each modality for AD diagnosis. We evaluate TMM using the AHBA database with brain-wide transcriptomics data and the ADNI database with three imaging modalities (AV45-PET, FDG-PET, and VBM-MRI). The results demonstrate the superiority of our method over state-of-the-art methods in identifying AD, EMCI, and LMCI. Code and data are available at https://github.com/Yaolab-fantastic/TMM.
Submitted 21 June, 2024;
originally announced June 2024.
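The cross-modal fusion step in the abstract above can be illustrated with generic scaled dot-product cross-attention, in which a transcriptomics-derived query attends over imaging-derived keys and values. This is a minimal sketch under stated assumptions, not TMM's actual fusion module: the function names, the single-query setup, and the absence of learned projection matrices are all simplifications chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(query, keys, values):
    # Hypothetical helper: one query vector (e.g. a transcriptomics embedding)
    # attends over key/value vectors (e.g. imaging-derived embeddings).
    # Learned query/key/value projections are omitted for brevity.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    # Fused embedding = attention-weighted sum of the value vectors.
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return fused, weights
```

For example, `cross_modal_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])` places more weight on the first key, so the fused vector leans toward the first value.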
-
NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution
Authors:
Yihong Chen,
Zhen Fan,
Shuai Dong,
Zhiwei Chen,
Wenjie Li,
Minghui Qin,
Min Zeng,
Xubing Lu,
Guofu Zhou,
Xingsen Gao,
Jun-Ming Liu
Abstract:
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images, as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex, computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and making the constituent modules lighter. Our NAFRSSR model is composed of nonlinear activation-free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces the number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces the number of parameters by replacing the 1x1 pointwise convolution in SCAM with a weight-shared 3x3 depthwise convolution. In addition, we propose to incorporate a trainable edge detection operator into NAFRSSR to further improve model performance. Four variants of NAFRSSR with different sizes, namely NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S), and NAFRSSR-Base (NAFRSSR-B), are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Code and models will be released at https://github.com/JNUChenYiHong/NAFRSSR.
Submitted 14 May, 2024;
originally announced May 2024.
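The parameter savings claimed for group convolution and for replacing a 1x1 pointwise convolution with a 3x3 depthwise one follow from standard convolution arithmetic: a convolution with `groups` groups has weight shape (c_out, c_in / groups, k, k). The sketch below counts parameters under that convention and adds the usual PSNR formula; the channel width of 64 is an illustrative assumption, not a value taken from NAFRSSR, and `conv2d_params` is a hypothetical helper.

```python
import math

def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    # Hypothetical helper. A grouped conv has weight shape
    # (c_out, c_in // groups, kernel, kernel); depthwise means groups == c_in.
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * kernel * kernel
    return weights + (c_out if bias else 0)

def psnr(mse, max_val=1.0):
    # Peak signal-to-noise ratio in dB for images scaled to [0, max_val].
    return 10.0 * math.log10(max_val ** 2 / mse)
```

With 64 channels and no bias, a 1x1 pointwise layer has 4,096 weights, a 3x3 depthwise layer has 576, and a 3x3 convolution split into 4 groups has 9,216 instead of 36,864. If one depthwise kernel is shared between the left and right views, it is counted only once.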
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhi**g Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Hai** Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality research data and the rare opportunities for in-depth exchange of views between industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge, including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants registered, and 14 teams submitted results in the final testing phase. The solutions developed in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 8 May, 2024;
originally announced May 2024.
-
Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail
Authors:
Ming** Chen,
Junhao Chen,
Xiaojun Ye,
Huan-ang Gao,
Xiaoxue Chen,
Zhaoxin Fan,
Hao Zhao
Abstract:
3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called \emph{Ultraman} for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, \emph{Ultraman} greatly improves reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts: geometric reconstruction, texture generation, and texture mapping. First, a mesh reconstruction framework accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate multi-view consistent images of the human body from a single image. These are finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of \emph{Ultraman} on various standard datasets. In addition, \emph{Ultraman} outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.
Submitted 18 March, 2024;
originally announced March 2024.
-
Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning
Authors:
Zhaoxin Fan,
Runmin Jiang,
Junhao Wu,
Xin Huang,
Tianyang Wang,
Heng Huang,
Min Xu
Abstract:
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. However, this approach relies heavily on labor-intensive and time-consuming fully annotated ground-truth labels, particularly for 3D volumes. To overcome this limitation, we propose a novel probabilistic-aware weakly supervised learning pipeline specifically designed for 3D medical imaging. Our pipeline integrates three innovative components: a probability-based pseudo-label generation technique for synthesizing dense segmentation masks from sparse annotations, a Probabilistic Multi-head Self-Attention network for robust feature extraction within our Probabilistic Transformer Network, and a Probability-informed Segmentation Loss Function to enhance training with annotation confidence. Our approach not only rivals the performance of fully supervised methods but also surpasses existing weakly supervised methods on CT and MRI datasets, achieving up to 18.1% improvement in Dice scores for certain organs. The code is available at https://github.com/runminjiang/PW4MedSeg.
Submitted 4 March, 2024;
originally announced March 2024.
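The Dice score used in the abstract above to report results is the standard overlap measure Dice = 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The minimal sketch below assumes flat binary masks (a 3D volume can be flattened first). Returning 1.0 when both masks are empty is a common convention, but it is an assumption here, not something stated by the authors.

```python
def dice_score(pred, target):
    # Dice coefficient for two flat binary masks of equal length.
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

For instance, `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` shares one foreground voxel out of four labelled ones and returns 0.5.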
-
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Authors:
Zhiyun Fan,
Linhao Dong,
Jun Zhang,
Lu Lu,
Zejun Ma
Abstract:
Multi-talker automatic speech recognition plays a crucial role in scenarios involving multi-party interactions, such as meetings and conversations. Due to its inherent complexity, this task has been receiving increasing attention. Notably, serialized output training (SOT) stands out among various approaches because of its simple architecture and strong performance. However, the frequent speaker changes in token-level SOT (t-SOT) make it difficult for the autoregressive decoder to use context effectively when predicting output sequences. To address this issue, we introduce a masked t-SOT label, which serves as the cornerstone of an auxiliary training loss. Additionally, we use a speaker similarity matrix to refine the self-attention mechanism of the decoder. This adjustment strengthens contextual relationships among tokens of the same speaker while minimizing interactions between tokens of different speakers. We denote our method as speaker-aware SOT (SA-SOT). Experiments on the Librispeech datasets demonstrate that our SA-SOT obtains a relative cpWER reduction ranging from 12.75% to 22.03% on the multi-talker test sets. Furthermore, with more extensive training, our method achieves a cpWER of 3.41%, establishing a new state-of-the-art result on the LibrispeechMix dataset.
Submitted 4 March, 2024;
originally announced March 2024.
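cpWER (concatenated minimum-permutation word error rate), the metric reported above, concatenates each speaker's utterances into one transcript and then scores the speaker-to-hypothesis assignment with the fewest word errors. The minimal sketch below brute-forces all permutations, which is only practical for a few speakers; evaluation toolkits typically solve the assignment more efficiently. Padding missing streams with empty strings is an assumption made for illustration.

```python
from itertools import permutations

def word_edit_distance(ref, hyp):
    # Levenshtein distance between two lists of words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h)) # substitution or match
        prev = cur
    return prev[-1]

def cp_wer(refs, hyps):
    # refs/hyps: one concatenated transcript string per speaker.
    n = max(len(refs), len(hyps))
    refs = refs + [""] * (n - len(refs))  # assumed padding for missing streams
    hyps = hyps + [""] * (n - len(hyps))
    total_ref_words = sum(len(r.split()) for r in refs)
    best_errors = min(
        sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words
```

With references `["a b c", "d e"]` and hypotheses `["d e", "a b x"]`, the best assignment swaps the two hypothesis streams, leaving one substitution over five reference words (cpWER 0.2).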
-
Adversarial Purification and Fine-tuning for Robust UDC Image Restoration
Authors:
Zhenbo Song,
Zhenyuan Zhang,
Kaihao Zhang,
Wenhan Luo,
Zhaoxin Fan,
Jianfeng Lu
Abstract:
This study focuses on enhancing Under-Display Camera (UDC) image restoration models, specifically their robustness against adversarial attacks. Despite its innovative approach to seamless display integration, UDC technology faces unique image degradation challenges, which are exacerbated by susceptibility to adversarial perturbations. We first conduct an in-depth robustness evaluation of deep-learning-based UDC image restoration models using several white-box and black-box attack methods. This evaluation is pivotal for understanding the vulnerabilities of current UDC image restoration techniques. Following the assessment, we introduce a defense framework that integrates adversarial purification with subsequent fine-tuning. First, our approach employs diffusion-based adversarial purification to neutralize adversarial perturbations. Then, we fine-tune the image restoration models to ensure that the quality and fidelity of the restored images are maintained. The effectiveness of our approach is validated through extensive experiments, which show marked improvements in resilience against typical adversarial attacks.
Submitted 21 February, 2024;
originally announced February 2024.
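The abstract does not name the specific white-box attacks it uses. As one well-known example of the category, the fast gradient sign method (FGSM) perturbs every input pixel by eps in the direction of the sign of the loss gradient. The sketch below applies FGSM to a toy linear "restoration model" W with loss 0.5 * ||W x - y||^2, whose input gradient W^T (W x - y) can be written analytically. The model, the loss, and the function names are all assumptions for illustration, not the authors' setup.

```python
def sign(v):
    # -1, 0 or +1, as used by FGSM.
    return (v > 0) - (v < 0)

def restoration_loss(x, W, y):
    # 0.5 * ||W x - y||^2 for a toy linear restoration model W (hypothetical).
    r = [sum(w * xj for w, xj in zip(row, x)) - yi for row, yi in zip(W, y)]
    return 0.5 * sum(ri * ri for ri in r)

def fgsm_step(x, W, y, eps):
    # Analytic input gradient of the loss above: W^T (W x - y).
    r = [sum(w * xj for w, xj in zip(row, x)) - yi for row, yi in zip(W, y)]
    grad = [sum(W[i][j] * r[i] for i in range(len(W))) for j in range(len(x))]
    # Move each input value by eps in the direction that increases the loss.
    return [xj + eps * sign(g) for xj, g in zip(x, grad)]
```

In a real attack the gradient would come from automatic differentiation through the deep restoration network rather than from a closed-form expression.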
-
TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method
Authors:
Chenyan Zhang,
Yifei Chen,
Zhenxiong Fan,
Yiyu Huang,
Wenchao Weng,
Ruiquan Ge,
Dong Zeng,
Changmiao Wang
Abstract:
Recently, diffusion models have gained significant attention as a novel class of deep learning-based generative methods. These models generate samples from a target distribution by progressively transforming samples drawn from a Gaussian distribution, and they have been successfully adapted to MRI reconstruction. However, as an unconditional generative model, the diffusion model typically disrupts image coordination because of the consistent projection of data introduced by conditional bootstrap. This often results in image fragmentation and incoherence. Furthermore, the inherent limitations of the diffusion model often lead to excessive smoothing of the generated images. In the same vein, some deep learning-based models suffer from poor generalization, meaning their effectiveness is greatly affected by different acceleration factors. To address these challenges, we propose a novel diffusion model-based MRI reconstruction method, named TC-DiffRecon, which does not rely on a specific acceleration factor for training. We also incorporate the MF-UNet module, designed to enhance the quality of MRI images generated by the model while mitigating the over-smoothing issue to a certain extent. During the image generation sampling process, we employ a novel TCKG module and a Coarse-to-Fine sampling scheme. These additions aim to harmonize image texture and accelerate sampling while maintaining data consistency. Our source code is available at https://github.com/JustlfC03/TC-DiffRecon.
Submitted 17 February, 2024;
originally announced February 2024.
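Data consistency in undersampled MRI reconstruction is commonly enforced by transforming the current image estimate to k-space, overwriting the sampled locations with the measured values, and transforming back. The sketch below shows this "hard" data-consistency step on a 1D signal using a naive DFT. It assumes noiseless measurements and a boolean sampling mask, and it is a generic illustration rather than TC-DiffRecon's TCKG module or sampling scheme.

```python
import cmath

def dft(x):
    # Naive O(n^2) discrete Fourier transform (stands in for an FFT).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Inverse of dft() above.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def data_consistency(estimate, measured_kspace, mask):
    # Keep the model's k-space only where nothing was measured (mask False);
    # assumes noiseless measurements, so sampled entries are replaced outright.
    k_est = dft(estimate)
    k_dc = [m if sampled else e for e, m, sampled in zip(k_est, measured_kspace, mask)]
    return idft(k_dc)
```

With a fully sampled mask the output equals the true signal; with a partial mask the output's k-space matches the measurements at the sampled positions while the unsampled positions come from the model's estimate.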
-
Smart Energy Network Digital Twins: Findings from a UK-Based Demonstrator Project
Authors:
Matthew Deakin,
Marta Vanin,
Zhong Fan,
Dirk Van Hertem
Abstract:
Digital Twins promise to deliver a step-change in distribution system operations and planning, but there are few real-world examples that explore the challenges of combining imperfect model and measurement data and then using these as the basis for subsequent analysis. In this work, we propose a Digital Twin framework for electrical distribution systems and implement that framework on the Smart Energy Network Demonstrator microgrid in the UK. The data and software implementation are made available open-source and consist of a network model, power meter measurements, and unbalanced power flow-based algorithms. Measurement and network uncertainties are shown to have a substantial impact on the quality of Digital Twin outputs. The potential benefits of a dynamic export limit and voltage control are estimated using the Digital Twin, with simulated measurements used to address data quality challenges; the results show that curtailment for an exemplar day could be reduced by 56%. Power meter data and a network model are shown to be necessary for developing algorithms that enable decision-making robust to real-world uncertainties, and the possibilities and challenges of Digital Twin development are clearly demonstrated.
Submitted 20 November, 2023;
originally announced November 2023.
-
CrackCLF: Automatic Pavement Crack Detection based on Closed-Loop Feedback
Authors:
Chong Li,
Zhun Fan,
Ying Chen,
Huibiao Lin,
Laura Moretti,
Giuseppe Loprencipe,
Weihua Sheng,
Kelvin C. P. Wang
Abstract:
Automatic pavement crack detection is an important task for ensuring the functional performance of pavements during their service life. With the rise of deep learning (DL), the encoder-decoder framework has become a powerful tool for crack detection. However, these models are usually open-loop (OL) systems that tend to treat thin cracks as background. Moreover, these models cannot automatically correct errors in their predictions, nor can they adapt to changes in the environment to automatically extract and detect thin cracks. To tackle this problem, we embed closed-loop feedback (CLF) into the neural network, based on generative adversarial networks (GAN), so that the model can learn to correct errors on its own. The resulting model, called CrackCLF, consists of a front end and a back end, i.e. a segmentation network and an adversarial network. The front end, built on a U-shaped framework, generates crack maps, and the back end, with a multi-scale loss function, corrects higher-order inconsistencies between labels and the crack maps generated by the front end to address open-loop system issues. Empirical results show that the proposed CrackCLF outperforms other methods on three public datasets. Moreover, the proposed CLF can be defined as a plug-and-play module that can be embedded into different neural network models to improve their performance.
Submitted 20 November, 2023;
originally announced November 2023.
-
Retinal OCT Synthesis with Denoising Diffusion Probabilistic Models for Layer Segmentation
Authors:
Yuli Wu,
Weidong He,
Dennis Eschweiler,
Ningxin Dou,
Zixin Fan,
Shengli Mi,
Peter Walter,
Johannes Stegmaier
Abstract:
Modern biomedical image analysis using deep learning often faces the challenge of limited annotated data. To overcome this issue, deep generative models can be employed to synthesize realistic biomedical images. To this end, we propose an image synthesis method that uses denoising diffusion probabilistic models (DDPMs) to automatically generate retinal optical coherence tomography (OCT) images. Given rough layer sketches, the trained DDPMs can generate realistic circumpapillary OCT images. We further find that more accurate pseudo labels can be obtained through knowledge adaptation, which greatly benefits the segmentation task. With this approach, we observe a consistent improvement in layer segmentation accuracy, validated across various neural networks. Furthermore, we find that a layer segmentation model trained solely on synthesized images can achieve results comparable to those of a model trained exclusively on real images. These findings demonstrate the promising potential of DDPMs for reducing the need for manual annotation of retinal OCT images.
Submitted 6 March, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
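DDPMs are trained by corrupting clean images with the closed-form forward process x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, where ᾱ_t is the cumulative product of (1 - β_t). The sketch below builds the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the original DDPM paper. The abstract does not state which schedule the authors used, so these values are assumptions, and the per-pixel list stands in for an OCT image.

```python
import math

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear schedule.
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, noise):
    # Closed-form forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise.
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
```

In training, `noise` would be drawn per pixel from a standard normal (for example with `random.gauss(0.0, 1.0)`), and the network learns to predict that noise from `x_t` and `t`.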
-
BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval
Authors:
Kaixing Yang,
Xukun Zhou,
Xulong Tang,
Ran Diao,
Hongyan Liu,
Jun He,
Zhaoxin Fan
Abstract:
Dance and music are closely related forms of expression, and mutual retrieval between dance videos and music is a fundamental task in fields such as education, art, and sports. However, existing methods often suffer from unnatural generation effects or fail to fully explore the correlation between music and dance. To overcome these challenges, we propose BeatDance, a novel beat-based model-agnostic contrastive learning framework. BeatDance incorporates a Beat-Aware Music-Dance InfoExtractor, a Trans-Temporal Beat Blender, and a Beat-Enhanced Hubness Reducer to improve dance-music retrieval performance by exploiting the alignment between music beats and dance movements. We also introduce the Music-Dance (MD) dataset, a large-scale collection of over 10,000 music-dance video pairs for training and testing. Experimental results on the MD dataset demonstrate the superiority of our method over existing baselines, achieving state-of-the-art performance. The code and dataset will be made publicly available upon acceptance.
Submitted 16 October, 2023;
originally announced October 2023.
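Contrastive retrieval frameworks are often trained with a symmetric InfoNCE loss over a batch similarity matrix, where entry (i, j) scores music clip i against dance clip j and the diagonal holds the matched pairs. The abstract does not specify BeatDance's exact loss, so this is a generic sketch. The temperature of 0.07 is a common default assumed here.

```python
import math

def info_nce(sim, temperature=0.07):
    # Symmetric InfoNCE over an N x N similarity matrix; sim[i][i] are positives.
    n = len(sim)

    def row_cross_entropy(matrix):
        # Mean cross-entropy of each row against its diagonal entry.
        loss = 0.0
        for i, row in enumerate(matrix):
            logits = [s / temperature for s in row]
            m = max(logits)
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            loss += log_z - logits[i]
        return loss / n

    transposed = [list(col) for col in zip(*sim)]
    # Average the music-to-dance and dance-to-music directions.
    return 0.5 * (row_cross_entropy(sim) + row_cross_entropy(transposed))
```

The loss is small when each music clip is most similar to its own dance clip, for example `info_nce([[1.0, 0.0], [0.0, 1.0]])`, and large when the pairs are swapped.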
-
Exploring the Correlation Between Ultrasound Speed and the State of Health of LiFePO$_4$ Prismatic Cells
Authors:
Shengyuan Zhang,
Peng Zuo,
Xuesong Yin,
Zheng Fan
Abstract:
Electric vehicles (EVs) have become a popular mode of transportation, and their performance depends on the ageing of the Li-ion batteries that power them. However, determining the capacity retention of a battery in service can be challenging and time-consuming. A rapid and reliable testing method for state of health (SoH) determination is desired. Ultrasonic testing techniques are promising because they are efficient, portable, and non-destructive. In this study, we demonstrate that ultrasonic speed decreases as the capacity of an LFP prismatic cell degrades. We explain this correlation through numerical simulation of wave propagation in porous media. We propose that the reduction of binder stiffness can be a primary cause of the change in ultrasonic speed during battery ageing. This work brings new insights into ultrasonic SoH estimation techniques.
Submitted 24 September, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
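Why a loss of binder stiffness would slow ultrasound can be seen from the idealized thin-rod relation v = sqrt(E / ρ), where E is Young's modulus and ρ is density. In practice, speed is measured as propagation distance divided by time of flight. The sketch below is a deliberate simplification for a homogeneous elastic solid. It is not the porous-media model used in the study, and the function names are hypothetical.

```python
import math

def bar_wave_speed(youngs_modulus_pa, density_kg_m3):
    # Longitudinal wave speed in a thin, homogeneous elastic rod: sqrt(E / rho).
    return math.sqrt(youngs_modulus_pa / density_kg_m3)

def time_of_flight_speed(path_length_m, tof_s):
    # Speed from a through-transmission measurement: distance / time of flight.
    return path_length_m / tof_s
```

Under this idealization, a 19% drop in effective stiffness at constant density lowers the speed by 10%, because sqrt(0.81) = 0.9.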
-
Codebook Configuration for RIS-aided Systems via Implicit Neural Representations
Authors:
Huiying Yang,
Ru**g Xiong,
Yao Xiao,
Zhijie Fan,
Tiebin Mi,
Robert Caiming Qiu,
Zenan Ling
Abstract:
Reconfigurable Intelligent Surface (RIS) is envisioned to be an enabling technique in 6G wireless communications. By configuring the reflection beamforming codebook, an RIS focuses signals on target receivers to enhance signal strength. In this paper, we investigate codebook configuration for RIS-aided communication systems. We formulate an implicit relationship between the user's coordinate information and the codebook from the perspective of signal radiation mechanisms, and introduce a novel learning-based method, implicit neural representations (INRs), to solve this implicit coordinates-to-codebook mapping problem. Our approach requires only the user's coordinates, avoiding reliance on channel models. Additionally, given the significant practical applications of 1-bit RIS, we formulate 1-bit codebook configuration as a multi-label classification problem and propose an encoding strategy for 1-bit RIS that reduces the codebook dimension, thereby improving learning efficiency. Experimental results from simulations and measured data demonstrate significant advantages of our method.
Submitted 28 November, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
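Treating a 1-bit RIS codebook as multi-label classification means that each element gets an independent binary label (phase 0 or π), which a model outputs as per-element probabilities. The sketch below decodes such probabilities with a 0.5 threshold. It also quantizes ideal continuous phases to the nearest 1-bit state and evaluates the array gain of a uniform linear array toward one direction. The threshold, normal incidence, half-wavelength spacing, and far-field array-factor model are all assumptions. The paper's INR model and its dimension-reducing encoding are not reproduced here.

```python
import cmath
import math

def probs_to_bits(probs, threshold=0.5):
    # Multi-label decoding: each element independently becomes 0 (phase 0) or 1 (phase pi).
    return [1 if p >= threshold else 0 for p in probs]

def quantize_1bit(ideal_phases):
    # Nearest 1-bit state: 0 if cos(phi) >= 0, else pi (bit 1).
    return [0 if math.cos(phi) >= 0 else 1 for phi in ideal_phases]

def array_gain(bits, theta_rad, spacing_wl=0.5):
    # |sum_n exp(j(b_n*pi + 2*pi*d*n*sin(theta)))| for a uniform linear array,
    # assuming normal incidence and element spacing d given in wavelengths.
    return abs(sum(
        cmath.exp(1j * (b * math.pi + 2 * math.pi * spacing_wl * n * math.sin(theta_rad)))
        for n, b in enumerate(bits)
    ))
```

For 16 elements steered toward 30 degrees, quantizing the ideal compensating phases `-2*pi*0.5*n*sin(theta)` gives an array gain of at least 8, whereas the all-zero configuration cancels almost completely in that direction.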
-
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
Authors:
Ziqiao Peng,
Haoyu Wu,
Zhenbo Song,
Hao Xu,
Xiangyu Zhu,
Jun He,
Hongyan Liu,
Zhaoxin Fan
Abstract:
Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network that disentangles different emotions in speech in order to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle emotion and content in speech by cross-reconstructing speech signals with different emotion labels. Then, an emotion-guided feature fusion decoder is employed to generate a 3D talking face with enhanced emotion. The decoder is driven by the disentangled identity, emotional, and content embeddings so as to generate controllable personal and emotional styles. Finally, considering the scarcity of 3D emotional talking face data, we resort to the supervision of facial blendshapes, which enables the reconstruction of plausible 3D faces from 2D emotional data, and contribute a large-scale 3D emotional talking face dataset (3D-ETF) to train the network. Our experiments and user studies demonstrate that our approach outperforms state-of-the-art methods and exhibits more diverse facial movements. We recommend watching the supplementary video: https://ziqiaopeng.github.io/emotalk
Submitted 25 August, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
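Facial blendshapes, used above as the supervision signal, are commonly combined with the linear delta model v = v_neutral + Σ_i w_i (B_i - v_neutral), where each B_i is a target expression mesh and w_i is its coefficient. The minimal sketch below assumes this standard delta formulation on flattened vertex coordinates. The function name is hypothetical, and the specific blendshape rig used by EmoTalk is not described in the abstract.

```python
def apply_blendshapes(neutral, targets, weights):
    # Linear delta blendshape model on flattened vertex coordinates:
    # v = neutral + sum_i w_i * (target_i - neutral).
    if len(targets) != len(weights):
        raise ValueError("need one weight per blendshape target")
    out = list(neutral)
    for target, w in zip(targets, weights):
        for k, (t, n) in enumerate(zip(target, neutral)):
            out[k] += w * (t - n)
    return out
```

With all weights at zero the neutral face is returned unchanged. A speech-driven model then only needs to predict the weight vector for each frame.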
-
Efficient Computation Sharing for Multi-Task Visual Scene Understanding
Authors:
Sara Shoouri,
Mingyu Yang,
Zichen Fan,
Hun-Seok Kim
Abstract:
Solving multiple visual tasks using individual models can be resource-intensive, while multi-task learning can conserve resources by sharing knowledge across different tasks. Despite the benefits of multi-task learning, such techniques can struggle to balance the loss for each task, leading to potential performance degradation. We present a novel computation- and parameter-sharing framework that balances efficiency and accuracy to perform multiple visual tasks using individually trained single-task transformers. Our method is motivated by transfer learning schemes and aims to reduce computational and parameter storage costs while maintaining the desired performance. Our approach involves splitting the tasks into a base task and the other sub-tasks, and sharing a significant portion of activations and parameters/weights between the base and sub-tasks to decrease inter-task redundancies and enhance knowledge sharing. The evaluation conducted on the NYUD-v2 and PASCAL-context datasets shows that our method outperforms state-of-the-art transformer-based multi-task learning techniques, with higher accuracy and reduced computational resources. Moreover, our method is extended to video stream inputs, further reducing computational costs by efficiently sharing information across the temporal domain as well as the task domain. Our code and models will be made publicly available.
Submitted 14 August, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Diameter Estimation of Cylindrical Metal Bar Using Wideband Dual-Polarized Ground-Penetrating Radar
Authors:
Hai-Han Sun,
Weixia Cheng,
Zheng Fan
Abstract:
Ground-penetrating radar (GPR) has been an effective technology for locating metal bars in civil engineering structures. However, accurately sizing small-diameter subsurface metal bars remains challenging for the existing reflection pattern-based method because of the limited resolution of GPR. To address this issue, we propose a reflection power-based method that exploits the relationship between the bar diameter and the maximum power of the bar-reflected signal obtained by a wideband dual-polarized GPR, which circumvents the resolution limit of the existing pattern-based method. In the proposed method, the theoretical relationship between the bar diameter and the power ratio of the bar-reflected signals acquired by perpendicular and parallel polarized antennas is established via the inherent scattering width of the metal bar and the wideband spectrum of the bar-reflected signal. Based on this relationship, the bar diameter can be estimated from the power ratio measured in a GPR survey. Simulations and experiments have been conducted with different GPR frequency spectra, subsurface media, and metal bars of various diameters and depths to demonstrate the efficacy of the method. Experimental results show that the method achieves high sizing accuracy, with errors of less than 10% in different scenarios. With its simple operation and high accuracy, the method can be used for real-time in situ examination of subsurface metal bars.
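A minimal sketch of the final estimation step, assuming the theoretical ratio-to-diameter relationship has already been tabulated as a monotonic calibration curve. The table values, the peak-power definition (squared amplitude), and all function names here are hypothetical; the paper derives the curve from the bar's scattering width and the GPR spectrum rather than from a lookup table.

```python
import bisect
import math


def max_power(signal):
    """Peak instantaneous power (squared amplitude) of a bar-reflected trace."""
    return max(s * s for s in signal)


def power_ratio_db(perp_signal, par_signal):
    """Perpendicular- to parallel-polarized peak power ratio, in dB."""
    return 10.0 * math.log10(max_power(perp_signal) / max_power(par_signal))


def estimate_diameter(ratio_db, calibration):
    """Linearly interpolate a diameter from (ratio_db, diameter_mm) pairs.

    `calibration` is a hypothetical table sorted by increasing ratio; values
    outside the table are clamped to its end points.
    """
    ratios = [r for r, _ in calibration]
    diameters = [d for _, d in calibration]
    if ratio_db <= ratios[0]:
        return diameters[0]
    if ratio_db >= ratios[-1]:
        return diameters[-1]
    i = bisect.bisect_left(ratios, ratio_db)
    r0, r1 = ratios[i - 1], ratios[i]
    d0, d1 = diameters[i - 1], diameters[i]
    return d0 + (d1 - d0) * (ratio_db - r0) / (r1 - r0)
```

With a made-up table such as `[(-20.0, 5.0), (-10.0, 10.0), (-5.0, 20.0)]`, a measured ratio of -15 dB maps to 7.5 mm. Linear interpolation is only a convenience here; a denser table or the closed-form relationship would replace it in practice.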
Submitted 26 December, 2022;
originally announced December 2022.
-
EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks
Authors:
Liyu Shi,
Xiaoyan Li,
Weiming Hu,
Haoyuan Chen,
**g Chen,
Zizhen Fan,
Minghe Gao,
Yujie **g,
Guotao Lu,
Deguo Ma,
Zhiyu Ma,
Qingtao Meng,
Dechao Tang,
Hongzan Sun,
Marcin Grzegorzek,
Shouliang Qi,
Yueyang Teng,
Chen Li
Abstract:
Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers assessment accuracy when computer technology is used to aid diagnosis. Methods: This study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, it is evaluated using classical machine learning methods and deep learning methods. Results: The experimental results show that deep learning methods achieve better image segmentation performance on EBHI-Seg. The maximum Dice score of the classical machine learning methods is 0.948, while the deep learning methods reach a Dice score of 0.965. Conclusion: This publicly available dataset contains 5,170 images covering six tumor differentiation stages, together with the corresponding ground truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer, which can be used in clinical settings to help doctors and patients.
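The Dice scores quoted above follow the standard definition 2|A∩B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. A minimal pure-Python sketch for binary masks stored as nested lists; the small `eps` smoothing term is a common convention and an assumption here, not something the abstract specifies.

```python
def dice(pred, truth, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks (nested lists)."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in truth for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels that are foreground in both masks.
    inter = sum(1 for p, t in zip(flat_p, flat_t) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in flat_p if p) + sum(1 for t in flat_t if t)
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return (2.0 * inter + eps) / (total + eps)
```

For example, two 2x3 masks that each mark three pixels and agree on two of them score 4/6 ≈ 0.667.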
Submitted 6 December, 2022; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire
Authors:
Zhiyun Fan,
Zhenlin Liang,
Linhao Dong,
Yi Liu,
Shiyu Zhou,
Meng Cai,
Jun Zhang,
Zejun Ma,
Bo Xu
Abstract:
In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to segment the audio and then transcribe each segment. These two stages are addressed separately by speaker change detection (SCD) and automatic speech recognition (ASR). Most previous SCD systems rely solely on speaker information and ignore the importance of speech content. In this paper, we propose a novel SCD system that considers both speaker-difference and speech-content cues. These two cues are converted into token-level representations by the continuous integrate-and-fire (CIF) mechanism and then combined to detect speaker changes at token acoustic boundaries. We evaluate our approach on AISHELL-4, a public real-recorded meeting dataset. The experimental results show that our method outperforms a competitive frame-level baseline system by 2.45% equal coverage-purity (ECP). In addition, we demonstrate the importance of speech content and speaker difference to the SCD task, and the advantages of conducting SCD at token acoustic boundaries rather than frame by frame.
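To make the CIF step concrete, here is a minimal sketch of the generic continuous integrate-and-fire mechanism: per-frame weights are accumulated until they reach a threshold, the firing frame's weight is split between the finished token and the next one, and each token is the weighted sum of its frames. The threshold of 1.0 and dropping the leftover weight at the end are assumptions of this sketch; it does not reproduce the paper's combination of speaker and content cues.

```python
def cif(frames, alphas, threshold=1.0):
    """Continuous integrate-and-fire over frame-level feature vectors.

    frames: T feature vectors (lists of floats); alphas: T weights in [0, 1].
    Returns (tokens, fire_frames): the integrated token vectors and the frame
    index at which each token fired, i.e. its acoustic boundary.
    """
    dim = len(frames[0])
    tokens, fire_frames = [], []
    acc = 0.0            # weight accumulated for the current token
    state = [0.0] * dim  # weighted feature sum for the current token
    for t, (h, a) in enumerate(zip(frames, alphas)):
        if acc + a < threshold:
            # Below the threshold: keep integrating this frame.
            acc += a
            state = [s + a * x for s, x in zip(state, h)]
        else:
            # Part of this frame's weight completes the current token...
            used = threshold - acc
            tokens.append([s + used * x for s, x in zip(state, h)])
            fire_frames.append(t)
            # ...and the remainder starts integrating the next token.
            rest = a - used
            acc = rest
            state = [rest * x for x in h]
    # Leftover weight below the threshold is discarded in this sketch.
    return tokens, fire_frames
```

With frames `[[1.0], [2.0], [3.0], [4.0]]` and weights `[0.5, 0.5, 0.6, 0.6]`, tokens fire at frames 1 and 3 with values 1.5 and 3.4.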
Submitted 17 November, 2022;
originally announced November 2022.
-
Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Ziyao Yi,
Yan Xiang,
Zibin Liu,
Shaoqing Li,
Keming Shi,
Dehui Kong,
Ke Xu,
Minsu Kwon,
Yaqi Wu,
Jiesi Zheng,
Zhihao Fan,
Xun Wu,
Feng Zhang,
Albert No,
Minhyeok Cho,
Zewen Chen,
Xiaze Zhang,
Ran Li
, et al. (13 additional authors not shown)
Abstract:
The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that can replace standard mobile ISPs and run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and can process Full HD photos in less than 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Submitted 7 November, 2022;
originally announced November 2022.
-
Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework
Authors:
Zong Fan,
** Gong,
Shanshan Tang,
Christine U. Lee,
Xiaohui Zhang,
Pengfei Song,
Shigao Chen,
Hua Li
Abstract:
Automatic breast lesion detection and classification is an important task in computer-aided diagnosis, in which breast ultrasound (BUS) imaging is a common and frequently used screening tool. Recently, a number of deep learning-based methods have been proposed for joint localization and classification of breast lesions using BUS images. In these methods, a shared network trunk extracts features that are then fed to two independent network branches for classification and localization. Improper information sharing might cause conflicts in feature optimization between the two branches and lead to performance degradation. Also, these methods generally require large amounts of pixel-level annotated data for model training. To overcome these limitations, we propose a novel joint localization and classification model based on the attention mechanism and a disentangled semi-supervised learning strategy. The model is composed of a classification network and an auxiliary lesion-aware network. Using the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance. The disentangled semi-supervised learning strategy requires only incomplete training datasets for model training. The proposed modularized framework allows flexible network replacement, so it can be generalized to various applications. Experimental results on two different breast ultrasound image datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance are also investigated to gain deeper insight into the designed framework.
Submitted 11 October, 2022;
originally announced October 2022.
-
Battery and Hydrogen Energy Storage Control in a Smart Energy Network with Flexible Energy Demand using Deep Reinforcement Learning
Authors:
Cephas Samende,
Zhong Fan,
Jun Cao
Abstract:
Smart energy networks provide an effective means to accommodate high penetrations of variable renewable energy sources like solar and wind, which are key for deep decarbonisation of energy production. However, given the variability of both renewables and energy demand, it is imperative to develop effective control and energy storage schemes to manage variable energy generation and achieve the desired economic and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilisation and minimise energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a deep reinforcement learning-based control strategy that optimises the scheduling of the hybrid energy storage system and energy demand in real time. The proposed approach is model-free and does not require explicit knowledge or rigorous mathematical models of the smart energy network environment. Simulation results based on real-world data show that: (i) integration and optimised operation of the hybrid energy storage system and energy demand reduce carbon emissions by 78.69%, improve cost savings by 23.5%, and improve renewable energy utilisation by over 13.2% compared to other baseline models; and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as deep Q-network.
Submitted 26 August, 2022;
originally announced August 2022.
-
RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation
Authors:
Guijie Zhu,
Zhun Fan,
Jiacheng Liu,
Duan Yuan,
Peili Ma,
Meihua Wang,
Weihua Sheng,
Kelvin C. P. Wang
Abstract:
The acquisition and evaluation of pavement surface data play an essential role in pavement condition evaluation. In this paper, an efficient and effective end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve pavement crack segmentation accuracy. RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into the encoder-decoder architecture. The ResBlocks improve the ability of RHA-Net to extract high-level abstract features. The hybrid attention blocks fuse low-level and high-level features to help the model focus on the correct channels and crack areas, thereby improving the feature representation ability of RHA-Net. An image dataset containing 789 pavement crack images collected by a self-designed mobile robot is constructed and used for training and evaluating the proposed model. Compared with other state-of-the-art networks, the proposed model achieves better performance, and the contributions of the residual blocks and hybrid attention mechanisms are validated in a comprehensive ablation study. Additionally, a lightweight version of the model, obtained by introducing depthwise separable convolution, achieves better performance and a much faster processing speed with 1/30 of the number of U-Net parameters. The developed system can segment pavement cracks in real time on an embedded Jetson TX2 device (25 FPS). The video taken in real-time experiments is released at https://youtu.be/3XIogk0fiG4.
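Much of the parameter saving in the lightweight variant comes from replacing standard convolutions with depthwise separable ones. The per-layer arithmetic can be sketched as follows; the 3x3 kernel and 256 channels are illustrative choices, and the 1/30 figure above refers to the whole model, which this per-layer count does not reproduce.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise
```

For a 3x3 layer with 256 input and 256 output channels, the standard convolution has 589,824 weights and the separable one has 2,304 + 65,536 = 67,840, about 8.7 times fewer. In general the ratio is 1/c_out + 1/k².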
Submitted 28 July, 2022;
originally announced July 2022.
-
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Authors:
Zhiyun Fan,
Linhao Dong,
Meng Cai,
Zejun Ma,
Bo Xu
Abstract:
Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between encoder outputs frame by frame and converts encoder outputs into segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than precise speaker change points. Experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses precise speaker change labels.
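The idea of "integrating the speaker difference frame by frame" can be illustrated with a deliberately simplified sketch: a per-frame difference score is accumulated and a speaker change fires when the sum crosses a threshold. The cosine-based difference, the 0.5 threshold, and the hard reset after firing are all hypothetical choices made for illustration; they are not the paper's learned difference-based CIF.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def difference_integrate_and_fire(embeddings, threshold=0.5):
    """Frame indices where accumulated speaker difference reaches `threshold`."""
    changes = []
    acc = 0.0
    for t in range(1, len(embeddings)):
        # Difference in [0, 1]: 0 for the same direction, 1 for opposite ones.
        diff = (1.0 - cosine(embeddings[t], embeddings[t - 1])) / 2.0
        acc += diff
        if acc >= threshold:
            changes.append(t)  # fire: hypothesized speaker change at frame t
            acc = 0.0          # hypothetical hard reset after firing
    return changes
```

Because small frame-to-frame differences also accumulate, a real system would learn the difference weights rather than rely on a fixed similarity measure.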
Submitted 27 June, 2022;
originally announced June 2022.
-
Global Sensing and Measurements Reuse for Image Compressed Sensing
Authors:
Zi-En Fan,
Feng Lian,
Jia-Ni Quan
Abstract:
Recently, deep network-based image compressed sensing methods have achieved high reconstruction quality with reduced computational overhead compared with traditional methods. However, existing methods obtain measurements from only part of the features in the network and use them only once for image reconstruction. They ignore that the network contains low-, mid-, and high-level features\cite{zeiler2014visualizing}, all of which are essential for high-quality reconstruction. Moreover, using measurements only once may not be enough to extract richer information from them. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet), which employs a Global Sensing Module (GSM) to collect features at all levels for efficient sensing and a Measurements Reuse Block (MRB) to reuse measurements multiple times at multiple scales. Experimental results on three benchmark datasets show that our model can significantly outperform state-of-the-art methods.
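As background for the term "measurements", the classical block-based compressed sensing setup takes y = Φx for a vectorized image block x and a random matrix Φ with far fewer rows than columns, and many deep CS networks start reconstruction from the back-projection Φᵀy. This sketch shows only that traditional random-matrix sensing; MR-CCSNet's learned Global Sensing Module and measurement reuse are not reproduced, and the 8x8 block and 25% sampling ratio are hypothetical.

```python
import random


def gaussian_matrix(m, n, seed=0):
    """Random m x n Gaussian sensing matrix with entries ~ N(0, 1/m)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compressed measurements y = Phi x of a vectorized image block x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def initial_estimate(phi, y):
    """Back-projection x0 = Phi^T y, a common starting point for reconstruction."""
    n = len(phi[0])
    return [sum(phi[i][j] * y[i] for i in range(len(phi))) for j in range(n)]


block = 8 * 8       # hypothetical 8x8 block, vectorized to length 64
ratio = 0.25        # hypothetical sampling ratio
m = int(ratio * block)
phi = gaussian_matrix(m, block)
```

Here 64 pixel values are compressed into 16 measurements; the reconstruction network's job is to recover the block from those 16 numbers.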
Submitted 23 June, 2022;
originally announced June 2022.
-
A novel adversarial learning strategy for medical image classification
Authors:
Zong Fan,
Xiaohui Zhang,
Jacob A. Gasienica,
Jennifer Potts,
Su Ruan,
Wade Thorstad,
Hiram Gay,
Pengfei Song,
Xiaowei Wang,
Hua Li
Abstract:
Deep learning (DL) techniques have been extensively utilized for medical image classification. Most DL-based classification networks are structured hierarchically and optimized by minimizing a single loss function measured at the end of the network. However, such a single-loss design could lead to optimization of one specific value of interest while failing to leverage informative features from intermediate layers that might benefit classification performance and reduce the risk of overfitting. Recently, auxiliary convolutional neural networks (AuxCNNs) have been employed on top of traditional classification networks to facilitate the training of intermediate layers and improve classification performance and robustness. In this study, we proposed an adversarial learning-based AuxCNN to support the training of deep neural networks for medical image classification. Two main innovations were adopted in our AuxCNN classification framework. First, the proposed AuxCNN architecture includes an image generator and an image discriminator for extracting more informative image features for medical image classification, motivated by the concept of generative adversarial networks (GANs) and their impressive ability to approximate target data distributions. Second, a hybrid loss function is designed to guide model training by incorporating the different objectives of the classification network and AuxCNN to reduce overfitting. Comprehensive experimental studies demonstrated the superior classification performance of the proposed model. The effects of network-related factors on classification performance were also investigated.
Submitted 7 July, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Learning to Remove Clutter in Real-World GPR Images Using Hybrid Data
Authors:
Hai-Han Sun,
Weixia Cheng,
Zheng Fan
Abstract:
The clutter in ground-penetrating radar (GPR) radargrams disguises or distorts subsurface target responses, which severely affects the accuracy of target detection and identification. Existing clutter removal methods either leave residual clutter or deform target responses when facing complex and irregular clutter in real-world radargrams. To tackle the challenge of clutter removal in real scenarios, this study presents a clutter-removal neural network (CR-Net) trained on a large-scale hybrid dataset. The CR-Net integrates residual dense blocks into the U-Net architecture to enhance its capability in clutter suppression and target reflection restoration. The combination of the mean absolute error (MAE) loss and the multi-scale structural similarity (MS-SSIM) loss is used to effectively drive the optimization of the network. To train the proposed CR-Net to remove complex and diverse clutter in real-world radargrams, we build the first large-scale hybrid dataset, named CLT-GPR, which contains clutter collected by different GPR systems in multiple scenarios. The CLT-GPR dataset significantly improves the generalizability of the network to remove clutter in real-world GPR radargrams. Extensive experimental results demonstrate that the CR-Net achieves superior performance over existing methods in removing clutter and restoring target responses in diverse real-world scenarios. Moreover, with its end-to-end design, the CR-Net does not require manual parameter tuning, making it highly suitable for automatically producing clutter-free radargrams in GPR applications. The CLT-GPR dataset and the code implemented in the paper can be found at https://haihan-sun.github.io/GPR.html.
Submitted 17 May, 2022;
originally announced May 2022.
-
Domain Adversarial Graph Convolutional Network Based on RSSI and Crowdsensing for Indoor Localization
Authors:
Mingxin Zhang,
Zipei Fan,
Ryosuke Shibasaki,
Xuan Song
Abstract:
In recent years, the use of WiFi fingerprints for indoor positioning has grown in popularity, largely due to the widespread availability of WiFi and the proliferation of mobile communication devices. However, many existing methods for constructing fingerprint datasets rely on labor-intensive and time-consuming processes of collecting large amounts of data. Additionally, these methods often focus on ideal laboratory environments, rather than considering the practical challenges of large multi-floor buildings. To address these issues, we present a novel WiDAGCN model that can be trained using a small number of labeled site survey data and large amounts of unlabeled crowdsensed WiFi fingerprints. By constructing heterogeneous graphs based on received signal strength indicators (RSSIs) between waypoints and WiFi access points (APs), our model is able to effectively capture the topological structure of the data. We also incorporate graph convolutional networks (GCNs) to extract graph-level embeddings, a feature that has been largely overlooked in previous WiFi indoor localization studies. To deal with the challenges of large amounts of unlabeled data and multiple data domains, we employ a semi-supervised domain adversarial training scheme to effectively utilize unlabeled data and align the data distributions across domains. Our system is evaluated using a public indoor localization dataset that includes multiple buildings, and the results show that it performs competitively in terms of localization accuracy in large buildings.
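One way to picture the "heterogeneous graphs based on RSSIs between waypoints and APs" is a bipartite graph with waypoint nodes, AP nodes, and RSSI-weighted edges. The sketch below builds such a graph from fingerprint dictionaries; the input format, the -100 dBm floor, and the linear weight scaling are hypothetical choices for illustration, not the WiDAGCN construction.

```python
def build_rssi_graph(fingerprints, floor_dbm=-100.0):
    """Bipartite waypoint-AP graph from RSSI fingerprints.

    fingerprints: {waypoint_id: {ap_id: rssi_dbm}}.
    Returns (nodes, edges): nodes maps ("wp", id) / ("ap", id) to an integer
    index; edges are (waypoint_index, ap_index, weight) tuples, with weight in
    (0, 1] for readings between the floor and 0 dBm.
    """
    nodes = {}

    def node(key):
        # Assign indices in order of first appearance.
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]

    edges = []
    for wp, readings in fingerprints.items():
        u = node(("wp", wp))
        for ap, rssi in readings.items():
            if rssi <= floor_dbm:
                continue  # treat readings at or below the floor as "not heard"
            weight = (rssi - floor_dbm) / -floor_dbm  # hypothetical linear scaling
            edges.append((u, node(("ap", ap)), weight))
    return nodes, edges
```

A GCN would then propagate features over the (normalized) adjacency implied by these edges; unheard APs simply contribute no edge.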
Submitted 31 March, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
The role of living laboratories in unlocking the potential of low-carbon energy technologies on the journey to net-zero
Authors:
Zhong Fan,
Jun Cao,
Taskin Jamal,
Chris Fogwill,
Cephas Samende,
Zoe Robinson,
Fiona Polack,
Mark Ormerod,
Sharon George,
Adam Peacock,
David Healey
Abstract:
We demonstrate the potential role of the Smart Energy Network Demonstrator (SEND), one of the largest at-scale multi-vector smart energy networks.
Submitted 1 April, 2022;
originally announced April 2022.
-
Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning
Authors:
Xiang Chen,
Zhentao Fan,
Pengpeng Li,
Longgang Dai,
Caihua Kong,
Zhuoran Zheng,
Yufeng Huang,
Yufeng Li
Abstract:
We offer a practical unpaired learning-based image dehazing network trained on an unpaired set of clear and hazy images. This paper provides a new perspective that treats image dehazing as a two-class separated factor disentanglement task, i.e., separating the task-relevant factor of clear image reconstruction from the task-irrelevant factor of haze-relevant distribution. To disentangle these two classes of factors in deep feature space, contrastive learning is introduced into a CycleGAN framework to learn disentangled representations by guiding the generated images to be associated with latent factors. With this formulation, the proposed contrastive disentangled dehazing method (CDD-GAN) employs negative generators that are updated alternately with the encoder network to produce a queue of challenging negative adversaries. These negative adversaries are then trained end-to-end together with the backbone representation network to enhance the discriminative information and promote factor disentanglement by maximizing the adversarial contrastive loss. We further show that, during training, hard negative examples can suppress the task-irrelevant factors and unpaired clear examples can enhance the task-relevant factors, which better facilitates haze removal and helps image restoration. Extensive experiments on both synthetic and real-world datasets demonstrate that our method performs favorably against existing unpaired dehazing baselines.
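The role of hard negatives can be illustrated with the standard InfoNCE contrastive loss, in which a negative that is more similar to the anchor contributes a larger term to the denominator and therefore a larger loss. This is a generic sketch using cosine similarity and a temperature of 0.1 (both assumptions); it is not CDD-GAN's adversarial contrastive loss, whose exact form the abstract does not give.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log(exp(s+/T) / (exp(s+/T) + sum_i exp(s_i-/T))), cosine s."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]
```

With anchor and positive both `[1, 0]`, a hard negative `[0.9, 0.1]` yields a much larger loss than an easy negative `[0, 1]`, which is why hard negatives give a stronger training signal.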
Submitted 12 July, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning
Authors:
Daniel J. B. Harrold,
Jun Cao,
Zhong Fan
Abstract:
In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system working collaboratively to reduce the energy costs of a microgrid by maximising the value of renewable energy and trading. The agents must learn to control three different types of energy storage system suited for short-, medium-, and long-term storage under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable energy generation. Two case studies are considered: the first examines how the energy storage systems can better integrate renewable energy generation under dynamic pricing, and the second examines how those same agents can be used alongside an aggregator agent to sell energy to self-interested external microgrids looking to reduce their own energy bills. This work found that centralised learning with decentralised execution in the multi-agent deep deterministic policy gradient and its state-of-the-art variants allowed the multi-agent methods to perform significantly better than control by a single global agent. It was also found that using separate reward functions in the multi-agent approach performed much better than using a single control agent. Being able to trade with the other microgrids, rather than just selling back to the utility grid, was also found to greatly increase the grid's savings.
Submitted 5 December, 2021; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Multi-Agent Deep Deterministic Policy Gradient Algorithm for Peer-to-Peer Energy Trading Considering Distribution Network Constraints
Authors:
Cephas Samende,
Jun Cao,
Zhong Fan
Abstract:
In this paper, we investigate an energy cost minimization problem for prosumers participating in peer-to-peer energy trading. Due to (i) uncertainties caused by renewable energy generation and consumption, (ii) difficulties in developing an accurate and efficient energy trading model, and (iii) the need to satisfy distribution network constraints, it is challenging for prosumers to obtain optimal energy trading decisions that minimize their individual energy costs. To address this challenge, we first formulate the problem as a Markov decision process and propose a multi-agent deep deterministic policy gradient algorithm to learn optimal energy trading decisions. To satisfy the distribution network constraints, we propose distribution network tariffs, which we incorporate into the algorithm as incentives that encourage energy trading decisions that help satisfy the constraints and penalize decisions that violate them. The proposed algorithm is model-free and allows the agents to learn optimal energy trading decisions without prior information about other agents in the network. Simulation results based on real-world datasets show the effectiveness and robustness of the proposed algorithm.
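A minimal sketch of how a network tariff can enter an agent's reward as a penalty for constraint-violating trades. The voltage limits of 0.95-1.05 p.u., the tariff rate, and scaling the penalty by traded volume are all hypothetical choices for illustration; the abstract does not specify the paper's tariff formulation or which network constraints it uses.

```python
def trading_reward(buy_kwh, sell_kwh, buy_price, sell_price,
                   voltage_pu, v_min=0.95, v_max=1.05, tariff=0.5):
    """Negative net energy cost minus a hypothetical network-tariff penalty.

    The penalty grows with how far the bus voltage (per unit) lies outside
    [v_min, v_max] and with the traded volume, so trades that push the
    network out of limits earn a lower reward.
    """
    cost = buy_kwh * buy_price - sell_kwh * sell_price
    violation = max(0.0, v_min - voltage_pu) + max(0.0, voltage_pu - v_max)
    penalty = tariff * violation * (buy_kwh + sell_kwh)
    return -(cost + penalty)
```

Buying 2 kWh at 0.3 per kWh yields a reward of -0.6 when the voltage is within limits, and a lower reward when the same trade coincides with an over-voltage of 1.10 p.u.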
Submitted 20 August, 2021;
originally announced August 2021.
-
The energy revolution: cyber physical advances and opportunities for smart local energy systems
Authors:
Nandor Verba,
Elena Gaura,
Stephen McArthur,
George Konstantopoulos,
Jianzhoug Wu,
Zhong Fan,
Dimitrios Athanasiadis,
Pablo Rodolfo Baldivieso Monasterios,
Euan Morris,
Jeffrey Hardy
Abstract:
We have designed a two-stage, 10-step process to give organisations a method to analyse small local energy systems (SLES) projects based on their Cyber Physical System components in order to develop future-proof energy systems.
SLES are often developed for a specific range of use cases and functions, and these match the specific requirements and needs of the community, location or site under consideration. During design and commissioning, new and specific cyber physical architectures are developed. These are the control and data systems needed to bridge the gap between the physical assets, the captured data and the control signals. Often, the cyber physical architecture and infrastructure are focused on functionality and the delivery of the specific applications.
However, we find that technologies and approaches have arisen in other fields that, if used within SLES, could support the flexibility, scalability and reusability vital to their success. As these approaches can improve operational data systems, they can also be used to enhance predictive functions. If used and deployed effectively, these new approaches can offer longer-term improvements in the use and effectiveness of SLES, while allowing the concepts and designs to be capitalised upon through wider roll-out and the offering of commercial services or products.
Submitted 29 June, 2021;
originally announced June 2021.
-
A Prioritized Trajectory Planning Algorithm for Connected and Automated Vehicle Mandatory Lane Changes
Authors:
Nachuan Li,
Austen Z. Fan,
Riley Fischer,
Wissam Kontar,
Bin Ran
Abstract:
We introduce a prioritized system-optimal algorithm for the mandatory lane change (MLC) behavior of connected and automated vehicles (CAVs) from a dedicated lane. Our approach applies a cooperative lane change that prioritizes the decisions of lane-changing vehicles closer to the end of the diverging zone (DZ) and optimizes the predicted total system travel time. Our experiments on synthetic data show that the proposed algorithm improves traffic network efficiency by attaining higher speeds in the dedicated lane and earlier MLC positions while keeping computational time low. Our approach outperforms the traditional gap acceptance model.
Submitted 21 April, 2021;
originally announced April 2021.
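The prioritization rule in the lane-change abstract above (vehicles closer to the end of the diverging zone decide first) can be illustrated with a minimal Python sketch. The `Vehicle` class, the `priority_order` function and the example positions are hypothetical; the sketch only orders vehicles and does not model the system-optimal travel-time optimization.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: str
    position_m: float  # longitudinal position along the road, in metres (hypothetical)


def priority_order(vehicles, dz_end_m):
    """Order lane-changing vehicles so that those with the smallest remaining
    distance to the end of the diverging zone decide first.

    Vehicles already past the DZ end are dropped (an assumption of this sketch).
    """
    candidates = [v for v in vehicles if v.position_m <= dz_end_m]
    return sorted(candidates, key=lambda v: dz_end_m - v.position_m)


if __name__ == "__main__":
    fleet = [Vehicle("a", 120.0), Vehicle("b", 310.0), Vehicle("c", 250.0)]
    print([v.vid for v in priority_order(fleet, dz_end_m=400.0)])  # ['b', 'c', 'a']
```

In a full planner, each vehicle in this order would plan its trajectory while treating the plans of higher-priority vehicles as fixed.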
-
Exploring wav2vec 2.0 on speaker verification and language identification
Authors:
Zhiyun Fan,
Meng Li,
Shiyu Zhou,
Bo Xu
Abstract:
Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we attempt to extend the self-supervised framework to speaker verification and language identification. First, we use preliminary experiments to show that wav2vec 2.0 can capture information about the speaker and language. Then we demonstrate the effectiveness of wav2vec 2.0 on each of the two tasks. For speaker verification, we obtain a new state-of-the-art result, an Equal Error Rate (EER) of 3.61% on the VoxCeleb1 dataset. For language identification, we obtain an EER of 12.02% on the 1-second condition and an EER of 3.47% on the full-length condition of the AP17-OLR dataset. Finally, we use a single model to achieve unified modeling of the two tasks through multi-task learning.
Submitted 14 January, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
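The Equal Error Rate reported in the wav2vec 2.0 abstract above is the operating point where the false acceptance rate equals the false rejection rate. Below is a minimal Python sketch that approximates it by sweeping every observed score as a threshold. It assumes higher scores mean "same speaker/language" and that a trial is accepted when its score is at least the threshold. Toolkits differ in how they interpolate between thresholds, so this is not the exact scoring used in the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER from target (genuine) and impostor trial scores.

    FAR: fraction of impostor trials accepted (score >= threshold).
    FRR: fraction of target trials rejected (score < threshold).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None  # (|FAR - FRR|, EER estimate)
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, impostors))  # 0.25
```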
-
Genetic U-Net: Automatically Designed Deep Networks for Retinal Vessel Segmentation Using a Genetic Algorithm
Authors:
Jiahong Wei,
Zhun Fan
Abstract:
Recently, many methods based on hand-designed convolutional neural networks (CNNs) have achieved promising results in automatic retinal vessel segmentation. However, these CNNs remain constrained in capturing retinal vessels in complex fundus images. To improve their segmentation performance, these CNNs tend to have many parameters, which may lead to overfitting and high computational complexity. Moreover, the manual design of competitive CNNs is time-consuming and requires extensive empirical knowledge. Herein, a novel automated design method, called Genetic U-Net, is proposed to generate a U-shaped CNN that achieves better retinal vessel segmentation with fewer architecture-based parameters, thereby addressing the above issues. First, we devised a condensed but flexible search space based on a U-shaped encoder-decoder. Then, we used an improved genetic algorithm to identify better-performing architectures in the search space and investigated the possibility of finding a superior network architecture with fewer parameters. The experimental results show that the architecture obtained using the proposed method offered superior performance with less than 1% of the parameters of the original U-Net, and with significantly fewer parameters than other state-of-the-art models. Furthermore, through in-depth investigation of the experimental results, several effective operations and network patterns for generating superior retinal vessel segmentations were identified.
Submitted 11 June, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
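The abstract above does not specify the Genetic U-Net search space or the "improved" genetic algorithm. The Python sketch below only illustrates a generic genetic-algorithm loop (truncation selection with elitism, one-point crossover, bit-flip mutation) over hypothetical bit-string encodings. The fitness function `toy_fitness` is made up for illustration: it rewards bits in the first half (a stand-in for performance) and penalizes bits in the second half (a stand-in for parameter cost). It is not the authors' algorithm.

```python
import random


def toy_fitness(genome):
    """Hypothetical fitness: reward the first half, penalize the second half."""
    half = len(genome) // 2
    return sum(genome[:half]) - 0.5 * sum(genome[half:])


def evolve(fitness, n_bits=12, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Generic genetic algorithm over fixed-length bit strings.

    The top half of each generation survives unchanged (elitism), so the best
    fitness in the population never decreases.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_bits)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, toy_fitness(best))
```

In an architecture search, evaluating `fitness` would mean training and validating the decoded network, which is why real searches keep populations and generation counts small.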
-
Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks
Authors:
Ziqi Fan,
Vibhav Vineet,
Chenshen Lu,
T. W. Wu,
Kyla McMullen
Abstract:
Acoustic scattering is strongly influenced by the boundary geometry of objects over which sound scatters. The present work proposes a method to infer object geometry from scattering features by training convolutional neural networks. The training data are generated by a fast numerical solver developed in CUDA. The complete set of simulations is sampled to generate multiple datasets containing different numbers of channels and diverse image resolutions. The robustness of our approach to data degradation is evaluated by comparing the performance of networks trained on datasets with varying levels of data degradation. We find that the predictions made by our models match the ground truth with high accuracy. In addition, accuracy does not degrade when fewer data channels or lower resolutions are used.
Submitted 10 February, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
MeshMVS: Multi-View Stereo Guided Mesh Reconstruction
Authors:
Rakesh Shrestha,
Zhiwen Fan,
Qingkun Su,
Zuozhuo Dai,
Siyu Zhu,
Ping Tan
Abstract:
Deep learning based 3D shape generation methods generally use latent features extracted from color images to encode the semantics of objects and guide the shape generation process. These color image semantics only implicitly encode 3D information, potentially limiting the accuracy of the generated shapes. In this paper, we propose a multi-view mesh generation method that incorporates geometry information explicitly by using features from the intermediate depth representations of multi-view stereo and regularizing the 3D shapes against these depth images. First, our system predicts a coarse 3D volume from the color images by probabilistically merging the voxel occupancy grids predicted from individual views. Then the depth images from multi-view stereo, along with the rendered depth images of the coarse shape, are used as a contrastive input whose features guide the refinement of the coarse shape through a series of graph convolution networks. Notably, we achieve better results than state-of-the-art multi-view shape generation methods, with a 34% decrease in Chamfer distance to the ground truth and a 14% increase in F1-score on the ShapeNet dataset. Our source code is available at https://git.io/Jmalg
Submitted 11 April, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
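The MeshMVS abstract above reports its gains as a decrease in Chamfer distance. The Python sketch below computes one common symmetric variant: the mean squared nearest-neighbour distance from each set to the other, summed over both directions. Benchmarks differ in whether they square distances, average or sum, and how they rescale shapes, so values from this sketch are not directly comparable with the paper's numbers. It uses brute force, which is only practical for small point sets.

```python
import math


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two non-empty point sets.

    p, q: sequences of equal-dimension coordinate tuples, e.g. (x, y, z).
    Uses squared Euclidean nearest-neighbour distances, averaged per direction.
    """
    def one_way(a, b):
        return sum(min(math.dist(x, y) ** 2 for y in b) for x in a) / len(a)

    return one_way(p, q) + one_way(q, p)


if __name__ == "__main__":
    print(chamfer_distance([(0, 0, 0)], [(1, 0, 0)]))  # 2.0
```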
-
Automatic Crack Detection on Road Pavements Using Encoder Decoder Architecture
Authors:
Zhun Fan,
Chong Li,
Ying Chen,
Jiahong Wei,
Giuseppe Loprencipe,
Xiaopeng Chen,
Paola Di Mascio
Abstract:
Inspired by the development of deep learning in computer vision and object detection, the proposed algorithm uses an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN), to perform crack detection in an end-to-end manner. Crack characteristics with multiple context information are learned automatically, enabling end-to-end crack detection. A multi-dilation module embedded in the encoder-decoder architecture is then proposed. Crack features of multiple context sizes are integrated in the multi-dilation module by dilated convolutions with different dilation rates, which capture much more crack information. Finally, a hierarchical feature learning module is designed to obtain multi-scale features from the high- to low-level convolutional layers, which are integrated to predict pixel-wise cracks. Experiments were performed on public crack databases using 118 images, and the results were compared with those obtained by other methods on the same images. The results show that the proposed U-HDN method achieves high performance because it extracts and fuses feature maps of different context sizes and different levels more effectively than other algorithms.
Submitted 1 July, 2020;
originally announced July 2020.
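The U-HDN abstract above describes a multi-dilation module that applies convolutions with different dilation rates to capture different context sizes. Its exact 2D layout is not given there. The Python sketch below shows the idea in one dimension only. The same kernel is applied at several rates: a 3-tap kernel with rate r spans 2r + 1 samples. The responses are then stacked as separate channels. `multi_dilation` and its default rates are hypothetical, and the kernel is assumed to have odd length.

```python
def dilated_conv1d(x, kernel, rate):
    """'Same'-length 1D dilated cross-correlation with zero padding.

    Kernel taps are spaced `rate` samples apart; `kernel` must have odd length.
    """
    k = len(kernel)
    half = (k - 1) // 2 * rate
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * rate
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_dilation(x, kernel, rates=(1, 2, 4)):
    """Hypothetical multi-dilation block: one output channel per dilation rate."""
    return [dilated_conv1d(x, kernel, r) for r in rates]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for rate, channel in zip((1, 2, 4), multi_dilation(impulse, [1, 1, 1])):
        print(rate, channel)
```

With an impulse input, each channel shows which neighbouring positions a given rate "sees", which is the sense in which larger rates gather wider context without extra parameters.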
-
Addressing the confounds of accompaniments in singer identification
Authors:
Tsung-Han Hsieh,
Kai-Hsiang Cheng,
Zhe-Cheng Fan,
Yu-Ching Yang,
Yi-Hsuan Yang
Abstract:
Identifying singers is an important task with many applications. However, the task remains challenging for many reasons. One major issue is the confounding effect of the background instrumental music that is mixed with the vocals in music production. A singer identification model may learn to extract non-vocal features from the instrumental part of songs if a singer only sings in certain musical contexts (e.g., genres). The model therefore cannot generalize well when the singer sings in unseen contexts. In this paper, we attempt to address this issue. Specifically, we employ open-unmix, an open-source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two ways to train a singer identification model: learning from the separated vocals only, or learning from an augmented dataset where we "shuffle-and-remix" the separated vocal and instrumental tracks of different songs to artificially make the singers sing in different contexts. We also incorporate melodic features learned from the vocal melody contour for better performance. Evaluation results on a benchmark dataset called artist20 show that this data augmentation method greatly improves the accuracy of singer identification.
Submitted 17 February, 2020;
originally announced February 2020.
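The "shuffle-and-remix" augmentation in the singer-identification abstract above pairs separated vocals with instrumentals from other songs. Below is a minimal Python sketch. It assumes source separation (for example with open-unmix) has already produced equal-length vocal and instrumental sample lists. It does not call open-unmix, and the dict-based song format is hypothetical. Real remixing may also need gain, key or tempo handling, which is not modelled here.

```python
import random


def shuffle_and_remix(songs, seed=0):
    """Mix each song's vocal track with the instrumental of a different song.

    songs: list (length >= 2) of dicts with 'singer', 'vocal' and
    'instrumental' keys; tracks are equal-length lists of samples.
    Returns (mixture, singer) pairs; the label always follows the vocal.
    """
    rng = random.Random(seed)
    out = []
    for i, song in enumerate(songs):
        j = rng.choice([k for k in range(len(songs)) if k != i])  # another song
        inst = songs[j]["instrumental"]
        mix = [v + a for v, a in zip(song["vocal"], inst)]
        out.append((mix, song["singer"]))
    return out


if __name__ == "__main__":
    songs = [
        {"singer": "A", "vocal": [1, 1], "instrumental": [10, 10]},
        {"singer": "B", "vocal": [2, 2], "instrumental": [20, 20]},
    ]
    print(shuffle_and_remix(songs))  # [([21, 21], 'A'), ([12, 12], 'B')]
```

Because the label follows the vocal, a classifier trained on these mixtures gains nothing by keying on the accompaniment, which is the confound the abstract targets.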
-
Ensemble of Deep Convolutional Neural Networks for Automatic Pavement Crack Detection and Measurement
Authors:
Zhun Fan,
Chong Li,
Ying Chen,
Paola Di Mascio,
Xiaopeng Chen,
Guijie Zhu,
Giuseppe Loprencipe
Abstract:
Automated pavement crack detection and measurement are important road-related tasks, as agencies have to guarantee the improvement of road safety. Conventional crack detection and measurement algorithms can be extremely time-consuming and inefficient. Therefore, innovative algorithms have recently received increased attention from researchers. In this paper, we propose an ensemble of convolutional neural networks (without a pooling layer) based on probability fusion for automated pavement crack detection and measurement. First, an ensemble of convolutional neural networks is employed to identify the structure of small cracks from raw images. Second, the outputs of the individual convolutional neural network models in the ensemble are averaged to produce the final crack probability value of each pixel, yielding a predicted probability map. Finally, the morphological features of the predicted cracks are measured using a skeleton extraction algorithm. To validate the proposed method, experiments were performed on two public crack databases (CFD and AigleRN), and the results were compared with those of different state-of-the-art methods. The experimental results show that the proposed method outperforms the other methods. For crack measurement, crack length and width can be measured for different crack types (complex, common, thin, and intersecting cracks). The results show that the proposed algorithm can be effectively applied to crack measurement.
Submitted 8 February, 2020;
originally announced February 2020.
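The probability-fusion step in the crack-detection ensemble abstract above averages each model's per-pixel crack probability into a single map. Below is a minimal Python sketch using nested lists. The `binarize` helper and its 0.5 threshold are assumptions for illustration, not taken from the paper. The skeleton-based length and width measurement is not covered.

```python
def ensemble_probability_map(prob_maps):
    """Average per-pixel crack probabilities from several models.

    prob_maps: list of equally sized 2D lists (one per model), values in [0, 1].
    """
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[sum(m[r][c] for m in prob_maps) / n for c in range(cols)]
            for r in range(rows)]


def binarize(prob_map, threshold=0.5):
    """Turn the fused probability map into a 0/1 crack mask (threshold assumed)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


if __name__ == "__main__":
    fused = ensemble_probability_map([[[0.25, 1.0]], [[0.75, 0.5]]])
    print(fused)                # [[0.5, 0.75]]
    print(binarize(fused, 0.6)) # [[0, 1]]
```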
-
Evolutionary Neural Architecture Search for Retinal Vessel Segmentation
Authors:
Zhun Fan,
Jiahong Wei,
Guijie Zhu,
Jiajie Mo,
Wenji Li
Abstract:
Accurate retinal vessel segmentation (RVS) is of great significance in assisting doctors with the diagnosis of ophthalmic and other systemic diseases. Manually designing an effective neural network architecture for retinal vessel segmentation requires considerable expertise and a large workload. To improve the performance of vessel segmentation and reduce the workload of manually designing neural networks, we propose a novel approach that applies neural architecture search (NAS) to optimize an encoder-decoder architecture for retinal vessel segmentation. A modified evolutionary algorithm is used to evolve the architectures of the encoder-decoder framework with limited computing resources. The evolved model obtained by the proposed approach achieves top performance among all compared methods on three datasets, namely DRIVE, STARE and CHASE_DB1, but with far fewer parameters. Moreover, the results of cross-training show that the evolved model has considerable scalability, which indicates great potential for clinical disease diagnosis.
Submitted 18 March, 2020; v1 submitted 18 January, 2020;
originally announced January 2020.
-
Speaker-aware speech-transformer
Authors:
Zhiyun Fan,
Jie Li,
Shiyu Zhou,
Bo Xu
Abstract:
Recently, end-to-end (E2E) models have become a competitive alternative to conventional hybrid automatic speech recognition (ASR) systems. However, they still suffer from speaker mismatch between training and testing conditions. In this paper, we use Speech-Transformer (ST) as the study platform to investigate speaker-aware training of E2E models. We propose a model called Speaker-Aware Speech-Transformer (SAST), which is a standard ST equipped with a speaker attention module (SAM). The SAM has a static speaker knowledge block (SKB) made of i-vectors. At each time step, the encoder output attends to the i-vectors in the block and generates a weighted combined speaker embedding vector, which helps the model normalize speaker variations. The SAST model trained in this way becomes independent of specific training speakers and thus generalizes better to unseen test speakers. We investigate different factors of the SAM. Experimental results on the AISHELL-1 task show that SAST achieves a relative 6.5% CER reduction (CERR) over the speaker-independent (SI) baseline. Moreover, we demonstrate that SAST still works well even if the i-vectors in the SKB all come from a data source other than the acoustic training set.
Submitted 2 January, 2020;
originally announced January 2020.
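In the SAST abstract above, each encoder output attends to a static block of i-vectors to produce a weighted speaker embedding. The abstract does not give the projections or the scoring function. The Python sketch below therefore assumes scaled dot-product attention. It also assumes the query has already been projected to the i-vector dimension, and that the i-vectors serve as both keys and values. `speaker_attention` is a hypothetical name.

```python
import math


def speaker_attention(query, ivectors):
    """Attend from one encoder output vector to a static block of i-vectors.

    Assumes `query` is already projected to the i-vector dimension d and uses
    scaled dot-product scores. Returns (weights, combined_embedding).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, iv)) / math.sqrt(d) for iv in ivectors]
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over the i-vectors
    combined = [sum(w * iv[i] for w, iv in zip(weights, ivectors)) for i in range(d)]
    return weights, combined


if __name__ == "__main__":
    block = [[1.0, 0.0], [0.0, 1.0]]             # two toy "i-vectors"
    print(speaker_attention([0.0, 0.0], block))  # ([0.5, 0.5], [0.5, 0.5])
```

Because the block is static and the weights are a softmax, the combined embedding always lies in the convex hull of the stored i-vectors, regardless of which speaker is talking.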
-
Fully Automated Multi-Organ Segmentation in Abdominal Magnetic Resonance Imaging with Deep Neural Networks
Authors:
Yuhua Chen,
Dan Ruan,
Jiayu Xiao,
Lixia Wang,
Bin Sun,
Rola Saouaf,
Wensha Yang,
Debiao Li,
Zhaoyang Fan
Abstract:
Segmentation of multiple organs-at-risk (OARs) is essential for radiation therapy treatment planning and other clinical applications. We developed an Automated deep Learning-based Abdominal Multi-Organ segmentation (ALAMO) framework based on 2D U-net and a densely connected network structure, with tailored designs in data augmentation and training procedures such as deep connection, auxiliary supervision, and multi-view. The model takes multi-slice MR images as input and outputs segmentation results. Three-Tesla T1 VIBE (Volumetric Interpolated Breath-hold Examination) images of 102 subjects were collected and used in our study. Ten OARs were studied, including the liver, spleen, pancreas, left/right kidneys, stomach, duodenum, small intestine, spinal cord, and vertebral bodies. Two radiologists manually labeled the images and obtained consensus contours as the ground truth. Of the complete cohort of 102, 20 samples were held out for independent testing, and the rest were used for training and validation. Performance was measured using volume overlap and surface distance. The ALAMO framework generated segmentation labels in good agreement with the manual results. Specifically, among the 10 OARs, 9 achieved high Dice Similarity Coefficients (DSCs) in the range of 0.87-0.96; the exception was the duodenum, with a DSC of 0.80. Inference completes within one minute for a 3D volume of 320x288x180. Overall, the ALAMO model matches state-of-the-art performance. The proposed ALAMO framework allows for fully automated abdominal MR segmentation with high accuracy and low memory and computation time demands.
Submitted 23 December, 2019;
originally announced December 2019.
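The ALAMO abstract above reports segmentation quality as Dice Similarity Coefficients. For binary masks A and B, DSC = 2|A ∩ B| / (|A| + |B|). Below is a minimal Python sketch over flattened 0/1 masks. Returning 1.0 when both masks are empty is a convention chosen for this sketch, not something the abstract specifies.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences).

    DSC = 2 * |A & B| / (|A| + |B|); returns 1.0 if both masks are empty
    (an assumed convention).
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * inter / size


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

For a 3D organ label, the volume is flattened and the coefficient is computed per organ, which is how a per-OAR range such as 0.87-0.96 would be tabulated.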
-
High-Freedom Inverse Design with Deep Neural Network for Metasurface Filter in the Visible
Authors:
Xiao Han,
Ziyang Fan,
Chao Li,
Zeyang Liu,
L. Jay Guo
Abstract:
In order to obtain a metasurface structure capable of filtering light of a specific wavelength in the visible band, the traditional method usually traverses the space of possible designs, searching for a potentially satisfactory device by performing iterative calculations to solve Maxwell's equations. In this paper, we propose a neural network that can complete an inverse design process to solve this problem. Compared with the traditional method, our method is much faster while being capable of generating better devices with the desired spectrum. One of its most significant advantages is that it can handle a real spectrum as well as an artificial one. In addition, our method offers a high degree of freedom in generating devices, ensuring that their generated spectra resemble the desired ones and meet accuracy requirements without losing practicability in the manufacturing process.
Submitted 8 December, 2019;
originally announced December 2019.
-
Fast acoustic scattering using convolutional neural networks
Authors:
Ziqi Fan,
Vibhav Vineet,
Hannes Gamper,
Nikunj Raghuvanshi
Abstract:
Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation. We propose training a convolutional neural network to map from a convex scatterer's cross-section to a 2D slice of the resulting spatial loudness distribution. We show that employing a full-resolution residual network for the resulting image-to-image regression problem yields spatially detailed loudness fields with a root-mean-squared error of less than 1 dB, at over 100x speedup compared to full wave simulation.
Submitted 15 February, 2020; v1 submitted 30 October, 2019;
originally announced November 2019.
-
Unsupervised pre-training for sequence to sequence speech recognition
Authors:
Zhiyun Fan,
Shiyu Zhou,
Bo Xu
Abstract:
This paper proposes a novel approach to pre-training an encoder-decoder sequence-to-sequence (seq2seq) model with unpaired speech and transcripts, respectively. Our pre-training method is divided into two stages, named acoustic pre-training and linguistic pre-training. In the acoustic pre-training stage, we use a large amount of speech to pre-train the encoder by predicting masked speech feature chunks from their context. In the linguistic pre-training stage, we generate synthesized speech from a large number of transcripts using a single-speaker text-to-speech (TTS) system, and use the synthesized paired data to pre-train the decoder. This two-stage pre-training method integrates rich acoustic and linguistic knowledge into the seq2seq model, which benefits downstream automatic speech recognition (ASR) tasks. The unsupervised pre-training is performed on the AISHELL-2 dataset, and we apply the pre-trained model to multiple paired-data ratios of AISHELL-1 and HKUST. We obtain relative character error rate reductions (CERR) ranging from 38.24% to 7.88% on AISHELL-1 and from 12.00% to 1.20% on HKUST. In addition, we apply our pre-trained model to a cross-lingual case with the CALLHOME dataset. For all six languages in the CALLHOME dataset, our pre-training method makes the model consistently outperform the baseline.
Submitted 1 January, 2020; v1 submitted 27 October, 2019;
originally announced October 2019.
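The acoustic pre-training stage in the seq2seq abstract above masks chunks of speech features and trains the encoder to predict them from context. The abstract does not give the chunk length, mask ratio or masking value. The Python sketch below therefore assumes aligned, non-overlapping chunks of 4 frames covering about 25% of the utterance, with masked frames set to zero. `mask_chunks` is a hypothetical helper that only prepares inputs and targets. It requires at least `chunk_len` frames.

```python
import random


def mask_chunks(features, chunk_len=4, mask_ratio=0.25, seed=0):
    """Zero out randomly chosen contiguous chunks of frames.

    features: list of frames, each a list of floats (frames x dims).
    Returns (masked_features, masked_frame_indices); a model would be trained
    to reconstruct features[i] for every returned index i.
    """
    rng = random.Random(seed)
    n = len(features)
    starts_pool = list(range(0, n - chunk_len + 1, chunk_len))  # aligned chunk starts
    n_chunks = max(1, int(n * mask_ratio) // chunk_len)
    starts = rng.sample(starts_pool, n_chunks)
    masked_set = {i for s in starts for i in range(s, s + chunk_len)}
    masked = [[0.0] * len(f) if i in masked_set else list(f)
              for i, f in enumerate(features)]
    return masked, sorted(masked_set)


if __name__ == "__main__":
    feats = [[float(i), float(i)] for i in range(16)]
    _, idx = mask_chunks(feats)
    print(idx)  # four consecutive frame indices
```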
-
Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network
Authors:
Zhun Fan,
Jiajie Mo,
Benzhang Qiu,
Wenji Li,
Guijie Zhu,
Chong Li,
Jianye Hu,
Yibiao Rong,
Xinjian Chen
Abstract:
Retinal vessel segmentation is a crucial step in diagnosing and screening various diseases, including diabetes, ophthalmologic diseases, and cardiovascular diseases. In this paper, we propose an effective and efficient method for vessel segmentation in color fundus images using encoder-decoder based octave convolution networks. Compared with other convolution networks that use standard convolution for feature extraction, the proposed method uses octave convolutions and octave transposed convolutions to learn multiple-spatial-frequency features, and can thus better capture retinal vasculature of varying sizes and shapes. To give the network the capability of learning how to decode multifrequency features, we extend octave convolution and propose a new operation named octave transposed convolution. A novel convolutional neural network architecture named Octave UNet, which integrates both octave convolutions and octave transposed convolutions, is proposed based on the encoder-decoder architecture of UNet; it can generate high-resolution vessel segmentations in a single forward pass without post-processing steps. Comprehensive experimental results demonstrate that the proposed Octave UNet outperforms the baseline UNet, achieving better or comparable performance to state-of-the-art methods with fast processing speed. Specifically, the proposed method achieves 0.9664 / 0.9713 / 0.9759 / 0.9698 accuracy, 0.8374 / 0.8664 / 0.8670 / 0.8076 sensitivity, 0.9790 / 0.9798 / 0.9840 / 0.9831 specificity, 0.8127 / 0.8191 / 0.8313 / 0.7963 F1 score, and 0.9835 / 0.9875 / 0.9905 / 0.9845 area under the receiver operating characteristic curve on the DRIVE, STARE, CHASE_DB1, and HRF datasets, respectively.
Submitted 22 September, 2020; v1 submitted 28 June, 2019;
originally announced June 2019.
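The Octave UNet abstract above reports pixel-wise accuracy, sensitivity, specificity and F1. The Python sketch below computes these standard definitions from a binary prediction and ground-truth mask (vessel = 1). It assumes every denominator is non-zero. It ignores evaluation-protocol details such as whether pixels outside the field of view are counted, which vary between papers. The area under the ROC curve needs continuous scores and is not covered.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise accuracy, sensitivity, specificity and F1 for binary masks.

    pred, truth: flat sequences of 0/1 with vessel pixels = 1.
    Assumes each denominator below is non-zero.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on vessel pixels
    specificity = tn / (tn + fp)   # recall on background pixels
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    print(segmentation_metrics([1, 1, 0, 0, 1, 0, 0, 0],
                               [1, 0, 1, 0, 1, 0, 0, 0]))
```

Because vessel pixels are a small fraction of a fundus image, accuracy and specificity stay high even for weak models, which is why sensitivity and F1 are reported alongside them.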
-
Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks
Authors:
Zhun Fan,
Jiewei Lu,
Benzhang Qiu,
Tao Jiang,
Kang An,
Alex Noel Josephraj,
Chuliang Wei
Abstract:
Automated steel bar counting and center localization play an important role in the factory automation of steel bars. Traditional methods focus only on steel bar counting, and their performance is often limited by complex industrial environments. A convolutional neural network (CNN), which has great capability to deal with complex tasks in challenging environments, is applied in this work. A framework called CNN-DC is proposed to achieve automated steel bar counting and center localization simultaneously. The proposed CNN-DC framework first detects candidate center points with a deep CNN. Then an effective clustering algorithm named Distance Clustering (DC) is proposed to cluster the candidate center points and locate the true centers of steel bars. The proposed CNN-DC achieves 99.26% accuracy for steel bar counting and 4.1% center offset for center localization on the established steel bar dataset, which demonstrates that it performs well on automated steel bar counting and center localization. Code is made publicly available at: https://github.com/BenzhangQiu/Steel-bar-Detection.
Submitted 19 June, 2019; v1 submitted 3 June, 2019;
originally announced June 2019.
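The CNN-DC abstract above does not describe how its Distance Clustering step works. As a purely hypothetical illustration of the general idea, the Python sketch below groups candidate center points that lie within a chosen radius of a seed point and reports one mean center per group. The number of groups then serves as the bar count. `greedy_radius_clustering` and the `radius` parameter are assumptions of this sketch, not the authors' DC algorithm.

```python
import math


def greedy_radius_clustering(points, radius):
    """Hypothetical greedy clustering of candidate center points (x, y).

    Each still-unassigned point seeds a cluster; every remaining point within
    `radius` of the seed joins it. Returns one mean center per cluster.
    """
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        members = [seed] + [p for p in remaining if math.dist(seed, p) <= radius]
        remaining = [p for p in remaining if math.dist(seed, p) > radius]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        centers.append((cx, cy))
    return centers


if __name__ == "__main__":
    candidates = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
    centers = greedy_radius_clustering(candidates, radius=2)
    print(len(centers), centers)  # 2 clusters
```

A fixed radius like this assumes bars of roughly uniform cross-section in the image; the result also depends on the order of the candidate points.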
-
Guaranteed-cost consensus for multiagent networks with Lipschitz nonlinear dynamics and switching topologies
Authors:
Jianxiang Xi,
Zhiliang Fan,
Hao Liu,
Tang Zheng
Abstract:
Guaranteed-cost consensus for high-order nonlinear multi-agent networks with switching topologies is investigated. By constructing a time-varying nonsingular matrix with a specific structure, the whole dynamics of the multi-agent network is decomposed into consensus and disagreement parts with nonlinear terms, which is the key challenge to be dealt with. An explicit expression of the consensus dynamics, which contains the nonlinear term, is given and its initial state is determined. Furthermore, using the structural property of the time-varying nonsingular transformation matrix and the Lipschitz condition, the impacts of the nonlinear term on the disagreement dynamics are linearized, and the gain matrix of the consensus protocol is determined on the basis of the Riccati equation. Moreover, an approach to minimizing the guaranteed cost is given in terms of linear matrix inequalities. Finally, numerical simulations are presented to demonstrate the effectiveness of the theoretical results.
Submitted 22 February, 2018;
originally announced February 2018.