
arXiv:2405.14964

Black Start Operation of Grid-Forming Converters Based on Generalized Three-phase Droop Control Under Unbalanced Conditions

Authors: Zexian Zeng, Prajwal Bhagwat, Maryam Saeedifard, Dominic Groß

Abstract: This paper focuses on the challenging task of bottom-up restoration in a complete blackout system using grid-forming (GFM) converters. Challenges arise from the limited current capability of power converters, which results in dynamic responses and fault current characteristics that differ from those of synchronous generators. Additionally, GFM control needs to address the unbalanced conditions commonly found in distribution systems. To address these challenges, this paper explores the black start capability of GFM converters with a generalized three-phase GFM droop control. This approach integrates GFM controls individually for each phase, incorporating phase-balancing feedback and enabling per-phase current limiting during unbalanced faults or overloading. The introduction of a phase-balancing gain provides the flexibility to trade off voltage imbalance against power imbalance. The study further investigates bottom-up black start operations using GFM converters, incorporating advanced load relays into breakers for gradual load energization without central coordination. The effectiveness of bottom-up black start operations with GFM converters using the generalized three-phase GFM droop is evaluated through electromagnetic transient (EMT) simulations in MATLAB/Simulink. The results confirm the performance and effectiveness of this approach in achieving successful black start operations under unbalanced conditions.

Submitted 23 May, 2024; originally announced May 2024.
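As a rough illustration of per-phase droop with a phase-balancing gain, the sketch below computes a frequency setpoint for each phase from that phase's measured active power using a conventional P-f droop law. The blending rule with a hypothetical balancing gain `k_b` (0 = fully independent phases, 1 = every phase driven by the three-phase average power) is an assumption made for illustration; it is not the generalized three-phase droop formulation from the paper, and `omega0` and `m_p` are placeholder values. The per-phase current limit is likewise a simple clamp.

```python
import math

# Nominal angular frequency of a 60 Hz system (placeholder value).
OMEGA0 = 2 * math.pi * 60.0


def per_phase_droop(p_meas, p_set, m_p=0.01, k_b=0.5, omega0=OMEGA0):
    """Return a frequency setpoint (rad/s) for each of the three phases.

    p_meas : measured active power of phases a, b, c (per unit)
    p_set  : per-phase active power setpoint (per unit)
    m_p    : droop gain (rad/s per per-unit power), placeholder value
    k_b    : hypothetical phase-balancing gain in [0, 1]
    """
    if len(p_meas) != 3:
        raise ValueError("expected three phase powers")
    if not 0.0 <= k_b <= 1.0:
        raise ValueError("k_b must lie in [0, 1]")
    p_avg = sum(p_meas) / 3.0
    setpoints = []
    for p in p_meas:
        # Blend this phase's power with the three-phase average:
        # k_b = 0 keeps phases independent, k_b = 1 forces identical frequencies.
        p_eff = (1.0 - k_b) * p + k_b * p_avg
        # Conventional P-f droop: frequency drops when power exceeds its setpoint.
        setpoints.append(omega0 + m_p * (p_set - p_eff))
    return setpoints


def limit_phase_current(i_ref, i_max):
    """Clamp an instantaneous per-phase current reference to [-i_max, i_max]."""
    return max(-i_max, min(i_max, i_ref))
```

With `k_b=1.0`, an unbalanced load such as `per_phase_droop([1.2, 0.8, 1.0], 1.0, k_b=1.0)` gives the same setpoint on every phase, while `k_b=0.0` lets each phase's frequency follow its own loading; intermediate values sit between these extremes.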

arXiv:2402.17780

Constraint Latent Space Matters: An Anti-anomalous Waveform Transformation Solution from Photoplethysmography to Arterial Blood Pressure

Authors: Cheng Bian, Xiaoyu Li, Qi Bi, Guangpu Zhu, Jiegeng Lyu, Weile Zhang, Yelei Li, Zi**g Zeng

Abstract: Arterial blood pressure (ABP) holds substantial promise for proactive cardiovascular health management. Notwithstanding its potential, the invasive nature of ABP measurements confines their utility primarily to clinical environments, limiting their applicability for continuous monitoring beyond medical facilities. The conversion of photoplethysmography (PPG) signals into ABP equivalents has garnered significant attention due to its potential to revolutionize cardiovascular disease management. Recent strides in PPG-to-ABP prediction encompass the integration of generative and discriminative models. Despite these advances, the efficacy of these models is curtailed by the latent space shift problem, which stems from alterations in the PPG data distribution across disparate hardware and individuals and can lead to distorted ABP waveforms. To tackle this problem, we present the Latent Space Constraint Transformer (LSCT), which leverages a quantized codebook to yield robust latent spaces by employing multiple discretizing bases. To facilitate improved reconstruction, the Correlation-boosted Attention Module (CAM) is introduced to systematically query pertinent bases on a global scale. Furthermore, to enhance expressive capacity, we propose the Multi-Spectrum Enhancement Knowledge (MSEK), which fosters local information flow within the channels of the latent code and provides additional embedding for reconstruction. Through comprehensive experimentation on both publicly available datasets and a private downstream task dataset, the proposed approach demonstrates noteworthy performance enhancements compared to existing methods. Extensive ablation studies further substantiate the effectiveness of each introduced module.

Submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by AAAI-2024, main track
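The LSCT abstract relies on a quantized codebook to constrain the latent space. The sketch below shows only the generic nearest-codeword lookup that underlies vector quantization in general; the function name, codebook size, and latent dimension are assumptions, and none of LSCT's multi-basis design, CAM, or MSEK is reproduced here.

```python
import numpy as np


def quantize_latents(z, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2 distance).

    z        : array of shape (n, d), continuous latent vectors
    codebook : array of shape (k, d), discrete bases (assumed already learned)
    Returns the quantized vectors, shape (n, d), and the chosen indices, shape (n,).
    """
    z = np.asarray(z, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Pairwise squared distances between every latent and every code: shape (n, k).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices
```

Because every output is snapped to one of a finite set of codes, a shifted input distribution can only move latents between existing codes rather than into unseen regions, which is the general intuition behind using a codebook for robustness.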

arXiv:2312.15575

Neural Born Series Operator for Biomedical Ultrasound Computed Tomography

Authors: Zhijun Zeng, Yihang Zheng, Youjia Zheng, Yubing Li, Zuoqiang Shi, He Sun

Abstract: Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilitating a more efficient USCT image reconstruction process through an NBSO-based FWI pipeline. Thoroughly validated on comprehensive brain and breast datasets, simulated under experimental USCT conditions, the NBSO proves to be accurate and efficient in both forward simulation and image reconstruction. This advancement demonstrates the potential of neural operators in facilitating near real-time USCT reconstruction, making the clinical application of USCT increasingly viable and promising.

Submitted 24 December, 2023; originally announced December 2023.

ACM Class: I.4.5; J.3

arXiv:2312.10052

ESTformer: Transformer Utilizing Spatiotemporal Dependencies for EEG Super-resolution

Authors: Dongdong Li, Zhongliang Zeng, Zhe Wang, Hai Yang

Abstract: In practical applications of electroencephalography (EEG), lightweight acquisition devices equipped with only a few electrodes leave analysis methods with EEG data of extremely low spatial resolution. Recent methods mainly use mathematical interpolation and convolutional neural networks for EEG super-resolution (SR), but they suffer from high computational cost, extra bias, and little insight into spatiotemporal dependency modeling. To this end, we propose the ESTformer, a Transformer-based EEG SR framework that utilizes spatiotemporal dependencies. The ESTformer applies positional encoding and the multi-head self-attention mechanism to the space and time dimensions, allowing it to learn spatial structural information and temporal functional variation. With a fixed masking strategy, the ESTformer adopts a mask token to up-sample the low-resolution (LR) EEG data, avoiding disturbance from mathematical interpolation methods. On this basis, we design various Transformer blocks to construct the Spatial Interpolation Module (SIM) and the Temporal Reconstruction Module (TRM). Finally, the ESTformer cascades the SIM and the TRM to capture and model spatiotemporal dependencies for high-fidelity EEG SR. Extensive experimental results on two EEG datasets show the effectiveness of the ESTformer against previous state-of-the-art methods and verify the superiority of the SR data over the LR data in the EEG-based downstream tasks of person identification and emotion recognition. The proposed ESTformer demonstrates the versatility of the Transformer for EEG SR tasks.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.12840

Wafer Map Defect Patterns Semi-Supervised Classification Using Latent Vector Representation

Authors: Qiyu Wei, Wei Zhao, Xiaoyan Zheng, Zeng Zeng

Abstract: As the globalization of semiconductor design and manufacturing continues, defect detection during integrated circuit fabrication is becoming increasingly critical and plays a significant role in enhancing the yield of semiconductor products. Traditional wafer map defect pattern detection relies on manual inspection: sample images are collected with electron microscopes and then assessed by experts for defects. This approach is labor-intensive and inefficient. Consequently, there is a pressing need for a model capable of automatically detecting defects as an alternative to manual operation. In this paper, we propose a method that first employs a pre-trained VAE model to obtain the fault distribution information of the wafer map. This information serves as guidance and is combined with the original image set for semi-supervised model training. During semi-supervised training, we utilize a teacher-student network for iterative learning. The model is validated on the WM-811K benchmark wafer dataset. The experimental results demonstrate superior classification accuracy and detection performance compared to state-of-the-art models, fulfilling the requirements of industrial applications. Compared to the original architecture, we achieve a significant performance improvement.

Submitted 6 October, 2023; originally announced November 2023.

Comments: 6 pages, 2 figures, CIS conference
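Teacher-student iteration in semi-supervised learning is often implemented as a mean-teacher scheme, in which the teacher's weights are an exponential moving average (EMA) of the student's. The wafer map abstract does not say which update rule the authors use, so the sketch below is an assumed, generic version operating on dictionaries of numpy arrays; the decay value 0.99 is a placeholder.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Return updated teacher parameters: decay * teacher + (1 - decay) * student.

    teacher, student : dicts mapping parameter names to numpy arrays of equal shape
    decay            : weight kept on the previous teacher value, in [0, 1)
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    if teacher.keys() != student.keys():
        raise KeyError("teacher and student must share parameter names")
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }
```

In a typical training loop the student would be updated by gradient descent and `ema_update` applied after each step, so the slowly moving teacher can supply more stable targets for unlabeled wafer maps.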

arXiv:2309.14608

doi 10.1109/TCST.2023.3338110

A Demand-Supply Cooperative Responding Strategy in Power System with High Renewable Energy Penetration

Authors: Yuanzheng Li, Xinxin Long, Yang Li, Yizhou Ding, Tao Yang, Zhigang Zeng

Abstract: Industrial demand response (IDR) plays an important role in promoting the utilization of renewable energy (RE) in power systems. However, it leads to power adjustments on the supply side, which are also a non-negligible factor affecting RE utilization. To comprehensively analyze this impact while enhancing RE utilization, this paper proposes a power demand-supply cooperative response (PDSCR) strategy based on both day-ahead and intraday time scales. The day-ahead PDSCR determines a long-term scheme for responding to predictable trends in RE supply. However, this long-term scheme may not be suitable when uncertain RE fluctuations occur intraday. For intraday PDSCR, we formulate a profit-driven cooperation approach to address RE fluctuations. In this context, unreasonable profit distributions between the demand and supply sides would lead to conflicts of interest and diminish the effectiveness of cooperative responses. To mitigate this issue, we derive multi-individual profit distribution marginal solutions (MIPDMSs) based on satisfactory profit distributions, which can also maximize cooperative profits. Case studies are conducted on a modified IEEE 24-bus system and an actual power system in China. The results verify the effectiveness of the proposed strategy for enhancing RE utilization by optimizing the coordination of IDR flexibility with generation resources.

Submitted 1 December, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE Transactions on Control Systems Technology

Journal ref: IEEE Transactions on Control Systems Technology 32 (2024) 874-890

arXiv:2308.01040

Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

Authors: Xinfeng Li, Chen Yan, Xuancun Lu, Zihan Zeng, Xiaoyu Ji, Wenyuan Xu

Abstract: Automatic speech recognition (ASR) systems have been shown to be vulnerable to adversarial examples (AEs). Recent successful attacks all assume that users will not notice or disrupt the attack process despite the presence of music- or noise-like sounds and spontaneous responses from voice assistants. Nonetheless, in practical user-present scenarios, user awareness may nullify existing attack attempts that produce unexpected sounds or unexpected ASR usage. In this paper, we seek to bridge this gap in existing research and extend the attack to user-present scenarios. We propose VRIFLE, an inaudible adversarial perturbation (IAP) attack via ultrasound delivery that can manipulate ASRs as a user speaks. The inherent differences between audible sounds and ultrasound make IAP delivery face unprecedented challenges such as distortion, noise, and instability. In this regard, we design a novel ultrasonic transformation model to make the crafted perturbation physically effective and even able to survive long-distance delivery. We further improve VRIFLE's robustness by applying a series of augmentations for user and real-world variations during the generation process. In this way, VRIFLE achieves effective real-time manipulation of the ASR output from different distances and for any user speech, with an alter-and-mute strategy that suppresses the impact of user disruption. Our extensive experiments in both digital and physical worlds verify VRIFLE's effectiveness under various configurations, its robustness against six kinds of defenses, and its universality in a targeted manner. We also show that VRIFLE can be delivered with a portable attack device and even everyday loudspeakers.

Submitted 12 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: Accepted by NDSS Symposium 2024. Please cite this paper as "Xinfeng Li, Chen Yan, Xuancun Lu, Zihan Zeng, Xiaoyu Ji, Wenyuan Xu. Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time. In Network and Distributed System Security (NDSS) Symposium 2024."

arXiv:2305.06978

doi 10.1007/978-3-031-16443-9_13

Meta-hallucinator: Towards Few-Shot Cross-Modality Cardiac Image Segmentation

Authors: Ziyuan Zhao, Fangcheng Zhou, Zeng Zeng, Cuntai Guan, S. Kevin Zhou

Abstract: Domain shift and label scarcity heavily limit deep learning applications to various medical image analysis tasks. Unsupervised domain adaptation (UDA) techniques have recently achieved promising cross-modality medical image segmentation by transferring knowledge from a label-rich source domain to an unlabeled target domain. However, it is also difficult to collect annotations from the source domain in many clinical applications, rendering most prior works suboptimal when source labels are scarce, particularly in few-shot scenarios where only a few source labels are accessible. To achieve efficient few-shot cross-modality segmentation, we propose a novel transformation-consistent meta-hallucination framework, meta-hallucinator, with the goal of learning to diversify data distributions and generate useful examples for enhancing cross-modality performance. In our framework, hallucination and segmentation models are jointly trained with a gradient-based meta-learning strategy to synthesize examples that lead to good segmentation performance on the target domain. To further facilitate data hallucination and cross-domain knowledge transfer, we develop a self-ensembling model with a hallucination-consistent property. Our meta-hallucinator can seamlessly collaborate with the meta-segmenter, learning to hallucinate with mutual benefits from a combined view of meta-learning and self-ensembling learning. Extensive studies on the MM-WHS 2017 dataset for cross-modality cardiac segmentation demonstrate that our method outperforms various approaches by a large margin in the few-shot UDA scenario.

Submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted by MICCAI 2022 (top 13% paper; early accept)

Journal ref: Medical Image Computing and Computer Assisted Intervention, MICCAI 2022. Lecture Notes in Computer Science, vol 13435. Springer, Cham

arXiv:2303.01927

A Generalized Nyquist-Shannon Sampling Theorem Using the Koopman Operator

Authors: Zhexuan Zeng, Ye Yuan

Abstract: The sampling theorem plays a fundamental role in recovering continuous-time signals from discrete-time samples in signal processing. The sampling theorem for non-band-limited signals has become one of the most challenging problems in the field. In this work, a generalized sampling theorem, which builds on the Koopman operator, is proved for signals in generator-bounded space (Theorem 1). It naturally extends the Nyquist-Shannon sampling theorem in that 1) for band-limited signals, the lower bounds on sampling frequency given by the two theorems are exactly the same; and 2) the Koopman operator-based sampling theorem can also provide a finite bound on the sampling frequency for certain types of non-band-limited signals, which cannot be addressed by the Nyquist-Shannon sampling theorem. These types of non-band-limited signals include, but are not limited to, inverse Laplace transforms with a limited imaginary interval of integration and linear combinations of complex exponential functions. Moreover, a Koopman operator-based reconstruction algorithm is provided together with a theoretical convergence result. Using this algorithm, the sampling theorem is illustrated on several signals related to sine, exponential, and polynomial signals.

Submitted 6 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.
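For the band-limited case, where the abstract states that both theorems give the same lower bound, the classical Nyquist-Shannon result can be checked numerically. The sketch below implements standard Whittaker-Shannon (sinc) interpolation, not the Koopman operator-based reconstruction algorithm from the paper; the test signal, sampling rate, and truncation length are arbitrary choices.

```python
import numpy as np


def nyquist_rate(f_max):
    """Minimum sampling frequency (Hz) for a signal band-limited to f_max Hz."""
    return 2.0 * f_max


def sinc_reconstruct(samples, n, fs, t):
    """Whittaker-Shannon interpolation of samples x[n] taken at times n / fs.

    samples : sample values x[n]
    n       : integer sample indices matching `samples`
    fs      : sampling frequency in Hz (should exceed the Nyquist rate)
    t       : time(s) in seconds at which to evaluate the reconstruction
    """
    T = 1.0 / fs
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # np.sinc is the normalized sinc, sin(pi x) / (pi x); kernel shape is (len(t), len(n)).
    kernel = np.sinc((t[:, None] - np.asarray(n)[None, :] * T) / T)
    return kernel @ np.asarray(samples, dtype=float)
```

Truncating the infinite interpolation sum to finitely many samples introduces a small error that shrinks as more samples are included. The Koopman-based theorem targets a different question: which non-band-limited signals still admit a finite sampling-frequency bound.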

arXiv:2301.07414

A Smart Adaptively Reconfigurable DC Battery for Higher Efficiency of Electric Vehicle Drive Trains

Authors: Zhongxi Li, Aobo Yang, Gerry Chen, Nima Tashakor, Zhiyong Zeng, Angel V. Peterchev, Stefan M. Goetz

Abstract: This paper proposes a drive train topology with a dynamically reconfigurable dc battery, which breaks hard-wired batteries into smaller subunits. It can rapidly control the output voltage and even contribute to the voltage shaping of the inverter. Building on the rapid development of low-voltage transistors and modular circuit topologies in recent years, the proposed technology uses recent 48 V power electronics to achieve higher-voltage output and reduce losses in electric vehicle (EV) drive trains. The fast switching capability and low loss of low-voltage field-effect transistors (FETs) allow the modulation to be shared with the main drive inverter. As such, the slower insulated-gate bipolar transistors (IGBTs) of the inverter can operate at ideal duty cycles with aggressively reduced switching, while the adaptive dc battery provides an adjustable voltage and all common-mode contributions at the dc link with lower loss. Up to 2/3 of the switching of the main inverter is avoided. At high drive speeds, and thus large modulation indices, the proposed converter halves the total loss compared to using the inverter alone; at lower speeds, and thus smaller modulation indices, the advantage is even more prominent because of the dynamically lowered dc-link voltage. Furthermore, it can substantially reduce distortion, particularly at lower modulation indices, e.g., down to 1/2 compared to conventional space-vector modulation and even 1/3 for discontinuous pulse-width modulation with a hard-wired battery. Other benefits include alleviated insulation stress on motor windings, active battery balancing, and elimination of the vulnerability of large hard-wired battery packs to weak cells. We demonstrate the proposed motor drive on a 3-kW setup with eight battery modules.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 11 pages, 9 figures
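To make the idea of a dynamically lowered dc-link voltage concrete, the sketch below picks how many series battery modules to switch in so that the dc link just covers a requested voltage. The 48 V module voltage, the simple ceiling rule, and the function name are assumptions for illustration (the abstract only mentions 48 V power electronics and an eight-module demonstrator); the actual converter also shares fast modulation with the inverter, which this sketch ignores.

```python
import math


def modules_to_insert(v_ref, v_module=48.0, n_modules=8):
    """Number of series modules needed so the dc link reaches at least v_ref.

    v_ref     : requested dc-link voltage in volts (non-negative)
    v_module  : voltage of one battery module (hypothetical 48 V default)
    n_modules : modules available in the pack (eight in the 3-kW demonstrator)
    Returns (modules switched in, resulting dc-link voltage); the result is
    capped at the full pack if v_ref exceeds what the pack can deliver.
    """
    if v_ref < 0:
        raise ValueError("v_ref must be non-negative")
    n_on = min(n_modules, math.ceil(v_ref / v_module))
    return n_on, n_on * v_module
```

At low speeds, when the inverter needs only a small voltage, fewer modules are connected and the dc link sits much closer to the required voltage than a fixed hard-wired pack would.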

arXiv:2301.00641

doi 10.1109/TNNLS.2022.3232630

Federated Multi-Agent Deep Reinforcement Learning Approach via Physics-Informed Reward for Multi-Microgrid Energy Management

Authors: Yuanzheng Li, Shangyang He, Yang Li, Yang Shi, Zhigang Zeng

Abstract: The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need to develop an effective energy management method that minimizes economic costs and maintains energy self-sufficiency. Multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability. However, its training requires massive energy operation data from microgrids (MGs), and gathering these data from different MGs would threaten their privacy and data security. Therefore, this paper tackles this practical yet challenging issue by proposing a federated multi-agent deep reinforcement learning (F-MADRL) algorithm via a physics-informed reward. In this algorithm, the federated learning (FL) mechanism is introduced to train the F-MADRL algorithm, thus ensuring the privacy and security of data. In addition, a decentralized MMG model is built, and the energy of each participating MG is managed by an agent, which aims to minimize economic costs and maintain energy self-sufficiency according to the physics-informed reward. First, MGs individually perform self-training on local energy operation data to train their local agent models. Then, these local models are periodically uploaded to a server, and their parameters are aggregated to build a global agent, which is broadcast to the MGs and replaces their local agents. In this way, the experience of each MG agent can be shared without explicitly transmitting the energy operation data, thus protecting privacy and ensuring data security. Finally, experiments are conducted on the Oak Ridge National Laboratory distributed energy control communication lab microgrid (ORNL-MG) test system, and comparisons are carried out to verify the effectiveness of introducing the FL mechanism and the superior performance of the proposed F-MADRL.

Submitted 29 December, 2022; originally announced January 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

Journal ref: IEEE Transactions on Neural Networks and Learning Systems 35 (2024) 5902-5914
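The F-MADRL abstract describes local agent models being uploaded to a server, aggregated into a global agent, and broadcast back. A common aggregation rule is FedAvg-style averaging weighted by each client's number of local samples; the abstract does not state whether F-MADRL weights clients this way, so treat the weighting as an assumption. Parameters are represented as dictionaries of numpy arrays for simplicity, and the function name is hypothetical.

```python
import numpy as np


def federated_average(local_models, sample_counts):
    """Aggregate local model parameters into a global model (FedAvg-style).

    local_models  : list of dicts, each mapping parameter names to numpy arrays
    sample_counts : number of local training samples per model (used as weights)
    """
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = float(sum(sample_counts))
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    names = local_models[0].keys()
    # Weighted sum of each parameter across clients; weights sum to 1.
    return {
        name: sum(
            (count / total) * model[name]
            for model, count in zip(local_models, sample_counts)
        )
        for name in names
    }
```

Each microgrid would then replace its local agent with the returned global parameters before the next round of local training, so only model parameters, not raw energy operation data, leave the microgrid.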

arXiv:2212.03814

iQuery: Instruments as Queries for Audio-Visual Sound Separation

Authors: Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, Jianbo Shi

Abstract: Current audio-visual separation methods share a standard architecture design in which an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck. This design confounds the learning of multi-modal feature encoding with robust sound decoding for audio separation. To generalize to a new instrument, one must fine-tune the entire visual and audio network for all musical instruments. We re-formulate the visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism. Our approach ensures cross-modal consistency and cross-instrument disentanglement. We utilize "visually named" queries to initiate the learning of audio queries and use cross-modal attention to remove potential sound source interference at the estimated waveforms. To generalize to a new instrument or event class, drawing inspiration from text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism. Experimental results on three benchmarks demonstrate that iQuery improves audio-visual sound source separation performance.

Submitted 8 December, 2022; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2212.02078

doi 10.1109/TMI.2022.3214766

LE-UDA: Label-efficient unsupervised domain adaptation for medical image segmentation

Authors: Ziyuan Zhao, Fangcheng Zhou, Kaixin Xu, Zeng Zeng, Cuntai Guan, S. Kevin Zhou

Abstract: While deep learning methods have achieved considerable success in medical image segmentation, they are still hampered by two limitations: (i) reliance on large-scale well-labeled datasets, which are difficult to curate due to the expert-driven and time-consuming nature of pixel-level annotation in clinical practice, and (ii) failure to generalize from one domain to another, especially when the target domain is a different modality with severe domain shift. Recent unsupervised domain adaptation (UDA) techniques leverage abundant labeled source data together with unlabeled target data to reduce the domain gap, but these methods degrade significantly with limited source annotations. In this study, we address this underexplored UDA problem, investigating a challenging but valuable realistic scenario in which the source domain not only exhibits domain shift w.r.t. the target domain but also suffers from label scarcity. To this end, we propose a novel and generic framework called "Label-Efficient Unsupervised Domain Adaptation" (LE-UDA). In LE-UDA, we construct self-ensembling consistency for knowledge transfer between both domains, as well as a self-ensembling adversarial learning module to achieve better feature alignment for UDA. To assess the effectiveness of our method, we conduct extensive experiments on two different tasks for cross-modality segmentation between MRI and CT images. Experimental results demonstrate that the proposed LE-UDA can efficiently leverage limited source labels to improve cross-domain segmentation performance, outperforming state-of-the-art UDA approaches in the literature. Code is available at: https://github.com/jacobzhaoziyuan/LE-UDA.

Submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted by IEEE Transactions on Medical Imaging, 2022

arXiv:2211.05910

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256
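The challenge entries are INT8 models. As a rough illustration of what INT8 quantization means, the sketch below assumes a generic symmetric per-tensor scheme (scale = max|x| / 127); the actual entries and the VS680 NPU toolchain may use other schemes (e.g., asymmetric or per-channel), so this is not any participant's method.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (an assumed scheme, for illustration):
    q = clip(round(x / scale), -127, 127) with scale = max|x| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored value lies within scale / 2 of the original weight.
```

The rounding error per value is bounded by half a quantization step, which is why a well-chosen scale matters for preserving reconstruction quality.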

arXiv:2208.04035 [pdf, other]

doi 10.1109/ASRU51503.2021.9688088

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

Authors: Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, **g Xiao

Abstract: Non-parallel many-to-many voice conversion remains an interesting but challenging speech processing task. Recently, AutoVC, a conditional autoencoder-based method, achieved excellent conversion results by disentangling the speaker identity and the speech content using information-constraining bottlenecks. However, due to the pure autoencoder training method, it is difficult to evaluate the separation effect of content and speaker identity. In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced from the text transcriptions is designed to guide the extraction of voice content. In addition, adversarial training is applied to eliminate the speaker identity information in the estimated content embedding extracted from speech. Under the guidance of the expected content embedding and the adversarial training, the content encoder is trained to extract speaker-independent content embeddings from speech. Experiments on the AIShell-3 dataset show that the proposed model outperforms AutoVC in terms of naturalness and similarity of converted speech.

Submitted 8 August, 2022; originally announced August 2022.

Comments: ASRU 6 pages

Journal ref: 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, pp. 938-945

arXiv:2207.10284 [pdf, other]

Multi Resolution Analysis (MRA) for Approximate Self-Attention

Authors: Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh

Abstract: Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.

Submitted 20 July, 2022; originally announced July 2022.

Comments: ICML 2022

arXiv:2207.01900 [pdf, other]

doi 10.1109/ICIP46576.2022.9897494

ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised Memory-efficient Medical Image Segmentation

Authors: Ziyuan Zhao, Andong Zhu, Zeng Zeng, Bharadwaj Veeravalli, Cuntai Guan

Abstract: While deep models have shown promising performance in medical image segmentation, they heavily rely on a large amount of well-annotated data, which is difficult to access, especially in clinical practice. On the other hand, high-accuracy deep models usually come in large model sizes, limiting their deployment in real scenarios. In this work, we propose a novel asymmetric co-teacher framework, ACT-Net, to alleviate the burden of both expensive annotations and computational costs for semi-supervised knowledge distillation. We advance teacher-student learning with a co-teacher network to facilitate asymmetric knowledge distillation from large models to small ones by alternating student and teacher roles, obtaining tiny but accurate models for clinical deployment. To verify the effectiveness of ACT-Net, we employ the ACDC dataset for cardiac substructure segmentation in our experiments. Extensive experimental results demonstrate that ACT-Net outperforms other knowledge distillation methods and achieves lossless segmentation performance with 250x fewer parameters.

Submitted 5 July, 2022; originally announced July 2022.

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP)

arXiv:2207.01883 [pdf, other]

doi 10.1109/ICIP46576.2022.9897591

MMGL: Multi-Scale Multi-View Global-Local Contrastive learning for Semi-supervised Cardiac Image Segmentation

Authors: Ziyuan Zhao, **xuan Hu, Zeng Zeng, Xulei Yang, Peisheng Qian, Bharadwaj Veeravalli, Cuntai Guan

Abstract: With large-scale well-labeled datasets, deep learning has shown significant success in medical image segmentation. However, it is challenging to acquire abundant annotations in clinical practice due to extensive expertise requirements and costly labeling efforts. Recently, contrastive learning has shown a strong capacity for visual representation learning on unlabeled data, achieving impressive performance rivaling supervised learning in many domains. In this work, we propose a novel multi-scale multi-view global-local contrastive learning (MMGL) framework to thoroughly explore global and local features from different scales and views for robust contrastive learning performance, thereby improving segmentation performance with limited annotations. Extensive experiments on the MM-WHS dataset demonstrate the effectiveness of the MMGL framework for semi-supervised cardiac image segmentation, outperforming state-of-the-art contrastive learning methods by a large margin.

Submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE International Conference on Image Processing (ICIP 2022)

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP)
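MMGL builds on contrastive learning. The abstract does not name its loss, so the sketch below assumes a generic InfoNCE-style objective with cosine similarity and a temperature (default 0.1, a hypothetical choice); the multi-scale, multi-view, and global-local terms of MMGL are not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log(exp(s+/t) / (exp(s+/t) + sum_j exp(s-_j/t)))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when the anchor is closer to its positive view than to the negatives, which is what pulls matching views together in representation space.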

arXiv:2206.08544 [pdf, other]

doi 10.20517/ir.2021.08

Bio-inspired Intelligence with Applications to Robotics: A Survey

Authors: Junfei Li, Zhe Xu, Danjie Zhu, Kevin Dong, Tao Yan, Zhu Zeng, Simon X. Yang

Abstract: In the past decades, considerable attention has been paid to bio-inspired intelligence and its applications to robotics. This paper provides a comprehensive survey of bio-inspired intelligence, with a focus on neurodynamics approaches, in various robotic applications, particularly path planning and control of autonomous robotic systems. First, the bio-inspired shunting model and its variants (additive model and gated dipole model) are introduced, and their main characteristics are given in detail. Then, two main neurodynamics applications to real-time path planning and control of various robotic systems are reviewed. A bio-inspired neural network framework, in which neurons are characterized by the neurodynamics models, is discussed for mobile robots, cleaning robots, and underwater robots. The bio-inspired neural network has been widely used in real-time collision-free navigation and cooperation without any learning procedures, global cost functions, or prior knowledge of the dynamic environment. In addition, bio-inspired backstepping controllers for various robotic systems, which are able to eliminate the speed jump when a large initial tracking error occurs, are further discussed. Finally, the current challenges and future research directions are discussed.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2205.10758 [pdf, other]

doi 10.1109/EMBC48229.2022.9871233

Residual Channel Attention Network for Brain Glioma Segmentation

Authors: Yiming Yao, Peisheng Qian, Ziyuan Zhao, Zeng Zeng

Abstract: A glioma is a malignant brain tumor that seriously affects cognitive functions and lowers patients' quality of life. Segmentation of brain glioma is challenging because of interclass ambiguities in tumor regions. Recently, deep learning approaches have achieved outstanding performance in the automatic segmentation of brain glioma. However, existing algorithms fail to exploit channel-wise feature interdependence to select semantic attributes for glioma segmentation. In this study, we implement a novel deep neural network that integrates residual channel attention modules to calibrate intermediate features for glioma segmentation. The proposed channel attention mechanism adaptively weights features channel-wise to optimize the latent representation of gliomas. We evaluate our method on the established BraTS2017 dataset. Experimental results indicate the superiority of our method.

Submitted 22 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
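To make "residual channel attention" concrete, here is a minimal sketch assuming a squeeze-and-excitation-style gate (global average pooling, two small fully connected layers, sigmoid) whose per-channel weights rescale the features before a residual addition. The weight matrices `w1` and `w2` are hypothetical inputs, and the paper's exact module may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_channel_attention(features, w1, w2):
    """Assumed SE-style residual channel attention.

    features: C channels, each a flat list of spatial values.
    w1: (C // r) x C reduction weights; w2: C x (C // r) expansion weights (hypothetical).
    Returns the recalibrated features and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    pooled = [sum(ch) / len(ch) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale each channel by its gate and add the identity (residual) path
    out = [[x + g * x for x in ch] for ch, g in zip(features, gates)]
    return out, gates
```

The residual path keeps the original features intact, so the gates can only emphasize or de-emphasize channels rather than erase them entirely.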

arXiv:2205.10757 [pdf, other]

doi 10.1109/EMBC48229.2022.9871848

Deep Feature Fusion via Graph Convolutional Network for Intracranial Artery Labeling

Authors: Yaxin Zhu, Peisheng Qian, Ziyuan Zhao, Zeng Zeng

Abstract: Intracranial arteries are critical blood vessels that supply the brain with oxygenated blood. Intracranial artery labels provide valuable guidance and navigation for numerous clinical applications and disease diagnoses. Various machine learning algorithms have been applied to automate the anatomical labeling of cerebral arteries. However, the task remains challenging because of the high complexity and variability of intracranial arteries. This study investigates a novel graph convolutional neural network with deep feature fusion for cerebral artery labeling. We introduce stacked graph convolutions in an encoder-core-decoder architecture, extracting high-level representations from graph nodes and their neighbors. Furthermore, we efficiently aggregate intermediate features from different hierarchies to enhance the proposed model's representation capability and labeling performance. We perform extensive experiments on public datasets, in which our approach outperforms baselines by a clear margin.

Submitted 22 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
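The model stacks graph convolutions that aggregate features from each node and its neighbors. The abstract does not give the layer definition, so this sketch assumes the widely used symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python for a small graph.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One assumed graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n 0/1 matrix; features: n x f; weight: f x out_dim.
    """
    n = len(adjacency)
    # Add self-loops so each node keeps its own features
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim = len(features[0])
    # Aggregate normalized neighbor features
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(in_dim)]
           for i in range(n)]
    # Linear transform followed by ReLU
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]
```

Stacking several such layers widens each node's receptive field by one hop per layer, which is how neighborhood context reaches each artery segment.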

arXiv:2205.07021 [pdf, other]

doi 10.1109/EMBC48229.2022.9871734

Self-supervised Assisted Active Learning for Skin Lesion Segmentation

Authors: Ziyuan Zhao, Wen**g Lu, Zeng Zeng, Kaixin Xu, Bharadwaj Veeravalli, Cuntai Guan

Abstract: Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements. Recently, active learning (AL) strategies, which strive to reduce annotation costs by querying a small portion of data for annotation, have gained much traction in the field of medical imaging. However, most existing AL methods have to initialize models with some randomly selected samples, followed by active selection based on various criteria, such as uncertainty and diversity. Such random-start initialization methods inevitably introduce low-value redundant samples and unnecessary annotation costs. To address this issue, we propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning (SSL), and SSL features are then used for sample selection via latent feature clustering without accessing labels. We assess the proposed methodology on a skin lesion segmentation task. Extensive experiments demonstrate that our approach achieves promising performance with substantial improvements over existing baselines.

Submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2204.14021 [pdf, ps, other]

A Sampling Theorem for Exact Identification of Continuous-time Nonlinear Dynamical Systems

Authors: Zhexuan Zeng, Zuogong Yue, Alexandre Mauroy, Jorge Goncalves, Ye Yuan

Abstract: Low sampling frequency challenges the exact identification of a continuous-time (CT) dynamical system from sampled data, even when its model is identifiable. A necessary and sufficient condition, built on the Koopman operator, is proposed for the exact identification of the CT system from sampled data. The condition gives a Nyquist-Shannon-like critical frequency for exact identification of CT nonlinear dynamical systems with Koopman invariant subspaces: 1) it establishes a sufficient condition on the sampling frequency that permits a discretized sequence of samples to discover the underlying system; 2) it also establishes a necessary condition on the sampling frequency for system aliasing, in which the underlying system is indistinguishable; and 3) the original CT signal does not have to be band-limited as required in the Nyquist-Shannon Theorem. The theoretical criterion has been demonstrated on a number of simulated examples, including linear systems, nonlinear systems with equilibria, and limit cycles.

Submitted 29 April, 2022; originally announced April 2022.
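"System aliasing" is easiest to see in the simplest linear case, which this sketch illustrates (it does not reproduce the paper's Koopman-based condition for nonlinear systems). For the scalar system dx/dt = a x sampled every T seconds, the samples obey x[k+1] = e^{aT} x[k]. Since e^{2πi} = 1, a system whose rate is shifted by 2πi/T yields identical samples, and the principal logarithm recovers the true rate only when |Im(a)| < π/T. The values of `T` and `a` are arbitrary illustrative choices.

```python
import cmath
import math


def sample_linear_system(rate, x0, period, n_samples):
    """Samples x(kT) of the scalar CT system dx/dt = rate * x with x(0) = x0."""
    step = cmath.exp(rate * period)  # exact discrete-time transition factor e^{rate * T}
    samples, x = [], x0
    for _ in range(n_samples):
        samples.append(x)
        x *= step
    return samples


T = 0.5
a = complex(-0.2, 1.0)              # |Im(a)| = 1.0 < pi / T, so a is recoverable
a_alias = a + 2j * math.pi / T      # shifted by the sampling frequency 2*pi/T
s1 = sample_linear_system(a, 1.0, T, 8)
s2 = sample_linear_system(a_alias, 1.0, T, 8)
# s1 and s2 agree to round-off: the two CT systems are indistinguishable from samples.
identified = cmath.log(cmath.exp(a_alias * T)) / T
# The principal log returns a (not a_alias), i.e. identification picks the slower system.
```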

arXiv:2204.13851 [pdf, other]

COVID-Net US-X: Enhanced Deep Neural Network for Detection of COVID-19 Patient Cases from Convex Ultrasound Imaging Through Extended Linear-Convex Ultrasound Augmentation Learning

Authors: E. Zhixuan Zeng, Adrian Florea, Alexander Wong

Abstract: As the global population continues to face significant negative impact from the ongoing COVID-19 pandemic, there has been increasing usage of point-of-care ultrasound (POCUS) imaging as a low-cost and effective imaging modality of choice in the COVID-19 clinical workflow. A major barrier to widespread adoption of POCUS in the COVID-19 clinical workflow is the scarcity of expert clinicians who can interpret POCUS examinations, leading to considerable interest in deep learning-driven clinical decision support systems to tackle this challenge. A major challenge to building deep neural networks for COVID-19 screening using POCUS is the heterogeneity in the types of probes used to capture ultrasound images (e.g., convex vs. linear probes), which can lead to very different visual appearances. In this study, we explore the impact of leveraging extended linear-convex ultrasound augmentation learning on producing enhanced deep neural networks for COVID-19 assessment, where we conduct data augmentation on convex probe data alongside linear probe data that have been transformed to better resemble convex probe data. Experimental results using an efficient deep columnar anti-aliased convolutional neural network designed via a machine-driven design exploration strategy (which we name COVID-Net US-X) show that the proposed extended linear-convex ultrasound augmentation learning significantly increases performance, with a gain of 5.1% in test accuracy and 13.6% in AUC.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 6 pages

arXiv:2204.08661 [pdf]

Dir-MUSIC Algorithm for DOA Estimation of Partial Discharge Based on Signal Strength represented by Antenna Gain Array Manifold

Authors: Wencong Xu, Yandong Li, Bingshu Chen, Yue Hu, Jianxu Li, Zi**g Zeng

Abstract: Inspection robots are widely used for smart grid monitoring in substations, and partial discharge (PD) is an important sign of the insulation state of equipment. PD direction of arrival (DOA) algorithms using conventional beamforming and time difference of arrival (TDOA) require large-scale antenna arrays and high computational complexity, which make them difficult to implement on inspection robots. To address this problem, a novel directional multiple signal classification (Dir-MUSIC) algorithm for PD direction finding based on signal strength is proposed, and a miniaturized directional spiral antenna circular array is designed in this paper. First, the Dir-MUSIC algorithm is derived based on the array manifold characteristics. This method uses signal strength information rather than TDOA information, which can reduce the computational difficulty and the required array size. Second, the effects of signal-to-noise ratio (SNR) and array manifold error on the performance of the algorithm are discussed in detail through simulations. Then, according to the positioning requirements, the antenna array and its arrangement are developed and optimized, and simulation results suggest that the algorithm has reliable direction-finding performance with 6 elements. Finally, the effectiveness of the algorithm is tested using the designed spiral circular array in real scenarios. The experimental results show that the PD direction-finding error is 3.39°, which can meet the need for partial discharge DOA estimation using inspection robots in substations.

Submitted 19 April, 2022; originally announced April 2022.

Comments: 8 pages, 13 figures, 24 references

arXiv:2204.07905 [pdf, other]

doi 10.1109/TIV.2022.3168577

Probabilistic Charging Power Forecast of EVCS: Reinforcement Learning Assisted Deep Learning Approach

Authors: Yuanzheng Li, Shangyang He, Yang Li, Leijiao Ge, Suhua Lou, Zhigang Zeng

Abstract: Electric vehicles (EVs) and electric vehicle charging stations (EVCSs) have been widely deployed with the development of large-scale transportation electrification. However, since the charging behaviors of EVs show large uncertainties, forecasting EVCS charging power is non-trivial. This paper tackles this issue by proposing a reinforcement learning assisted deep learning framework for probabilistic EVCS charging power forecasting to capture its uncertainties. Since EVCS charging power data are not standard time-series data like electricity load, they are first converted to the time-series format. On this basis, one of the most popular deep learning models, the long short-term memory (LSTM), is trained to obtain the point forecast of EVCS charging power. To further capture the forecast uncertainty, a Markov decision process (MDP) is employed to model the change of LSTM cell states, which is solved by our proposed adaptive exploration proximal policy optimization (AePPO) algorithm based on reinforcement learning. Finally, experiments are carried out on real EVCS charging data from Caltech and the Jet Propulsion Laboratory, USA. The results and comparative analysis verify the effectiveness and superior performance of our proposed framework.

Submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE Transactions on Intelligent Vehicles

Journal ref: IEEE Transactions on Intelligent Vehicles 8 (2023) 344-357

arXiv:2204.03329 [pdf]

Information-driven Path Planning for Hybrid Aerial Underwater Vehicles

Authors: Zheng Zeng, Chengke Xiong, Xinyi Yuan, Yulin Bai, Yufei **, Di Lu, Lian Lian

Abstract: This paper presents a novel Rapidly-exploring Adaptive Sampling Tree (RAST) algorithm for the adaptive sampling mission of a hybrid aerial underwater vehicle (HAUV) in an air-sea 3D environment. This algorithm innovatively combines a tournament-based point selection sampling strategy, an information heuristic search process and the framework of the Rapidly-exploring Random Tree (RRT) algorithm. Hence, it can guide the vehicle to regions of interest to scientists for sampling and generate a collision-free path that maximizes information collection by the HAUV under the constraints of environmental effects of currents or wind and a limited budget. The simulation results show that the fast search adaptive sampling tree algorithm has higher optimization performance, faster solution speed and better stability than the Rapidly-exploring Information Gathering Tree (RIGT) algorithm and the particle swarm optimization (PSO) algorithm.

Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.
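RAST combines a tournament-based point selection strategy with the RRT framework. The sketch below shows only a generic tournament selection step that could replace RRT's single uniform sample: draw several random candidates and keep the most informative one. The `information` field, the bounds, and the tournament size are hypothetical, and the paper's heuristic, tree extension, and current/wind constraints are not reproduced.

```python
import random


def tournament_sample(information, bounds, tournament_size, rng):
    """Draw `tournament_size` uniform candidate points inside `bounds`
    and return the candidate with the highest information value."""
    candidates = [
        tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        for _ in range(tournament_size)
    ]
    return max(candidates, key=information)


def information(point):
    # Hypothetical information field peaking at (5, 5, -2), e.g. a feature below the surface
    x, y, z = point
    return -((x - 5) ** 2 + (y - 5) ** 2 + (z + 2) ** 2)


rng = random.Random(0)
bounds = [(0, 10), (0, 10), (-10, 10)]  # x, y, and altitude/depth ranges (hypothetical)
point = tournament_sample(information, bounds, tournament_size=8, rng=rng)
```

A larger tournament biases samples more strongly toward informative regions, at the cost of less uniform exploration; a tournament of size 1 reduces to plain RRT sampling.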

arXiv:2203.17156 [pdf, other]

doi 10.1109/ICME52920.2022.9859703

Adaptive Mean-Residue Loss for Robust Facial Age Estimation

Authors: Ziyuan Zhao, Peisheng Qian, Yubo Hou, Zeng Zeng

Abstract: Automated facial age estimation has diverse real-world applications in multimedia analysis, e.g., video surveillance and human-computer interaction. However, due to the randomness and ambiguity of the aging process, age assessment is challenging. Most research on the topic treats the task as an age regression, classification, or ranking problem, and cannot effectively leverage the age distribution to represent labels with age ambiguity. In this work, we propose a simple yet effective loss function for robust facial age estimation via distribution learning, i.e., adaptive mean-residue loss, in which the mean loss penalizes the difference between the estimated age distribution's mean and the ground-truth age, whereas the residue loss penalizes the entropy of the age probability outside the dynamic top-K in the distribution. Experimental results on the FG-NET and CLAP2016 datasets have validated the effectiveness of the proposed loss. Our code is available at https://github.com/jacobzhaoziyuan/AMR-Loss.

Submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022)

Journal ref: 2022 IEEE International Conference on Multimedia and Expo (ICME)
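The two loss terms described above can be sketched directly. This is a simplified illustration, not the authors' implementation (see their repository for that): it uses a fixed `top_k` instead of the dynamic top-K, a 0.5 factor on the squared mean error, and a hypothetical weight `lam` between the terms.

```python
import math


def mean_residue_loss(probs, true_age, top_k, lam=1.0):
    """Simplified mean-residue loss (fixed top-K; weighting and scaling are assumptions).

    probs[i] is the predicted probability of age i, for ages 0..len(probs)-1.
    """
    # Mean loss: squared gap between the distribution's mean age and the label
    mean_age = sum(i * p for i, p in enumerate(probs))
    mean_loss = 0.5 * (mean_age - true_age) ** 2
    # Residue loss: entropy of the probability mass outside the top-k ages
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    residue = [probs[i] for i in ranked[top_k:]]
    residue_loss = -sum(p * math.log(p) for p in residue if p > 0)
    return mean_loss + lam * residue_loss
```

A flat distribution centred on the right age has zero mean loss but a large residue term, so the residue loss is what pushes probability mass into a few plausible ages.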

arXiv:2203.12454 [pdf, other]

doi 10.1007/978-3-030-87193-2_28

MT-UDA: Towards Unsupervised Cross-modality Medical Image Segmentation with Limited Source Labels

Authors: Ziyuan Zhao, Kaixin Xu, Shumeng Li, Zeng Zeng, Cuntai Guan

Abstract: The success of deep convolutional neural networks (DCNNs) benefits from high volumes of annotated data. However, annotating medical images is laborious, expensive, and requires human expertise, which induces the label scarcity problem. The problem becomes more serious under domain shift. Although deep unsupervised domain adaptation (UDA) can leverage well-established source domain annotations and abundant target domain data to facilitate cross-modality image segmentation and also mitigate the label paucity problem on the target domain, conventional UDA methods suffer from severe performance degradation when source domain annotations are scarce. In this paper, we explore a challenging UDA setting: limited source domain annotations. We aim to investigate how to efficiently leverage unlabeled data from the source and target domains with limited source annotations for cross-modality image segmentation. To achieve this, we propose a new label-efficient UDA framework, termed MT-UDA, in which the student model trained with limited source labels learns from unlabeled data of both domains via two teacher models, respectively, in a semi-supervised manner. More specifically, the student model not only distills the intra-domain semantic knowledge by encouraging prediction consistency but also exploits the inter-domain anatomical information by enforcing structural consistency. Consequently, the student model can effectively integrate the underlying knowledge beneath available data resources to mitigate the impact of source label scarcity and yield improved cross-modality segmentation performance. We evaluate our method on the MM-WHS 2017 dataset and demonstrate that our approach outperforms the state-of-the-art methods by a large margin under the source-label scarcity scenario.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted by MICCAI 2021; code at: https://github.com/jacobzhaoziyuan/MT-UDA

Journal ref: Medical Image Computing and Computer Assisted Intervention, MICCAI 2021. Lecture Notes in Computer Science, vol 12901. Springer, Cham
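The student distills knowledge from teachers "by encouraging prediction consistency". The abstract does not say how the teachers are updated; a common choice in semi-supervised teacher-student setups is a mean teacher, whose weights are an exponential moving average (EMA) of the student's. The sketch below assumes that choice plus a mean-squared-error consistency term, with parameters as flat lists of floats; it is not a confirmed description of MT-UDA.

```python
def ema_update(teacher, student, decay=0.99):
    """Assumed mean-teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_pred, teacher_pred):
    """Mean squared error between student and teacher predictions."""
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / len(student_pred)


# Usage: the teacher drifts slowly toward the student's weights after each training step.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
```

A high decay makes the teacher a temporally smoothed version of the student, which tends to give more stable targets for the consistency loss on unlabeled data.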

arXiv:2201.01895 [pdf, ps, other]

Event-based EV Charging Scheduling in A Microgrid of Buildings

Authors: Qilong Huang, Li Yang, Chen Hou, Zhiyong Zeng, Yaowen Qi

Abstract: With the popularization of electric vehicles (EVs), EV charging demand is becoming an important load in buildings. Considering the mobility of EVs from building to building and their uncertain charging demand, it is of great practical interest to control the EV charging process in a microgrid of buildings to optimize the total operation cost while ensuring transmission safety between the microgrid and the main grid. We consider this important problem in this paper and make the following contributions. First, we formulate this problem as a Markov decision process to capture the uncertain supply and EV charging demand in the microgrid of buildings. Besides reducing the total operation cost of buildings, the model also considers the power exchange limitation to ensure transmission safety. Second, this model is reformulated under an event-based optimization framework to alleviate the impact of the large state and action space. By appropriately defining the event and event-based action, the EV charging process can be optimized by searching a randomized parametric event-based control policy in the microgrid controller and implementing a selecting-to-charging rule in each building controller. Third, a constrained gradient-based policy optimization method with an adjusting mechanism is proposed to iteratively find the optimal event-based control policy for EV charging demand in each building. Numerical experiments considering a microgrid of three buildings are conducted to analyze the structure and the performance of the event-based control policy for EV charging.

Submitted 5 September, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

arXiv:2111.05133 [pdf, other]

Approaching the Limit of Image Rescaling via Flow Guidance

Authors: Shang Li, Guixuan Zhang, Zhengxiong Luo, Jie Liu, Zhi Zeng, Shuwu Zhang

Abstract: Image downscaling and upscaling are two basic rescaling operations. Once an image is downscaled, it is difficult to reconstruct via upscaling because information is lost. To make these two processes more compatible and improve reconstruction performance, some efforts model them as a joint encoding-decoding task, with the constraint that the downscaled (i.e. encoded) low-resolution (LR) image must preserve the original visual appearance. To implement this constraint, most methods guide the downscaling module by supervising it with the bicubically downscaled LR version of the original high-resolution (HR) image. However, this bicubic LR guidance may be suboptimal for the subsequent upscaling (i.e. decoding) and restrict the final reconstruction performance. In this paper, instead of directly applying the LR guidance, we propose an additional invertible flow guidance module (FGM), which transforms the downscaled representation into a visually plausible image during downscaling and transforms it back during upscaling. Benefiting from the invertibility of FGM, the downscaled representation is freed from the LR guidance, which then no longer disturbs the downscaling-upscaling process. This allows us to remove the restrictions on the downscaling module and optimize the downscaling and upscaling modules in an end-to-end manner, so that the two modules cooperate to maximize HR reconstruction performance. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SotA) performance on both downscaled and reconstructed images.
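The abstract relies on the FGM being exactly invertible. A common way to build such invertible modules in normalizing flows is the RealNVP-style affine coupling layer; the sketch below shows only that generic building block, assuming random linear maps as a hypothetical conditioner. It is not the authors' FGM architecture.

```python
import numpy as np

class AffineCoupling:
    """RealNVP-style affine coupling layer: invertible by construction."""

    def __init__(self, dim: int, seed: int = 0):
        assert dim % 2 == 0, "dim must be even so it splits into two halves"
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Hypothetical conditioner: small random linear maps for log-scale and shift.
        self.w_s = rng.normal(scale=0.1, size=(half, half))
        self.w_t = rng.normal(scale=0.1, size=(half, half))

    def _scale_shift(self, x1):
        log_s = np.tanh(x1 @ self.w_s)  # bounded log-scale keeps exp() stable
        t = x1 @ self.w_t
        return log_s, t

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        log_s, t = self._scale_shift(x1)
        y2 = x2 * np.exp(log_s) + t  # only the second half is transformed
        return np.concatenate([x1, y2], axis=-1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        log_s, t = self._scale_shift(y1)  # y1 == x1, so the same scale and shift are recovered
        x2 = (y2 - t) * np.exp(-log_s)
        return np.concatenate([y1, x2], axis=-1)

layer = AffineCoupling(dim=8)
x = np.random.default_rng(1).normal(size=(4, 8))
x_rec = layer.inverse(layer.forward(x))
print(np.allclose(x, x_rec))  # True
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, which is what makes the round trip exact.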

Submitted 8 January, 2023; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: BMVC 2021

arXiv:2110.04084 [pdf, other]

doi 10.3390/photonics9120940

DeepGOMIMO: Deep Learning-Aided Generalized Optical MIMO with CSI-Free Blind Detection

Authors: Xin Zhong, Chen Chen, Shu Fu, Zhihong Zeng, Min Liu

Abstract: Generalized optical multiple-input multiple-output (GOMIMO) techniques have recently been shown to be promising for high-speed optical wireless communication (OWC) systems. In this paper, we propose a novel deep learning-aided GOMIMO (DeepGOMIMO) framework for GOMIMO systems, in which channel state information (CSI)-free blind detection is enabled by a specially designed deep neural network (DNN)-based MIMO detector. The CSI-free blind DNN detector consists mainly of two modules: a pre-processing module designed to address both the path loss and the channel crosstalk caused by MIMO transmission, and a feed-forward DNN module used for joint detection of spatial and constellation information by learning the statistics of both the input signal and the additive noise. Our simulation results verify that, in a typical indoor 4 × 4 MIMO-OWC system using both generalized optical spatial modulation (GOSM) and generalized optical spatial multiplexing (GOSMP) with unipolar non-zero 4-ary pulse amplitude modulation (4-PAM), the proposed CSI-free blind DNN detector achieves nearly the same bit error rate (BER) performance as the optimal joint maximum-likelihood (ML) detector, but with much lower computational complexity. Moreover, since the CSI-free blind DNN detector does not require instantaneous channel estimation to obtain accurate CSI, it offers the unique advantages of an improved achievable data rate and reduced communication delay compared to the CSI-based zero-forcing DNN (ZF-DNN) detector.
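To make the complexity comparison concrete, here is a minimal sketch of the exhaustive joint ML detector that serves as the baseline, simplified to plain spatial multiplexing over a real, non-negative intensity channel y = Hx + n. The 4-PAM levels {1, 2, 3, 4}, the channel matrix and the noise level are illustrative assumptions; GOSM/GOSMP spatial symbols are not modeled.

```python
import itertools
import numpy as np

# Assumed unipolar non-zero 4-PAM intensity levels (illustrative scaling).
PAM4_LEVELS = np.array([1.0, 2.0, 3.0, 4.0])

def ml_detect(y, H, levels=PAM4_LEVELS):
    """Exhaustive joint ML detection for y = H x + n with white Gaussian noise.

    All len(levels) ** n_tx candidate vectors are searched, so the cost grows
    exponentially with the number of transmitters (4 ** 4 = 256 here).
    """
    n_tx = H.shape[1]
    best_x, best_metric = None, np.inf
    for cand in itertools.product(levels, repeat=n_tx):
        x = np.array(cand)
        metric = np.sum((y - H @ x) ** 2)  # squared Euclidean distance = ML metric under AWGN
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x

rng = np.random.default_rng(0)
# Hypothetical 4x4 non-negative optical channel: strong direct links plus crosstalk.
H = np.eye(4) + 0.2 * rng.random((4, 4))
x_true = rng.choice(PAM4_LEVELS, size=4)
y = H @ x_true + 0.01 * rng.normal(size=4)
x_hat = ml_detect(y, H)
print(x_true, x_hat)
```

Note that the ML search needs H, i.e. CSI; the blind DNN detector described above is designed to avoid both that requirement and the exponential search.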

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2108.06763 [pdf, other]

doi 10.1109/EMBC46164.2021.9630812

Two Eyes Are Better Than One: Exploiting Binocular Correlation for Diabetic Retinopathy Severity Grading

Authors: Peisheng Qian, Ziyuan Zhao, Cong Chen, Zeng Zeng, Xiaoli Li

Abstract: Diabetic retinopathy (DR) is one of the most common eye conditions among diabetic patients. However, vision loss occurs primarily in the late stages of DR, and the symptoms of visual impairment, ranging from mild to severe, can vary greatly, adding to the burden of diagnosis and treatment in clinical practice. Deep learning methods based on retinal images have achieved remarkable success in automatic DR grading, but most of them neglect that diabetes usually affects both eyes and that ophthalmologists usually compare both eyes concurrently for DR diagnosis, leaving the correlations between left and right eyes unexploited. In this study, simulating the diagnostic process, we propose a two-stream binocular network to capture the subtle correlations between left and right eyes, in which paired images of the eyes are fed separately into two identical subnetworks during training. We design a contrastive grading loss to learn binocular correlation for five-class DR detection, which maximizes inter-class dissimilarity while minimizing intra-class difference. Experimental results on the EyePACS dataset show the superiority of the proposed binocular model, which outperforms monocular methods by a large margin.

Submitted 15 August, 2021; originally announced August 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021

Journal ref: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2108.06761 [pdf, other]

doi 10.1109/EMBC46164.2021.9629698

Multi-Slice Dense-Sparse Learning for Efficient Liver and Tumor Segmentation

Authors: Ziyuan Zhao, Zeyu Ma, Yanjie Liu, Zeng Zeng, Pierce KH Chow

Abstract: Accurate automatic liver and tumor segmentation plays a vital role in treatment planning and disease monitoring. Recently, deep convolutional neural networks (DCNNs) have achieved tremendous success in 2D and 3D medical image segmentation. However, 2D DCNNs cannot fully leverage inter-slice information, while 3D DCNNs are computationally expensive and memory intensive. To address these issues, we first propose a novel dense-sparse training flow from a data perspective, in which densely adjacent slices and sparsely adjacent slices are extracted as inputs for regularizing DCNNs, thereby improving model performance. Moreover, we design a 2.5D lightweight nnU-Net from a network perspective, in which depthwise separable convolutions are adopted to improve efficiency. Extensive experiments on the LiTS dataset demonstrate the superiority of the proposed method.
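A minimal sketch of how densely and sparsely adjacent slices might be stacked into 2.5D inputs for the same center slice. The helper names, the sparse offset of 3 and the boundary clamping are hypothetical choices for illustration; the abstract does not give the paper's exact sampling scheme.

```python
import numpy as np

def slice_stack(volume, center, offset):
    """Return a 3-channel 2.5D input: slices (center - offset, center, center + offset).

    Indices are clamped at the volume boundary (an assumption; reflection
    padding would be another reasonable choice).
    """
    depth = volume.shape[0]
    idx = [min(max(center + d, 0), depth - 1) for d in (-offset, 0, offset)]
    return volume[idx]  # shape: (3, H, W)

def dense_sparse_inputs(volume, center, sparse_offset=3):
    """Hypothetical helper: densely adjacent (offset 1) and sparsely adjacent
    (offset `sparse_offset`) slice stacks for the same center slice."""
    dense = slice_stack(volume, center, offset=1)
    sparse = slice_stack(volume, center, offset=sparse_offset)
    return dense, sparse

# Toy CT volume whose slice i is filled with the value i, so indices are visible.
vol = np.arange(10, dtype=np.float32)[:, None, None] * np.ones((10, 4, 4), np.float32)
dense, sparse = dense_sparse_inputs(vol, center=5, sparse_offset=3)
print(dense[:, 0, 0], sparse[:, 0, 0])  # [4. 5. 6.] [2. 5. 8.]
```

Both stacks share the same 3-channel shape, so a single 2D network can consume either, which is what lets the sparse view act as a cheap regularizer.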

Submitted 15 August, 2021; originally announced August 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021

Journal ref: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2108.06086 [pdf, ps, other]

A VCSEL Array Transmission System with Novel Beam Activation Mechanisms

Authors: Zhihong Zeng, Mohammad Dehghani Soltani, Majid Safari, Harald Haas

Abstract: Optical wireless communication (OWC) is considered a promising technology for alleviating the traffic burden caused by the increasing number of mobile devices. In this study, a novel vertical-cavity surface-emitting laser (VCSEL) array is proposed for indoor OWC systems. To activate the best beam for a mobile user, two beam activation methods are proposed. The method based on a corner-cube retroreflector (CCR) provides very low latency and allows real-time activation for high-speed users. The other method uses an omnidirectional transmitter (ODTx), which can serve uplink transmission and beam activation simultaneously. Moreover, systems with ODTx are very robust to the random orientation of the user equipment (UE). System-level analyses are carried out for the proposed VCSEL array system. For a single-user scenario, the probability density function (PDF) of the signal-to-noise ratio (SNR) for the central beam of the VCSEL array system can be approximated by a uniform distribution. In addition, the average data rate of the central beam and its upper bound are derived analytically and verified by Monte Carlo simulations. For a multi-user scenario, an analytical upper bound on the average data rate is given. The effects of the cell size and the full width at half maximum (FWHM) angle on system performance are studied. The results show that the system with a FWHM angle of 4° outperforms the others.

Submitted 13 August, 2021; originally announced August 2021.

Comments: 30 pages, 15 figures, journal

arXiv:2108.06025 [pdf, ps, other]

Interference Mitigation using Optimized Angle Diversity Receiver in LiFi Cellular Network

Authors: Zhihong Zeng, Chen Chen, Svetislav Savović, Mohammad Dehghani Soltani, Cheng Chen, Majid Safari, Harald Haas

Abstract: Light-fidelity (LiFi) is an emerging technology for high-speed short-range mobile communications. Inter-cell interference (ICI) is an important issue that limits system performance in an optical attocell network. Angle diversity receivers (ADRs) have been proposed to mitigate ICI. In this paper, the structures of pyramid receivers (PRs) and truncated pyramid receivers (TPRs) are studied. The coverage problems of PRs and TPRs are defined and investigated, and the lower bound of the field of view (FOV) for each photodiode (PD) is given analytically. The impacts of random device orientation and diffuse link signal propagation are taken into consideration. The performances of PRs and TPRs are compared, and optimized ADR structures are then proposed. The performance of select best combining (SBC) and maximum ratio combining (MRC) is compared under different noise levels. It is shown that SBC outperforms MRC in an interference-limited system; otherwise, MRC is the preferred scheme. In addition, the double-source system, in which each LiFi AP consists of two sources transmitting the same information signal with opposite polarity, is proved to outperform the single-source (SS) system under certain conditions.
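The SBC-versus-MRC conclusion can be illustrated with a toy two-PD calculation. This sketch assumes per-branch signal amplitudes g, noise variances and interference powers, interference that is uncorrelated across PDs and treated as Gaussian noise, and MRC weights matched to noise only (w_i = g_i / noise_i). The paper's receiver model may differ.

```python
import numpy as np

def sbc_sinr(g, noise, interference):
    """Select-best combining: use only the branch (PD) with the highest SINR."""
    g, noise, interference = map(np.asarray, (g, noise, interference))
    return float(np.max(g**2 / (noise + interference)))

def mrc_sinr(g, noise, interference):
    """Maximum ratio combining with weights w_i = g_i / noise_i.

    The weights ignore interference, so a strongly interfered branch is still
    weighted heavily and its interference passes into the combined output.
    """
    g, noise, interference = map(np.asarray, (g, noise, interference))
    w = g / noise
    signal = np.sum(w * g) ** 2
    disturbance = np.sum(w**2 * (noise + interference))
    return float(signal / disturbance)

g = [1.0, 1.0]
noise = [0.01, 0.01]
noise_limited = (sbc_sinr(g, noise, [0.0, 0.0]), mrc_sinr(g, noise, [0.0, 0.0]))
interference_limited = (sbc_sinr(g, noise, [0.0, 10.0]), mrc_sinr(g, noise, [0.0, 10.0]))
print(noise_limited)         # MRC doubles the SINR of SBC
print(interference_limited)  # SBC keeps the clean branch; MRC is swamped by interference
```

In the noise-limited case MRC gains from combining both branches, while in the interference-limited case SBC wins by discarding the interfered PD, matching the qualitative trend stated in the abstract.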

Submitted 12 June, 2024; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: 15 pages, 16 figures, journal

arXiv:2106.10401 [pdf]

Parallel frequency function-deep neural network for efficient complex broadband signal approximation

Authors: Zhi Zeng, Pengpeng Shi, Fulei Ma, Peihan Qi

Abstract: A neural network is essentially a high-dimensional complex mapping model that fits features by adjusting network weights. However, the spectral bias in network training leads to an unbearable number of training epochs when fitting the high-frequency components of broadband signals. To improve the fitting efficiency for high-frequency components, PhaseDNN was recently proposed, combining complex frequency band extraction and frequency shift techniques [Cai et al. SIAM J. SCI. COMPUT. 42, A3285 (2020)]. Our paper is devoted to an alternative candidate for fitting complex signals with high-frequency components. Here, a parallel frequency function-deep neural network (PFF-DNN) is proposed to suppress computational overhead while ensuring fitting accuracy, by utilizing fast Fourier analysis of broadband signals and the spectral bias nature of neural networks. The effectiveness and efficiency of the proposed PFF-DNN method are verified through detailed numerical experiments on six typical broadband signals.
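As a rough illustration of the "fast Fourier analysis" step, the sketch below splits a broadband signal into components with disjoint frequency bands, each of which could then be fitted by its own network in parallel. The equal-width bands and the band count are assumptions; the paper's actual decomposition (and any frequency shifting) is not reproduced here.

```python
import numpy as np

def split_into_bands(signal, n_bands):
    """Split a real signal into n_bands components with disjoint frequency supports.

    Each component keeps only its own block of rFFT bins, so by linearity of
    the inverse FFT the components sum back to the original signal.
    """
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        bands.append(np.fft.irfft(masked, n=signal.size))
    return bands

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
# Toy broadband signal: a 3 Hz tone plus a weaker 200 Hz tone.
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
bands = split_into_bands(x, n_bands=4)
print(np.allclose(sum(bands), x))  # True
```

Each band is narrower than the full signal, which is the property that makes it easier for a spectrally biased network to fit (after a frequency shift, in PhaseDNN-style methods).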

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2105.10369 [pdf, other]

doi 10.1109/EMBC46164.2021.9629941

Hierarchical Consistency Regularized Mean Teacher for Semi-supervised 3D Left Atrium Segmentation

Authors: Shumeng Li, Ziyuan Zhao, Kaixin Xu, Zeng Zeng, Cuntai Guan

Abstract: Deep learning has achieved promising segmentation performance on 3D left atrium MR images. However, annotations for segmentation tasks are expensive and difficult to obtain. In this paper, we introduce a novel hierarchical consistency regularized mean teacher framework for 3D left atrium segmentation. In each iteration, the student model is optimized concurrently by multi-scale deep supervision and hierarchical consistency regularization. Extensive experiments show that our method achieves performance competitive with full annotation, outperforming other state-of-the-art semi-supervised segmentation methods.
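In the standard mean teacher scheme, the teacher's weights are an exponential moving average (EMA) of the student's, and a consistency loss ties their predictions together. The sketch below shows only that generic single-scale mechanism; the decay alpha = 0.99 is an assumed value, and the paper's multi-scale hierarchical consistency terms are not shown.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student.

    Parameters are dicts mapping names to arrays; a new dict is returned and
    no gradients flow into the teacher.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}

def consistency_loss(p_student, p_teacher):
    """Mean squared error between student and teacher predictions on the same input."""
    return float(np.mean((p_student - p_teacher) ** 2))

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}  # held fixed here only to make the EMA behaviour visible
for _ in range(100):
    teacher = ema_update(teacher, student, alpha=0.99)
print(teacher["w"])  # each entry equals 1 - 0.99**100, about 0.634
```

Because the teacher averages many past student states, its predictions are smoother targets for the consistency loss on unlabeled volumes.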

Submitted 15 August, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021

Journal ref: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2104.05418 [pdf, other]

Contrastive Learning of Global-Local Video Representations

Authors: Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Abstract: Contrastive learning has delivered impressive results for various tasks in the self-supervised regime. However, existing approaches optimize for learning representations specific to downstream scenarios, i.e., global representations suitable for tasks such as classification or local representations for tasks such as detection and localization. While they produce satisfactory results in the intended downstream scenarios, they often fail to generalize to tasks that they were not originally designed for. In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization). We achieve this by optimizing two contrastive objectives that together encourage our model to learn global-local visual information given audio signals. We show that the two objectives mutually improve the generalizability of the learned global-local representations, significantly outperforming their disjointly learned counterparts. We demonstrate our approach on various tasks including action/sound classification, lip reading, deepfake detection, and event and sound localization (https://github.com/yunyikristy/global_local).
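For readers unfamiliar with contrastive objectives, here is a minimal sketch of the generic InfoNCE loss on paired embeddings, such as video and audio clips from the same video. It is one standard contrastive objective, not necessarily either of the two objectives the authors optimize; the temperature of 0.1 and the toy embeddings are assumptions.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for paired embeddings: row i of z_b is the positive for row i
    of z_a, and all other rows of z_b in the batch act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives sit on the diagonal

rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16))                 # hypothetical video embeddings
audio = video + 0.05 * rng.normal(size=(8, 16))  # hypothetical matching audio embeddings
aligned = info_nce(video, audio)
mismatched = info_nce(video, np.roll(audio, 1, axis=0))
print(aligned, mismatched)
```

Correctly paired embeddings give a low loss, while shifting the pairing by one row makes every positive look like a negative and the loss rises sharply.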

Submitted 27 October, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2102.10815 [pdf, other]

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract: In this paper, we propose a novel conditional convolution network, named location-variable convolution, to model the dependencies of the waveform sequence. Unlike WaveNet, which uses unified convolution kernels to capture the dependencies of arbitrary waveforms, location-variable convolution uses kernels with different coefficients to perform convolution operations on different waveform intervals, where the kernel coefficients are predicted from conditioning acoustic features such as Mel-spectrograms. Based on location-variable convolutions, we design LVCNet for waveform generation and apply it in Parallel WaveGAN to build a more efficient vocoder. Experiments on the LJSpeech dataset show that our proposed model achieves a four-fold increase in synthesis speed compared to the original Parallel WaveGAN without any degradation in sound quality, which verifies the effectiveness of location-variable convolutions.
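A minimal sketch of the core idea of location-variable convolution: each waveform interval gets its own kernel instead of one kernel shared across all positions. The kernels here are fixed by hand rather than predicted from a mel-spectrogram, and segments are zero-padded at their edges; both are simplifying assumptions, not details of LVCNet.

```python
import numpy as np

def location_variable_conv(x, kernels, interval):
    """Convolve each length-`interval` segment of x with its own kernel.

    x:       1-D waveform of length n_intervals * interval.
    kernels: array (n_intervals, k), one kernel per segment (in LVCNet these
             would come from a kernel predictor conditioned on acoustic features).
    Each segment is zero-padded at its edges ('same' output length).
    """
    n_intervals, k = kernels.shape
    assert x.size == n_intervals * interval
    assert interval >= k, "np.convolve 'same' returns max(len) samples"
    out = np.empty_like(x)
    for i in range(n_intervals):
        seg = x[i * interval:(i + 1) * interval]
        out[i * interval:(i + 1) * interval] = np.convolve(seg, kernels[i], mode="same")
    return out

x = np.arange(8, dtype=float)
kernels = np.array([[0.0, 1.0, 0.0],   # identity kernel for the first interval
                    [0.0, 2.0, 0.0]])  # gain-of-2 kernel for the second interval
out = location_variable_conv(x, kernels, interval=4)
print(out)  # first half unchanged; second half doubled to 8, 10, 12, 14
```

A WaveNet-style layer would apply one kernel everywhere; here the per-interval kernels let the filtering track the conditioning features over time.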

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2012.01684

arXiv:2101.09057 [pdf, other]

doi 10.1109/JBHI.2021.3052320

DSAL: Deeply Supervised Active Learning from Strong and Weak Labelers for Biomedical Image Segmentation

Authors: Ziyuan Zhao, Zeng Zeng, Kaixin Xu, Cen Chen, Cuntai Guan

Abstract: Image segmentation is one of the most essential biomedical image processing problems across imaging modalities, including microscopy and X-ray in the Internet-of-Medical-Things (IoMT) domain. However, annotating biomedical images is knowledge-driven, time-consuming, and labor-intensive, making it difficult to obtain abundant labels at limited cost. Active learning strategies, which query only a subset of the training data for annotation, come in to ease the burden of human annotation. Despite receiving attention, most active learning methods still require huge computational costs and use unlabeled data inefficiently. They also tend to ignore the intermediate knowledge within networks. In this work, we propose a deep active semi-supervised learning framework, DSAL, combining active learning and semi-supervised learning strategies. In DSAL, a new criterion based on a deep supervision mechanism is proposed to select informative samples with high uncertainties for strong labelers and samples with low uncertainties for weak labelers. The internal criterion leverages the disagreement of intermediate features within the deep learning network for active sample selection, which reduces the computational costs. We use the proposed criteria to select samples for strong and weak labelers to produce oracle labels and pseudo labels simultaneously at each active learning iteration in an ensemble learning manner, which can be examined with the IoMT platform. Extensive experiments on multiple medical image datasets demonstrate the superiority of the proposed method over state-of-the-art active learning methods.
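A minimal sketch of the selection step described above: score each unlabeled image by how much the deep-supervision heads disagree, then send the most uncertain images to the strong labeler and the least uncertain ones to the weak labeler. Using the mean per-pixel variance across heads as the disagreement measure is an assumption; the paper's exact criterion may differ, and the helper names are hypothetical.

```python
import numpy as np

def disagreement(side_outputs):
    """Per-sample uncertainty as the mean per-pixel variance across supervision heads.

    side_outputs: array (n_heads, n_samples, n_pixels) of foreground probabilities.
    """
    return side_outputs.var(axis=0).mean(axis=1)

def split_for_labelers(scores, k_strong, k_weak):
    """Return (strong_idx, weak_idx): most uncertain samples go to the strong
    labeler (oracle labels), least uncertain ones to the weak labeler (pseudo labels)."""
    order = np.argsort(scores)  # ascending uncertainty
    strong = order[::-1][:k_strong]
    weak = order[:k_weak]
    return strong, weak

rng = np.random.default_rng(0)
side_outputs = rng.random((3, 10, 64))  # 3 hypothetical heads, 10 unlabeled images, 64 pixels each
scores = disagreement(side_outputs)
strong_idx, weak_idx = split_for_labelers(scores, k_strong=2, k_weak=3)
print(strong_idx, weak_idx)
```

Since the scores come from heads already inside the segmentation network, no extra forward passes (as in Monte Carlo dropout or separate ensembles) are needed, which is the cost saving the abstract points to.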

Submitted 22 January, 2021; originally announced January 2021.

Comments: Published as a journal paper at IEEE J-BHI

arXiv:2012.02626 [pdf, other]

GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis

Authors: Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao

Abstract: This paper introduces a graphical representation approach to prosody boundaries (GraphPB) for Chinese speech synthesis, intended to parse the semantic and syntactic relationships of input sequences in a graphical domain to improve prosody performance. The nodes of the graph embedding are formed by prosodic words, and the edges are formed by the other prosodic boundaries, namely the prosodic phrase boundary (PPH) and the intonation phrase boundary (IPH). Different graph neural networks (GNNs), such as the Gated Graph Neural Network (GGNN) and Graph Long Short-term Memory (G-LSTM), are utilised as graph encoders to exploit the graphical prosody boundary information. A graph-to-sequence model, formed by a graph encoder and an attentional decoder, is proposed. Two techniques are proposed to embed sequential information into the graph-to-sequence text-to-speech model. The experimental results show that the proposed approach can encode the phonetic and prosodic rhythm of an utterance. The mean opinion scores (MOS) of these GNN models are comparable to those of state-of-the-art sequence-to-sequence models, with better performance in terms of prosody. This provides an alternative approach to prosody modelling in end-to-end speech synthesis.

Submitted 2 December, 2020; originally announced December 2020.

Comments: Accepted to SLT 2021

arXiv:2012.01684 [pdf, other]

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract: Recent neural vocoders usually use a WaveNet-like network to capture the long-term dependencies of the waveform, but a large number of parameters are required to obtain good modeling capability. In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms. Unlike WaveNet, which uses unified convolution kernels to capture the dependencies of arbitrary waveforms, location-variable convolution uses a kernel predictor to generate multiple sets of convolution kernels based on the mel-spectrum, where each set of kernels performs convolution operations on the associated waveform interval. Combining WaveGlow and location-variable convolutions, an efficient vocoder named MelGlow is designed. Experiments on the LJSpeech dataset show that MelGlow achieves better performance than WaveGlow at small model sizes, which verifies the effectiveness and the optimization potential of location-variable convolutions.

Submitted 2 December, 2020; originally announced December 2020.

Comments: will be presented in SLT 2021

arXiv:2011.00101 [pdf, ps, other]

EEG-Based Brain-Computer Interfaces Are Vulnerable to Backdoor Attacks

Authors: Lubin Meng, Jian Huang, Zhigang Zeng, Xue Jiang, Shan Yu, Tzyy-Ping Jung, Chin-Teng Lin, Ricardo Chavarriaga, Dongrui Wu

Abstract: Research and development of electroencephalogram (EEG)-based brain-computer interfaces (BCIs) have advanced rapidly, partly due to a deeper understanding of the brain and the wide adoption of sophisticated machine learning approaches for decoding EEG signals. However, recent studies have shown that machine learning algorithms are vulnerable to adversarial attacks. This article proposes to use narrow period pulses for poisoning attacks on EEG-based BCIs, which is implementable in practice and has never been considered before. One can create dangerous backdoors in the machine learning model by injecting poisoning samples into the training set. Test samples with the backdoor key will then be classified into the target class specified by the attacker. What most distinguishes our approach from previous ones is that the backdoor key does not need to be synchronized with the EEG trials, making it very easy to implement. The effectiveness and robustness of the backdoor attack approach are demonstrated, highlighting a critical security concern for EEG-based BCIs and calling for urgent attention to address it.

Submitted 2 January, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

Journal ref: IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2023

arXiv:2010.15344 [pdf, other]

doi 10.1109/ICIP40778.2020.9191345

Sea-Net: Squeeze-And-Excitation Attention Net For Diabetic Retinopathy Grading

Authors: Ziyuan Zhao, Kartik Chopra, Zeng Zeng, Xiaoli Li

Abstract: Diabetes is one of the most common diseases. Diabetic retinopathy (DR) is a complication of diabetes that can lead to blindness. Automatic DR grading based on retinal images offers great diagnostic and prognostic value for treatment planning. However, the subtle differences among severity levels make it difficult to capture important features using conventional methods. To alleviate these problems, a new deep learning architecture for robust DR grading, referred to as SEA-Net, is proposed, in which spatial attention and channel attention are applied alternately and reinforce each other, improving classification performance. In addition, a hybrid loss function is proposed to further maximize the inter-class distance and reduce the intra-class variability. Experimental results show the effectiveness of the proposed architecture.
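The channel-attention half of SEA-Net builds on the squeeze-and-excitation (SE) idea. Below is a minimal sketch of a standard SE block forward pass in NumPy, assuming random placeholder weights, a reduction ratio r = 4 and no biases; the spatial-attention branch and the way the paper alternates the two are not shown.

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Squeeze-and-excitation channel attention on a feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers
    (biases omitted for brevity); r is the reduction ratio.
    """
    z = x.mean(axis=(1, 2))               # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)           # excitation: FC + ReLU -> (C // r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # FC + sigmoid -> channel weights in (0, 1)
    return x * a[:, None, None]           # rescale each channel by its weight

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 6, 6))
w1 = rng.normal(size=(C // r, C))  # placeholder weights; learned in practice
w2 = rng.normal(size=(C, C // r))
y = squeeze_excitation(x, w1, w2)
print(y.shape)  # (8, 6, 6)
```

Each channel is multiplied by a single learned gate in (0, 1), so informative channels can be emphasised and less useful ones suppressed without changing the feature-map shape.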

Submitted 28 October, 2020; originally announced October 2020.

Comments: Accepted to ICIP 2020

Journal ref: 2020 IEEE International Conference on Image Processing (ICIP), pp. 2496-2500

arXiv:2008.05656 [pdf, other]

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract: Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together. In this paper, a prosody learning mechanism is proposed to model the prosody of speech based on a TTS system, where the prosody information of speech is extracted from the mel-spectrum by a prosody learner and combined with the phoneme sequence to reconstruct the mel-spectrum. Meanwhile, semantic features of the text from a pre-trained language model are introduced to improve the prosody prediction results. In addition, a novel self-attention structure, named local attention, is proposed to lift the restriction on input text length, where the relative position information of the sequence is modeled by relative position matrices so that position encodings are no longer needed. Experiments on English and Mandarin show that our model produces speech with more satisfactory prosody. Especially in Mandarin synthesis, our proposed model outperforms the baseline model by a MOS gap of 0.08, and the overall naturalness of the synthesized speech is significantly improved.
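A minimal sketch of windowed self-attention with a relative-position bias, to show why such a layer has no built-in maximum sequence length. The window size, the use of one scalar bias per relative offset, and the single-head NumPy form are all assumptions; the abstract does not specify how the paper's relative position matrices are parameterized.

```python
import numpy as np

def local_attention(q, k, v, window, rel_bias):
    """Self-attention restricted to |i - j| <= window, with a relative-position bias.

    q, k, v:  arrays of shape (T, d).
    rel_bias: (2 * window + 1,) scalar bias per relative offset j - i in
              [-window, window]. No absolute position encoding is used, so T
              can be arbitrarily long.
    """
    T, d = q.shape
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]  # offsets[i, j] = j - i
    mask = np.abs(offsets) <= window
    scores = q @ k.T / np.sqrt(d)
    scores = scores + rel_bias[np.clip(offsets + window, 0, 2 * window)]
    scores = np.where(mask, scores, -np.inf)                 # block positions outside the window
    scores = scores - scores.max(axis=1, keepdims=True)      # diagonal is always finite
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d, window = 12, 8, 2
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
rel_bias = rng.normal(scale=0.1, size=2 * window + 1)  # hypothetical learned biases
out, weights = local_attention(q, k, v, window, rel_bias)
print(out.shape)  # (12, 8)
```

The same `rel_bias` vector applies to any T, which is the property that removes the text-length limit imposed by a fixed table of absolute position encodings.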

Submitted 12 August, 2020; originally announced August 2020.

Comments: will be presented in INTERSPEECH 2020

arXiv:2007.13284 [pdf]

Research Progress of Convolutional Neural Network and its Application in Object Detection

Authors: Wei Zhang, Zuoxiang Zeng

Abstract: With the improvement of computer performance and the increase of data volume, object detection based on convolutional neural networks (CNNs) has become the mainstream approach to object detection. This paper summarizes the research progress of convolutional neural networks and their applications in object detection, focuses on analyzing and discussing a specific approach to applying convolutional neural networks to object detection, and points out current deficiencies and future development directions.

Submitted 26 July, 2020; originally announced July 2020.

Comments: 11 pages, journal paper

ACM Class: I.2

arXiv:2007.03746 [pdf, ps, other]

doi 10.1016/j.neunet.2022.06.008

Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline

Authors: Dongrui Wu, Xue Jiang, Ruimin Peng, Wanzeng Kong, Jian Huang, Zhigang Zeng

Abstract: Transfer learning (TL) has been widely used in motor imagery (MI) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject, and has demonstrated promising performance. A closed-loop MI-based BCI system, after electroencephalogram (EEG) signal acquisition and temporal filtering, includes spatial filtering, feature engineering, and classification blocks before sending out the control signal to an external device; however, previous approaches only considered TL in one or two of these components. This paper proposes that TL could be considered in all three components (spatial filtering, feature engineering, and classification) of MI-based BCIs. Furthermore, it is also very important to add a data alignment component before spatial filtering to make the data from different subjects more consistent, and hence to facilitate subsequent TL. Offline calibration experiments on two MI datasets verified our proposal. In particular, integrating data alignment with sophisticated TL approaches can significantly improve the classification performance, and hence greatly reduce the calibration effort.
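The abstract does not say which data alignment is used. A widely used choice for EEG, assumed here, is Euclidean alignment (EA): each subject's trials are left-multiplied by R^(-1/2), where R is the mean of that subject's trial covariance matrices, so that every subject's aligned mean covariance becomes the identity. The minimal NumPy sketch below illustrates that step; `euclidean_align` is a hypothetical name, and the input is assumed to be already temporally filtered.

```python
import numpy as np

def euclidean_align(trials):
    """Align one subject's EEG trials so their mean spatial covariance is the identity.

    Sketch of Euclidean alignment (an assumed choice, not confirmed by the abstract).
    trials: array of shape (n_trials, n_channels, n_samples)
    """
    # Per-trial spatial covariance X X^T / n_samples -> (n_trials, n_channels, n_channels)
    covs = np.einsum("tcs,tds->tcd", trials, trials) / trials.shape[2]
    r_mean = covs.mean(axis=0)                       # reference matrix R for this subject
    eigvals, eigvecs = np.linalg.eigh(r_mean)        # R is symmetric positive definite
    r_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    # Apply R^(-1/2) to every trial: X_aligned = R^(-1/2) X
    return np.einsum("cd,tds->tcs", r_inv_sqrt, trials)
```

Applying this separately to each subject before spatial filtering puts all subjects in a common reference frame: since the aligned covariances are R^(-1/2) C_i R^(-1/2), their mean is R^(-1/2) R R^(-1/2) = I.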

Submitted 22 January, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

Journal ref: Neural Networks, 153:235-253, 2022

arXiv:2006.01045 [pdf, other]

A Hierarchical Deep Convolutional Neural Network and Gated Recurrent Unit Framework for Structural Damage Detection

Authors: Jianxi Yang, Likai Zhang, Cen Chen, Yangfan Li, Ren Li, Gui** Wang, Shixin Jiang, Zeng Zeng

Abstract: Structural damage detection has become an interdisciplinary area of interest for various engineering fields, while the available damage detection methods are in the process of adopting machine learning concepts. Most machine learning based methods heavily depend on extracted "hand-crafted" features that are manually selected in advance by domain experts and then fixed. Recently, deep learning has demonstrated remarkable performance on traditionally challenging tasks, such as image classification and object detection, due to its powerful feature learning capabilities. This breakthrough has inspired researchers to explore deep learning techniques for structural damage detection problems. However, existing methods have considered only spatial relations (e.g., using a convolutional neural network (CNN)) or only temporal relations (e.g., using a long short-term memory network (LSTM)). In this work, we propose a novel Hierarchical CNN and Gated Recurrent Unit (GRU) framework, termed HCG, to model both spatial and temporal relations for structural damage detection. Specifically, the CNN models the spatial relations and the short-term temporal dependencies among sensors, while its output features are fed into the GRU to learn the long-term temporal dependencies jointly. Extensive experiments on the IASC-ASCE structural health monitoring benchmark and on a scale model of a three-span continuous rigid frame bridge structure show that our proposed HCG significantly outperforms other existing methods for structural damage detection.
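The CNN-then-GRU hierarchy described above can be sketched in NumPy as follows. This is a hypothetical forward pass, not the HCG architecture: the abstract gives no layer counts or sizes, so the sketch assumes a single valid 1-D convolution whose kernels span all sensors (mixing sensors while capturing short-term temporal patterns), a ReLU, one GRU layer over the resulting feature sequence (long-term dependencies), and a linear head producing class scores. All names, shapes, and the GRU gate convention (Cho et al. style, with the reset gate applied before the recurrent matrix) are assumptions.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time; each kernel spans all sensors.

    x: (T, n_sensors), kernels: (n_filters, k, n_sensors), bias: (n_filters,)
    returns (T - k + 1, n_filters)
    """
    _, k, _ = kernels.shape
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-k+1, k, n_sensors)
    out = np.einsum("wks,fks->wf", windows, kernels) + bias
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru(seq, W, U, b):
    """GRU over seq: (L, n_in). W: (3, n_in, H), U: (3, H, H), b: (3, H).

    Gate order: update z, reset r, candidate n. Returns the final hidden state (H,).
    """
    h = np.zeros(U.shape[1])
    for x_t in seq:
        z = sigmoid(x_t @ W[0] + h @ U[0] + b[0])
        r = sigmoid(x_t @ W[1] + h @ U[1] + b[1])
        n = np.tanh(x_t @ W[2] + (r * h) @ U[2] + b[2])
        h = (1.0 - z) * n + z * h           # interpolate between candidate and previous state
    return h

def hcg_forward(x, params):
    """Hypothetical CNN -> GRU -> linear head over one multi-sensor record."""
    feats = conv1d_relu(x, params["kernels"], params["conv_bias"])   # spatial + short-term
    h = gru(feats, params["W"], params["U"], params["b"])             # long-term
    return h @ params["W_out"] + params["b_out"]                      # class scores
```

Splitting the work this way means the GRU runs over learned features rather than raw sensor channels, which is the division of labour the abstract describes.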

Submitted 29 May, 2020; originally announced June 2020.

Comments: The work has been accepted by Information Sciences!

arXiv:2005.10407 [pdf, other]

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

Authors: Zhi** Zeng, Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

Abstract: In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under a cross-lingual transfer learning setting. To this end, we extend our prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data due to the LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained using limited labeled data. Starting from this, we obtain a further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference compared to both LSTM and Transformer architectures.
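Successive relative WER reductions multiply rather than add. Assuming the 25.4% and 13.6% figures are each measured against the model immediately before it (the wording "starting from this" and "additional" suggests so, but the abstract does not state it outright), the two steps together give 1 - (1 - 0.254)(1 - 0.136) = 1 - 0.644544, about a 35.5% relative reduction over the hybrid model trained on limited labeled data, not 39.0%. A small sketch of the arithmetic, with hypothetical helper names:

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

def compound(reductions):
    """Overall relative reduction after successive relative reductions,
    each measured against the previous model."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r          # fraction of the WER left after this step
    return 1.0 - remaining

overall = compound([0.254, 0.136])    # transfer learning, then text boosting
print(f"{overall:.1%}")               # 35.5%
```

The 24.2% and 11.9% figures are measured against different baselines (the LSTM-based architecture and the vanilla Transformer), so they should not be chained into the same product.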

Submitted 28 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Showing 1–50 of 67 results for author: Zeng, Z