Search | arXiv e-print repository

CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

Authors: Tingwei Liu, Miao Zhang, Leiye Liu, Jialong Zhong, Shuyao Wang, Yongri Piao, Huchuan Lu

Abstract: Recently, Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the image features and the diffusion model features poses a great challenge to prostate segmentation. In this paper, we propose CriDiff, a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation. The CIS maximizes the use of multi-level features by efficiently harnessing the complementarity of high- and low-level features. To effectively learn multi-level edge features and non-edge features, we propose two parallel conditioners in the CIS: the Boundary Enhance Conditioner (BEC) and the Core Enhance Conditioner (CEC), which discriminatively model the image edge regions and non-edge regions, respectively. Moreover, the GP approach eases the inconsistency between the image features and the diffusion model without adding additional parameters. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on four evaluation metrics.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted in MICCAI 2024

arXiv:2406.11006 [pdf, other]

SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field

Authors: Yuhang He, Shitong Xu, Jia-Xing Zhong, Sangyun Shin, Niki Trigoni, Andrew Markham

Abstract: We present SPEAR, a continuous receiver-to-receiver acoustic neural warping field for spatial acoustic effects prediction in an acoustic 3D space with a single stationary audio source. Unlike traditional source-to-receiver modelling methods that require prior knowledge of the space's acoustic properties to rigorously model audio propagation from source to receiver, we propose to predict by warping the spatial acoustic effects from one reference receiver position to another target receiver position, so that the warped audio essentially accommodates all spatial acoustic effects belonging to the target position. SPEAR can be trained with much more readily accessible data: we simply ask two robots to independently record spatial audio at different positions. We further theoretically prove the universal existence of the warping field if and only if one audio source is present. Three physical principles are incorporated to guide SPEAR network design, making the learned warping field physically meaningful. We demonstrate SPEAR's superiority on synthetic, photo-realistic, and real-world datasets, showing the huge potential of SPEAR for various downstream robotic tasks.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures in main paper

arXiv:2405.15519 [pdf]

Confocal structured illumination microscopy

Authors: Weishuai Zhou, Manhong Yao, Xi Lin, Quan Yu, Junzheng Peng, Jingang Zhong

Abstract: Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent anti-noise ability. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome the limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce the concept of confocal imaging into OS-SIM and propose confocal structured illumination microscopy (CSIM) to enhance the imaging performance of OS-SIM. CSIM exploits the principle of dual photography to reconstruct a dual image from each pixel of the camera. The reconstructed dual image is equivalent to the image obtained by using the spatial light modulator (SLM) as a virtual camera, enabling the separation of the conjugate and non-conjugate signals recorded by the camera pixel. We can reject the non-conjugate signals by extracting the conjugate signal from each dual image to reconstruct a confocal image when establishing the conjugate relationship between the camera and the SLM. We have constructed the theoretical framework of CSIM. Optical-sectioning experimental results demonstrate that CSIM can reconstruct images with superior SNR and greater imaging depth compared with existing OS-SIM. CSIM is expected to expand the application scope of OS-SIM.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.09192 [pdf, other]

Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues to speech. The research community has shown growing interest in the study of the frontend component, recognizing its pivotal role in text-to-speech systems, including Text Normalization (TN), Prosody Boundary Prediction (PBP), and Polyphone Disambiguation (PD). Nonetheless, the limitations posed by insufficient annotated textual data and the reliance on homogeneous text signals significantly undermine the effectiveness of its supervised learning. To overcome this obstacle, a novel two-stage TTS frontend prediction pipeline, named TAP-FM, is proposed in this paper. Specifically, during the first learning phase, we present a Multi-scale Contrastive Text-audio Pre-training protocol (MC-TAP), which aims at acquiring richer insights via multi-granularity contrastive pre-training in an unsupervised manner. Instead of mining homogeneous features as in prior pre-training approaches, our framework demonstrates the ability to delve deep into both global and local text-audio semantic and acoustic representations. Furthermore, a parallelized TTS frontend model is delicately devised to execute the TN, PD, and PBP prediction tasks in the second stage. Finally, extensive experiments illustrate the superiority of our proposed method, achieving state-of-the-art performance.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2403.01257 [pdf]

Secure and Scalable Network Slicing with Plug-and-Play Support for Power Distribution System Communication Networks

Authors: Jian Zhong, Chen Chen, Yuqi Qian, Yiheng Bian, Yuxiong Huang, Zhaohong Bie

Abstract: With the rapid development of power distribution systems (PDSs), the number of terminal devices and the types of delivered services involved are constantly growing. These trends make the operations of PDSs highly dependent on the support of advanced communication networks, which face two related challenges. The first is to provide sufficient flexibility, resilience, and security to meet varying demands and ensure the proper operation of gradually diversifying network services. The second is to realize the automatic identification of terminal devices, thus reducing the network maintenance burden. To solve these problems, this paper presents a novel multiservice network integration and device authentication slice-based network slicing scheme. In this scheme, the integration of PDS communication networks enables network resource sharing, and recovery from communication interruption is achieved through network slicing in the integrated network. Authentication servers periodically poll terminal devices, adjusting network slice ranges based on authentication results, thereby facilitating dynamic network slicing. Additionally, secure plug-and-play support for PDS terminal devices and network protection are achieved through device identification and dynamic adjustment of network slices. On this basis, a network optimization and upgrading methodology for load balancing and robustness enhancement is further proposed. This approach is designed to improve the performance of PDS communication networks, adapting to ongoing PDS development and the evolution of PDS services. The simulation results show that the proposed schemes endow a PDS communication network with favorable resource utilization, fault recovery, terminal device plug-and-play support, load balancing, and improved network robustness.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.01256 [pdf]

Resilient Microgrid Formation Considering Communication Interruptions

Authors: Jian Zhong, Chen Chen, Young-Jin Kim, Yuxiong Huang, Mengjie Teng, Yiheng Bian, Zhaohong Bie

Abstract: Distribution system (DS) communication failures following extreme events often degrade monitoring and control functions, thus preventing the acquisition of complete global DS component state information, on which existing post-disaster DS restoration methods are based. This letter proposes methods of inferring the states of DS components in the case of incomplete component state information. By using the known DS information, the operating states of unobservable DS branches and buses can be inferred, providing complete information for DS performance restoration before full communication recovery.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.01253 [pdf]

Strategic SDN-based Microgrid Formation for Managing Communication Failures in Distribution System Restoration

Authors: Jian Zhong, Chen Chen, Zhaohong Bie, Mohammad Shahidehpour

Abstract: Grid modernization has increased the reliance of power networks on cyber networks within distribution systems (DSs), heightening their vulnerability to disasters. Communication network failures significantly impede DS load recovery by diminishing observation and control. Prior research has largely ignored the need for integrated recovery of DS power and cyber networks' centralized control. Indeed, communication network restoration is critical for speedy load recovery through DS automation-based microgrid formation. This paper exploits the data routing capabilities of software-defined networking (SDN) to enhance centralized control recovery in DS communication networks, incorporating it into a comprehensive DS restoration model. This model, tailored to the control requirements of load restoration, strategically allocates limited communication resources to re-establish connections between the operation center and terminal devices. Subsequently, DS automation is employed to orchestrate DS microgrid formation for power resupply. Additionally, we introduce a cyclic algorithm designed to optimize the load recovery via a multi-step, cooperative process. The efficacy of the proposed method is demonstrated on IEEE 33-node and IEEE 123-node test feeders.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.01250 [pdf]

Resilient Mobile Energy Storage Resources Based Distribution Network Restoration in Interdependent Power-Transportation-Information Networks

Authors: Jian Zhong, Chen Chen, Qiming Yang, Dafu Liu, Wentao Shen, Chenlin Ji, Zhaohong Bie

Abstract: The interactions between power, transportation, and information networks (PTIN) are becoming more profound with the advent of smart city technologies. Existing mobile energy storage resource (MESR)-based power distribution network (PDN) restoration schemes often neglect the interdependencies among PTIN; thus, efficient PDN restoration cannot be achieved. This paper outlines the interacting factors of power supply demand, traffic operation efficiency, communication coverage, electric vehicle (EV) deployment capability, and PDN controllability among PTIN and further develops a PTIN-interacting model to reflect the chained recovery effect of the MESR-based restoration process. On this basis, a two-stage PDN restoration scheme is proposed that utilizes three emergency resources, including EVs, mobile energy storage systems (MESSs), and unmanned aerial vehicles (UAVs), to restore the power supply and communication of PDNs. This scheme first improves the distribution automation function, EV deployment capability, and traffic operation efficiency by prioritizing the recovery of communication network (CN) and urban traffic network (UTN) loads. Then, EVs and MESSs are further scheduled to achieve a better PDN restoration effect with the support of the restored CNs and UTNs. Case studies on a PDN, CN, and UTN integrated test system are conducted to verify the effectiveness of the proposed scheme. The results show that the prioritized load recovery operation for CN and UTN facilities in this scheme greatly improves the PDN restoration effect.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2312.13523 [pdf]

doi 10.1002/mrm.29990

High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

Authors: Congyu Liao, Xiaozhi Cao, Siddharth Srinivasan Iyer, Sophie Schauman, Zihan Zhou, Xiaoqian Yan, Quan Chen, Zhitao Li, Nan Wang, Ting Gong, Zhe Wu, Hongjian He, Jianhui Zhong, Yang Yang, Adam Kerr, Kalanit Grill-Spector, Kawin Setsompop

Abstract: Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MRF), to achieve high-fidelity whole-brain MWF and T1/T2/PD mapping on a clinical 3T scanner. To achieve fast acquisition and memory-efficient reconstruction, the ViSTa-MRF sequence leverages an optimized 3D tiny-golden-angle-shuffling spiral-projection acquisition and joint spatial-temporal subspace reconstruction with optimized preconditioning algorithm. With the proposed ViSTa-MRF approach, high-fidelity direct MWF mapping was achieved without a need for multi-compartment fitting that could introduce bias and/or noise from additional assumptions or priors. Results: The in-vivo results demonstrate the effectiveness of the proposed acquisition and reconstruction framework to provide fast multi-parametric mapping with high SNR and good quality. The in-vivo results of 1mm- and 0.66mm-iso datasets indicate that the MWF values measured by the proposed method are consistent with standard ViSTa results that are 30x slower with lower SNR. Furthermore, we applied the proposed method to enable 5-minute whole-brain 1mm-iso assessment of MWF and T1/T2/PD mappings for infant brain development and for post-mortem brain samples. Conclusions: In this work, we have developed a 3D ViSTa-MRF technique that enables the acquisition of whole-brain MWF, quantitative T1, T2, and PD maps at 1mm and 0.66mm isotropic resolution in 5 and 15 minutes, respectively. This advancement allows for quantitative investigations of myelination changes in the brain.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 38 pages, 12 figures and 1 table

Journal ref: Magnetic Resonance in Medicine 2023

arXiv:2312.07818 [pdf]

Brain Computer Interface Technology for Future Battlefield

Authors: Guodong Xiong, Xinyan Ma, Wei Li, Jiaqi Cao, Jian Zhong, Yicong Su

Abstract: With the development of artificial intelligence and unmanned equipment, human-machine hybrid formations will be the main focus in future combat formations. With the development of big data and various situational awareness technologies, the breadth and depth of information have been enhanced, but decision-making has also become more complex. The operation mode of existing unmanned equipment often requires complex manual input, which is not conducive to the battlefield environment. How to reduce the cognitive load of information exchange between soldiers and various unmanned equipment is an important issue in future intelligent warfare. This paper proposes a brain computer interface communication system for soldier combat, whose design takes into account the characteristics of soldier combat scenarios. The stimulation paradigm is combined with helmets, portable computers, and firearms, and brain computer interface technology is used to achieve fast, barrier-free, and hands-free communication between humans and machines. Intelligent algorithms are combined to assist decision-making in fully perceiving and fusing situational information on the battlefield. A large amount of data from human and machine networks is quickly processed, understood, and integrated, achieving real-time perception of battlefield information, intelligent decision-making, and direct control of drone swarms and other equipment by the human brain to assist in soldier scenarios.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 4 pages, 1 figure

arXiv:2309.05423 [pdf, other]

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

Authors: Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

Abstract: In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, a novel two-stage automatic annotation pipeline is proposed in this paper. In the first stage, we use contrastive pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs to enhance prosodic information in latent representations. In the second stage, we build a multi-modal prosody annotator, comprising pretrained encoders, a text-speech fusing scheme, and a sequence classifier. Experiments on English prosodic boundaries demonstrate that our method achieves state-of-the-art (SOTA) performance with F1 scores of 0.72 and 0.93 for Prosodic Word and Prosodic Phrase boundaries, respectively, while remaining remarkably robust to data scarcity.

Submitted 11 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.06288 [pdf, other]

Spatial Pathomics Toolkit for Quantitative Analysis of Podocyte Nuclei with Histology and Spatial Transcriptomics Data in Renal Pathology

Authors: Jiayuan Chen, Yu Wang, Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Yilin Liu, Jianyong Zhong, Agnes B. Fogo, Haichun Yang, Shilin Zhao, Yuankai Huo

Abstract: Podocytes, specialized epithelial cells that envelop the glomerular capillaries, play a pivotal role in maintaining renal health. The current description and quantification of features on pathology slides are limited, prompting the need for innovative solutions to comprehensively assess diverse phenotypic attributes within Whole Slide Images (WSIs). In particular, understanding the morphological characteristics of podocytes, terminally differentiated glomerular epithelial cells, is crucial for studying glomerular injury. This paper introduces the Spatial Pathomics Toolkit (SPT) and applies it to podocyte pathomics. The SPT consists of three main components: (1) instance object segmentation, enabling precise identification of podocyte nuclei; (2) pathomics feature generation, extracting a comprehensive array of quantitative features from the identified nuclei; and (3) robust statistical analyses, facilitating a comprehensive exploration of spatial relationships between morphological and spatial transcriptomics features. The SPT successfully extracted and analyzed morphological and textural features from podocyte nuclei, revealing a multitude of podocyte morphomic features through statistical analysis. Additionally, we demonstrated the SPT's ability to unravel spatial information inherent to podocyte distribution, shedding light on spatial patterns associated with glomerular injury. By disseminating the SPT, our goal is to provide the research community with a powerful and user-friendly resource that advances cellular spatial pathomics in renal pathology. The implementation of the toolkit and its complete source code are made openly accessible at https://github.com/hrlblab/spatial_pathomics.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.03420 [pdf]

A Safe DRL Method for Fast Solution of Real-Time Optimal Power Flow

Authors: Pengfei Wu, Chen Chen, Dexiang Lai, Jian Zhong

Abstract: High-level penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. In order to rapidly and economically respond to the fluctuations of power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) based method for fast solution of real-time optimal power flow (RT-OPF) problems. The proposed method considers the volatility of RESs and temporal constraints, and formulates the RT-OPF as a Constrained Markov Decision Process (CMDP). In the training process, the proposed method hybridizes the proximal policy optimization (PPO) and the primal-dual method. Instead of integrating the constraint violation penalty with the reward function, its actor gradients are estimated by a Lagrange advantage function which is derived from two critic systems based on economic reward and violation cost. The decoupling of reward and cost alleviates reward sparsity while improving critic approximation accuracy. Moreover, the introduction of Lagrange multipliers enables the agent to comprehend the trade-off between optimality and feasibility. Numerical tests are carried out and compared with penalty-based DRL methods on the IEEE 9-bus, 30-bus, and 118-bus test systems. The results show that the well-trained SDRL agent can significantly improve the computation efficiency while satisfying the security constraints and optimality requirements.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.01981 [pdf, other]

doi 10.1016/j.media.2023.103035

CartiMorph: a framework for automated knee articular cartilage morphometrics

Authors: Yongcheng Yao, Junru Zhong, Liping Zhang, Sheheryar Khan, Weitian Chen

Abstract: We introduce CartiMorph, a framework for automated knee articular cartilage morphometrics. It takes an image as input and generates quantitative metrics for cartilage subregions, including the percentage of full-thickness cartilage loss (FCL), mean thickness, surface area, and volume. CartiMorph leverages the power of deep learning models for hierarchical image feature representation. Deep learning models were trained and validated for tissue segmentation, template construction, and template-to-image registration. We established methods for surface-normal-based cartilage thickness mapping, FCL estimation, and rule-based cartilage parcellation. Our cartilage thickness map showed less error in thin and peripheral regions. We evaluated the effectiveness of the adopted segmentation model by comparing the quantitative metrics obtained from model segmentation and those from manual segmentation. The root-mean-squared deviation of the FCL measurements was less than 8%, and strong correlations were observed for the mean thickness (Pearson's correlation coefficient $ρ\in [0.82,0.97]$), surface area ($ρ\in [0.82,0.98]$) and volume ($ρ\in [0.89,0.98]$) measurements. We compared our FCL measurements with those from a previous study and found that our measurements deviated less from the ground truths. We observed superior performance of the proposed rule-based cartilage parcellation method compared with the atlas-based approach. CartiMorph has the potential to promote imaging biomarker discovery for knee osteoarthritis.

Submitted 20 November, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: This preprint is a proofread version of a paper published in Medical Image Analysis (2023), which can be found at https://doi.org/10.1016/j.media.2023.103035

arXiv:2306.15433 [pdf, other]

Recursive LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems to Save Computations and Memories

Authors: Hufei Zhu, Fuqin Deng, Yikui Zhai, Jiaming Zhong, Yanyang Liang

Abstract: Firstly, a reordered description is given for the linear minimum mean square error (LMMSE)-based iterative soft interference cancellation (ISIC) detection process for multiple-input multiple-output (MIMO) wireless communication systems, which is based on the equivalent channel matrix. Then the above reordered description is applied to compare the detection process for LMMSE-ISIC with that for the hard decision (HD)-based ordered successive interference cancellation (OSIC) scheme, to draw the conclusion that the former is the extension of the latter. Finally, the recursive scheme for HD-OSIC with reduced complexity and memory saving is extended to propose the recursive scheme for LMMSE-ISIC, where the required computations and memories are reduced by computing the filtering bias and the estimate from the Hermitian inverse matrix and the symbol estimate vector, and updating the Hermitian inverse matrix and the symbol estimate vector efficiently. Assume N transmitters and M (no less than N) receivers in the MIMO system. Compared to the existing low-complexity LMMSE-ISIC scheme, the proposed recursive LMMSE-ISIC scheme requires no more than 1/6 of the computations and no more than 1/5 of the memory units.

Submitted 5 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2304.10891 [pdf, other]

Transformer-based models and hardware acceleration analysis in autonomous driving: A survey

Authors: Juan Zhong, Zheng Liu, Xi Chen

Abstract: Transformer architectures have exhibited promising performance in various autonomous driving applications in recent years. On the other hand, its dedicated hardware acceleration on portable computational platforms has become the next critical step for practical deployment in real autonomous vehicles. This survey paper provides a comprehensive overview, benchmark, and analysis of Transformer-based models specifically tailored for autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, such as encoder-decoder and encoder-only structures, and explore their respective advantages and disadvantages. Furthermore, we discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. We specifically illustrate the operator level comparison between layers from convolutional neural network, Swin-Transformer, and Transformer with 4D encoder. The paper also highlights the challenges, trends, and current insights in Transformer-based models, addressing their hardware deployment and acceleration issues within the context of long-term autonomous driving applications.

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.07607 [pdf, other]

LitCall: Learning Implicit Topology for CNN-based Aortic Landmark Localization

Authors: Zhangxing Bian, Jiayang Zhong, Yanglong Lu, Charles R. Hatt, Nicholas S. Burris

Abstract: Landmark detection is a critical component of the image processing pipeline for automated aortic size measurements. Given that the thoracic aorta has a relatively conserved topology across the population and that a human annotator with minimal training can estimate the location of unseen landmarks from limited examples, we proposed an auxiliary learning task to learn the implicit topology of aortic landmarks through a CNN-based network. Specifically, we created a network to predict the location of missing landmarks from the visible ones by minimizing the Implicit Topology loss in an end-to-end manner. The proposed learning task can be easily adapted and combined with Unet-style backbones. To validate our method, we utilized a dataset consisting of 207 CTAs, labeling four landmarks on each aorta. Our method outperforms the state-of-the-art Unet-style architectures (ResUnet, UnetR) in terms of localization accuracy, with only a light (#params=0.4M) overhead. We also demonstrate our approach in two clinically meaningful applications: aortic sub-region division and automatic centerline generation.

Submitted 15 April, 2023; originally announced April 2023.

Comments: Accepted to Medical Imaging 2022: Image Processing

arXiv:2302.08660 [pdf, ps, other]

Improved Recursive Algorithms for V-BLAST to Save Computations and Memories

Authors: Hufei Zhu, Yanyang Liang, Fuqin Deng, Genquan Chen, Jiaming Zhong

Abstract: For vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and then Improvements I-IV were introduced to further reduce the complexity. The existing recursive algorithm with speed advantage and that with memory saving incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively. This paper proposes Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of partitioned matrix applied in Improvement I, Improvement V uses another lemma to speed up the matrix inversion step by 167%. Then the formulas adopted in our Improvement V are applied to deduce Improvement VI for interference cancellation, which saves memories without sacrificing speed compared to Improvement II. In the existing algorithm with speed advantage, the proposed algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. Both proposed algorithms speed up the existing algorithm with speed advantage by 130%, while the proposed algorithm II achieves the speedup of 186% and saves about half memories, compared to the existing algorithm with memory saving.

Submitted 5 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2212.07023 [pdf]

doi 10.21037/qims-23-704

Unsupervised Domain Adaptation for Automated Knee Osteoarthritis Phenotype Classification

Authors: Junru Zhong, Yongcheng Yao, Donal G. Cahill, Fan Xiao, Siyue Li, Jack Lee, Kevin Ki-Wai Ho, Michael Tim-Yun Ong, James F. Griffith, Weitian Chen

Abstract: Purpose: The aim of this study was to demonstrate the utility of unsupervised domain adaptation (UDA) in automated knee osteoarthritis (OA) phenotype classification using a small dataset (n=50). Materials and Methods: For this retrospective study, we collected 3,166 three-dimensional (3D) double-echo steady-state magnetic resonance (MR) images from the Osteoarthritis Initiative dataset and 50 3D turbo/fast spin-echo MR images from our institute (in 2020 and 2021) as the source and target datasets, respectively. For each patient, the degree of knee OA was initially graded according to the MRI Osteoarthritis Knee Score (MOAKS) before being converted to binary OA phenotype labels. The proposed UDA pipeline included (a) pre-processing, which involved automatic segmentation and region-of-interest cropping; (b) source classifier training, which involved pre-training phenotype classifiers on the source dataset; (c) target encoder adaptation, which involved unsupervised adaptation of the source encoder to the target encoder; and (d) target classifier validation, which involved statistical analysis of the target classification performance evaluated by the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity and accuracy. Additionally, a classifier was trained without UDA for comparison. Results: The target classifier trained with UDA achieved improved AUROC, sensitivity, specificity and accuracy for both knee OA phenotypes compared with the classifier trained without UDA.
Conclusion: The proposed UDA approach improves the performance of automated knee OA phenotype classification for small target datasets by utilising a large, high-quality source dataset for training. The results successfully demonstrated the advantages of the UDA approach in classification on small datasets.

Submitted 13 December, 2022; originally announced December 2022.

Comments: Junru Zhong and Yongcheng Yao share the same contribution. 17 pages, 4 figures, 4 tables

arXiv:2209.06434 [pdf, other]

ConvNeXt Based Neural Network for Audio Anti-Spoofing

Authors: Qiaowei Ma, Jinghui Zhong, Yitao Yang, Weiheng Liu, Ying Gao, Wing W. Y. Ng

Abstract: With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers have proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than raw waveform will lose implicit information for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating with the channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and the difficult samples that are hard to classify. Experiments show that our proposed system could achieve an equal error rate of 0.64% and min-tDCF of 0.0187 for the ASVSpoof 2019 LA evaluation dataset, which outperforms the state-of-the-art systems.

Submitted 21 December, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 6 pages

arXiv:2108.05985 [pdf]

doi 10.1002/mrm.29194

Optimized multi-axis spiral projection MR fingerprinting with subspace reconstruction for rapid whole-brain high-isotropic-resolution quantitative imaging

Authors: Xiaozhi Cao, Congyu Liao, Siddharth Srinivasan Iyer, Zhixing Wang, Zihan Zhou, Erpeng Dai, Gilad Liberman, Zijing Dong, Ting Gong, Hongjian He, Jianhui Zhong, Berkin Bilgic, Kawin Setsompop

Abstract: Purpose: To improve image quality and accelerate the acquisition of 3D MRF. Methods: Building on the multi-axis spiral-projection MRF technique, a subspace reconstruction with locally low rank (LLR) constraint and a modified spiral-projection spatiotemporal encoding scheme termed tiny-golden-angle-shuffling (TGAS) were implemented for rapid whole-brain high-resolution quantitative mapping. The LLR regularization parameter and the number of subspace bases were tuned using retrospective in-vivo data and simulated examinations, respectively. B0 inhomogeneity correction using multi-frequency interpolation was incorporated into the subspace reconstruction to further improve the image quality by mitigating blurring caused by off-resonance effect. Results: The proposed MRF acquisition and reconstruction framework can provide high quality 1-mm isotropic whole-brain quantitative maps in a total acquisition time of 1 minute 55 seconds, with higher-quality results than ones obtained from the previous approach in 6 minutes. The comparison of quantitative results indicates that neither the subspace reconstruction nor the TGAS trajectory induce bias for T1 and T2 mapping. High quality whole-brain MRF data were also obtained at 0.66-mm isotropic resolution in 4 minutes using the proposed technique, where the increased resolution was shown to improve visualization of subtle brain structures.
Conclusion: The proposed TGAS-SPI-MRF with optimized spiral-projection trajectory and subspace reconstruction can enable high-resolution quantitative mapping with faster acquisition speed.

Submitted 12 August, 2021; originally announced August 2021.

Comments: 40 pages, 11 figures, 2 tables

Journal ref: Magnetic Resonance in Medicine, 2022

arXiv:2108.02317 [pdf]

Efficient Fourier single-pixel imaging with Gaussian random sampling

Authors: Ziheng Qiu, Xinyi Guo, Tianao Lu, Pan Qi, Zibang Zhang, Jingang Zhong

Abstract: Fourier single-pixel imaging (FSI) is a branch of single-pixel imaging techniques. It uses Fourier basis patterns as structured patterns for spatial information acquisition in the Fourier domain. However, the spatial resolution of the image reconstructed by FSI mainly depends on the number of Fourier coefficients sampled. The reconstruction of a high-resolution image typically requires a number of Fourier coefficients to be sampled, and therefore takes a long data acquisition time. Here we propose a new sampling strategy for FSI. It allows FSI to reconstruct a clear and sharp image with a reduced number of measurements. The core of the proposed sampling strategy is to perform a variable density sampling in the Fourier space and, more importantly, the density with respect to the importance of Fourier coefficients is subject to a one-dimensional Gaussian function. Combined with compressive sensing, the proposed sampling strategy enables better reconstruction quality than conventional sampling strategies, especially when the sampling ratio is low. We experimentally demonstrate compressive FSI combined with the proposed sampling strategy is able to reconstruct a sharp and clear image of 256-by-256 pixels with a sampling ratio of 10%. The proposed method enables fast single-pixel imaging and provides a new approach for efficient spatial information acquisition.

Submitted 28 June, 2021; originally announced August 2021.

arXiv:2107.14521 [pdf, other]

Model-based Synthetic Data-driven Learning (MOST-DL): Application in Single-shot T2 Mapping with Severe Head Motion Using Overlapping-echo Acquisition

Authors: Qinqin Yang, Yanhong Lin, Jiechao Wang, Jianfeng Bao, Xiaoyin Wang, Lingceng Ma, Zihan Zhou, Qizhi Yang, Shuhui Cai, Hongjian He, Congbo Cai, Jiyang Dong, Jingliang Cheng, Zhong Chen, Jianhui Zhong

Abstract: Use of synthetic data has provided a potential solution for addressing unavailable or insufficient training samples in deep learning-based magnetic resonance imaging (MRI). However, the challenge brought by domain gap between synthetic and real data is usually encountered, especially under complex experimental conditions. In this study, by combining Bloch simulation and general MRI models, we propose a framework for addressing the lack of training data in supervised learning scenarios, termed MOST-DL. A challenging application is demonstrated to verify the proposed framework and achieve motion-robust T2 mapping using single-shot overlapping-echo acquisition. We decompose the process into two main steps: (1) calibrationless parallel reconstruction for ultra-fast pulse sequence and (2) intra-shot motion correction for T2 mapping. To bridge the domain gap, realistic textures from a public database and various imperfection simulations were explored. The neural network was first trained with pure synthetic data and then evaluated with in vivo human brain. Both simulation and in vivo experiments show that the MOST-DL method significantly reduces ghosting and motion artifacts in T2 maps in the presence of unpredictable subject movement and has the potential to be applied to motion-prone patients in the clinic.

Submitted 29 May, 2022; v1 submitted 30 July, 2021; originally announced July 2021.

Comments: 15 pages, 13 figures

arXiv:2106.04624 [pdf, other]

SpeechBrain: A General-Purpose Speech Toolkit

Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Preprint

arXiv:2102.02772 [pdf]

Single-Shell NODDI Using Dictionary Learner Estimated Isotropic Volume Fraction

Authors: Abrar Faiyaz, Marvin Doyley, Giovanni Schifitto, Jianhui Zhong, Md Nasir Uddin

Abstract: Neurite orientation dispersion and density imaging (NODDI) enables the assessment of intracellular, extracellular and free water signals from multi-shell diffusion MRI data. It is an insightful approach to characterize brain tissue microstructure. Single-shell reconstruction for NODDI parameters has been discouraged in previous studies because of fitting failures, especially for the neurite density index (NDI). Here, we investigated the possibility of creating robust NODDI parameter maps with single-shell data, using the isotropic volume fraction (fISO) as prior. Prior estimation was made independent of the NODDI model constraint using a dictionary learning approach. First, we used a stochastic sparse dictionary-based network (DictNet) in predicting fISO which is trained with data obtained from in vivo and simulated diffusion MRI data. In single-shell cases, the mean diffusivity (MD) and raw T2 signal with no diffusion weighting (S0) were incorporated in the dictionary for the fISO estimation. Then, the NODDI framework was used with the known fISO to estimate the NDI and orientation dispersion index (ODI). The fISO estimated by our model was compared with other fISO estimators in the simulation. Further, using both synthetic data simulation and human data collected on a 3T scanner, we compared the performance of our dictionary-based learning prior NODDI (DLpN) with the original NODDI for both single-shell and multi-shell data.
Our results suggest that DLpN derived NDI and ODI parameters for single-shell protocols are comparable with original multi-shell NODDI, and protocol with b=2000 s/mm2 performs the best (error ~5% in white and grey matter). This may allow NODDI evaluation of studies on single-shell data by multi-shell scanning of two subjects for DictNet fISO training.

Submitted 4 April, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: 56 pages, 9 Figures, 2 Tables, Supplementary Document (attached)

arXiv:2101.03230

Generation of Traffic Flows in Multi-Agent Traffic Simulation with Agent Behavior Model based on Deep Reinforcement Learning

Authors: Junjie Zhong, Hiromitsu Hattori

Abstract: In multi-agent based traffic simulation, agents are always supposed to move following existing instructions, and mechanically and unnaturally imitate human behavior. The human drivers perform acceleration or deceleration irregularly all the time, which seems unnecessary in some conditions. For letting agents in traffic simulation behave more like humans and recognize other agents' behavior in complex conditions, we propose a unified mechanism for agents to learn to decide various accelerations by using deep reinforcement learning based on a combination of regenerated visual images revealing some notable features, and numerical vectors containing some important data such as instantaneous speed. By handling batches of sequential data, agents are enabled to recognize surrounding agents' behavior and decide their own acceleration. In addition, we can generate a traffic flow behaving diversely to simulate the real traffic flow by using an architecture of fully decentralized training and fully centralized execution without violating Markov assumptions.

Submitted 25 January, 2021; v1 submitted 26 December, 2020; originally announced January 2021.

Comments: Experiment data may be wrong due to the method "Repeated and Partial Training". This method may not converge to the optimal policy

arXiv:2010.13154 [pdf, other]

Attention is All You Need in Speech Separation

Authors: Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Abstract: Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves a competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and it is less memory-demanding than the latest speech separation systems with comparable performance.

Submitted 8 March, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: Accepted to ICASSP 2021

arXiv:2006.06186 [pdf, other]

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

Authors: Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Abstract: Recently adversarial attacks on automatic speaker verification (ASV) systems have attracted widespread attention as they pose severe threats to ASV systems. However, methods to defend against such attacks are limited. Existing approaches mainly focus on retraining ASV systems with adversarial data augmentation. Also, countermeasure robustness against different attack settings is insufficiently investigated. Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training. A VGG-like binary classification detector is introduced and demonstrated to be effective on detecting adversarial samples. To investigate detector robustness in a realistic defense scenario where unseen attack settings may exist, we analyze various kinds of unseen attack settings' impact and observe that the detector is robust (6.27% EER_{det} degradation in the worst case) against unseen substitute ASV systems, but it has weak robustness (50.37% EER_{det} degradation in the worst case) against unseen perturbation methods. The weak robustness against unseen perturbation methods shows a direction for developing stronger countermeasures.

Submitted 7 August, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: accepted by Interspeech2020

arXiv:2004.04014 [pdf, other]

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Authors: Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Abstract: Speaker verification systems usually suffer from the mismatch problem between training and evaluation data, such as speaker population mismatch, the channel and environment variations. In order to address this issue, it requires the system to have good generalization ability on unseen data. In this work, we incorporate Bayesian neural networks (BNNs) into the deep neural network (DNN) x-vector speaker verification system to improve the system's generalization ability. With the weight uncertainty modeling provided by BNNs, we expect the system could generalize better on the evaluation data and make verification decisions more accurately. Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data. Specifically, results show that the system could benefit from BNNs by a relative EER decrease of 2.66% and 2.32% respectively for short- and long-utterance in-domain evaluations. Additionally, the fusion of DNN x-vector and Bayesian x-vector systems could achieve further improvement. Moreover, experiments conducted by out-of-domain evaluations, e.g. models trained on Voxceleb1 while evaluated on NIST SRE10 core test, suggest that BNNs could bring a larger relative EER decrease of around 4.69%.

Submitted 8 April, 2020; originally announced April 2020.

Comments: Accepted by Speaker Odyssey 2020

arXiv:2003.08360 [pdf]

Compatible Learning for Deep Photonic Neural Network

Authors: Yong-Liang Xiao, Rongguang Liang, Jianxin Zhong, Xianyu Su, Zhisheng You

Abstract: Realization of deep learning with coherent optical field has recently attracted remarkable attention, which benefits from the fact that optical matrix manipulation can be executed at speed of light with inherent parallel computation as well as low latency. Photonic neural network has a significant potential for prediction-oriented tasks. Yet, real-value Backpropagation behaves somewhat intractably for coherent photonic intelligent training. We develop a compatible learning protocol in complex space, of which nonlinear activation could be selected efficiently depending on the unveiled compatible condition. Compatibility indicates that matrix representation in complex space covers its real counterpart, which could enable a single channel mingled training in real and complex space as a unified model. The phase logical XOR gate with Mach-Zehnder interferometers and diffractive neural network with optical modulation mechanism, implementing intelligent weight learned from compatible learning, are presented to prove the availability. Compatible learning opens an envisaged window for deep photonic neural network.

Submitted 14 March, 2020; originally announced March 2020.

Comments: Original Manuscript

arXiv:2001.09239 [pdf, other]

Multi-task self-supervised learning for Robust Speech Recognition

Authors: Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.

Submitted 17 April, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: In Proc. of ICASSP 2020

arXiv:1912.03459 [pdf, other]

Pinning Stabilizer Design for Large-Scale Probabilistic Boolean Networks

Authors: Lin Lin, Jinde Cao, Jianquan Lu, Jie Zhong

Abstract: This paper investigates the stabilization of probabilistic Boolean networks (PBNs) via a novel pinning control strategy based on network structure. In a PBN, the evolution equation of each gene switches among a collection of candidate Boolean functions with probability distributions that govern the activation frequency of each Boolean function. Owing to the stochasticity, the uniform state feedback controller, independent of switching signal, might be out of work, and in this case, the non-uniform state feedback controller is required. Subsequently, a criterion is derived to determine whether uniform controllers are applicable to achieve stabilization. It is worth pointing out that the pinning control designed in this paper is based on the network structure, which only requires local in-neighbors' information, rather than global information (state transition matrix). Moreover, this pinning control strategy reduces the computational complexity from $O(2^{2n})$ to $O(n2^{\alpha})$, and thus it has the ability to handle some large-scale networks, especially the networks with sparse connections. Finally, the mammalian cell-cycle encountering a mutated phenotype is modelled by a PBN to demonstrate the obtained results.

Submitted 23 October, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

arXiv:1912.02394 [pdf, other]

doi 10.1109/TAC.2021.3110165

Sensors Design for Large-Scale Boolean Networks via Pinning Observability

Authors: Shiyong Zhu, Jianquan Lu, Jie Zhong, Yang Liu, Jinde Cao

Abstract: In this paper, a set of sensors is constructed via the pinning observability approach with the help of observability criteria given in [1] and [2], in order to make the given Boolean network (BN) observable. Given the assumption that system states can be accessible, an efficient pinning control scheme is developed to generate an observable BN by adjusting the network structure rather than just to check system observability. Accordingly, the sensors are constructed, of which the form is consistent with that of state feedback controllers in the designed pinning control. Since this pinning control approach only utilizes node-to-node message communication instead of global state space information, the time complexity is dramatically reduced from $O(2^{2n})$ to $O(n^2+n2^d)$, where $n$ and $d$ are respectively the node number of the considered BN and the largest in-degree of vertices in its network structure. Finally, we design the sensors for the reduced D. melanogaster segmentation polarity gene network and the T-cell receptor kinetics, respectively.

Submitted 5 March, 2022; v1 submitted 5 December, 2019; originally announced December 2019.

arXiv:1912.01974 [pdf]

doi 10.1364/OE.392370

Image-free real-time classification of fast moving objects using 'learned' spatial light modulation and a single-pixel detector

Authors: Zibang Zhang, Xiang Li, Manhong Yao, Shujun Zheng, Guoan Zheng, Jingang Zhong

Abstract: Objects classification generally relies on image acquisition and analysis. Real-time classification of high-speed moving objects is challenging, as both high temporal resolution in image acquisition and low computational complexity in objects classification algorithms are required. Here we propose and experimentally demonstrate an approach for real-time moving objects classification without image acquisition. As objects classification algorithms rely on the feature information of objects, we propose to use spatial light modulation to acquire the feature information directly rather than performing image acquisition followed by features extraction. A convolutional neural network is designed and trained to learn the spatial features of the target objects. The trained network can generate structured patterns for spatial light modulation. Using the resulting structured patterns for spatial light modulation, the feature information of target objects can be compressively encoded into a short light intensity sequence. The resulting one-dimensional signal is collected by a single-pixel detector and fed to the convolutional neural network for objects classification. As experimentally demonstrated, the proposed approach can achieve accurate and real-time classification of fast moving objects. The proposed method has potential applications in the fields where fast moving objects classification in real time and for long duration is required.

Submitted 4 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

arXiv:1912.01411 [pdf, other]

A New Approach to Pinning Control of Boolean Networks

Authors: Jie Zhong, Daniel W. C. Ho, Jianquan Lu

Abstract: Boolean networks (BNs) are discrete-time systems where nodes are inter-connected (here we call such connection rule among nodes as network structure), and the dynamics of each gene node is determined by logical functions. In this paper, we propose a new approach on pinning control design for global stabilization of BNs based on BNs' network structure, named as network-structure-based distributed pinning control. By deleting the minimum number of edges, the network structure becomes acyclic. Then, an efficient distributed pinning control is designed to achieve global stabilization. Compared with existing literature, the design of pinning control is not based on the state transition matrix of BNs. Hence, the computational complexity in this paper is reduced from $O(2^n\times 2^n)$ to $O(2\times 2^K)$, where $n$ is the number of nodes and $K\leq n$ is the largest number of in-neighbors of nodes. In addition, without using state transition matrix, global state information is no longer needed, the design of pinning control is just based on neighbors' local information, which is easier to be implemented. The proposed method is well demonstrated by several biological networks with different sizes. The results are shown to be simple and concise, while the traditional pinning control can not be applied for BNs with such a large dimension.

Submitted 30 October, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

arXiv:1911.12141 [pdf]

A Simple Distortion Calibration method for Wide-Angle Lenses Based on Fringe-pattern Phase Analysis

Authors: Weishuai Zhou, Jiawen Weng, Junzheng Peng, Jingang Zhong

Abstract: A distortion calibration method for wide-angle lens is proposed based on fringe-pattern phase analysis. Firstly, according to the experimental result of the radial distortion of the image not related to the recording depth of field, but depending on the field of view angle of the wide-angle lens imaging system, two-dimensional image distortion calibration needs to be considered. Four standard sinusoidal fringe-patterns with phase shift step of , which are used as calibration templates, are shown on a Liquid Crystal Display screen, and captured by the wide-angle lens imaging system. A four-step phase-shifting method is employed to obtain the radial phase distribution of the distorted fringe-pattern. Wavelet analysis is applied for the analysis of the instantaneous frequency to show the fundamental frequency of the fringe-pattern in the central region being unchanged. Performing numerical calculation by the central 9 points of the central row of the fringe-pattern, we can get the undistorted radial phase distribution, so, the radial modulated phase is computed. Finally, the radial distortion distribution is determined according to the radial modulated phase. By employing a bilinear interpolation algorithm, the wide-angle lens image calibration is achieved. There is no need to establish any kind of image distortion model for the proposed method. There is no projecting system in the experimental apparatus, which avoids projection shadow problems, and no need to align with the center of the template for distortion measurement for the proposed method. Theoretical description, numerical simulation and experimental results show that the proposed method is simple, automatic and effective.

Submitted 26 November, 2019; originally announced November 2019.
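
The four-step phase-shifting step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the phase step is not legible in the listing, so a step of pi/2 (the usual choice for four-step methods) is assumed, and the function name `four_step_phase` and the synthetic fringe parameters are hypothetical.

```python
import numpy as np

def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase from four fringe images, assuming a pi/2 shift between them."""
    # With I_k = A + B*cos(phi + k*pi/2):
    #   I3 - I1 = 2B*sin(phi)  and  I0 - I2 = 2B*cos(phi)
    return np.arctan2(i3 - i1, i0 - i2)

# Synthetic check: build four shifted fringes from a known phase ramp.
phi_true = np.linspace(-3.0, 3.0, 64)  # kept inside (-pi, pi) so no unwrapping is needed
frames = [0.5 + 0.4 * np.cos(phi_true + k * np.pi / 2) for k in range(4)]
phi_est = four_step_phase(*frames)
```

The background term A and the modulation B cancel in the ratio, which is why the recovered phase does not depend on the fringe contrast in this idealized setting.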

arXiv:1911.03078 [pdf, other]

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

Authors: Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng

Abstract: This work investigates the vulnerability of Gaussian Mixture Model (GMM) i-vector based speaker verification systems to adversarial attacks, and the transferability of adversarial samples crafted from GMM i-vector based systems to x-vector based systems. In detail, we formulate the GMM i-vector system as a scoring function of enrollment and testing utterance pairs. Then we leverage the fast gradient sign method (FGSM) to optimize testing utterances for adversarial samples generation. These adversarial samples are used to attack both GMM i-vector and x-vector systems. We measure the system vulnerability by the degradation of equal error rate and false acceptance rate. Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neural network speaker embedding based systems (e.g. x-vector systems).

Submitted 12 February, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Accepted by ICASSP 2020
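
The FGSM update named in the abstract above can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's system: a linear score `w @ x` stands in for the GMM i-vector scoring function, and `w`, `x`, `eps` and `fgsm_step` are hypothetical names chosen for illustration.

```python
import numpy as np

def fgsm_step(x, grad, eps):
    """One FGSM step: shift every feature by eps in the direction of the gradient sign."""
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=16)  # hypothetical stand-in for an enrollment-dependent scorer
x = rng.normal(size=16)  # hypothetical test-utterance features

def score(v):
    # Toy linear verification score; its gradient with respect to v is simply w.
    return float(w @ v)

x_adv = fgsm_step(x, grad=w, eps=0.05)
```

For this linear stand-in the score rises by exactly `eps * sum(|w|)` while no feature moves by more than `eps`, which is the bounded-perturbation property that makes FGSM samples hard to notice.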

arXiv:1911.03029 [pdf, other]

RoIMix: Proposal-Fusion among Multiple Images for Underwater Object Detection

Authors: Wei-Hong Lin, Jia-Xing Zhong, Shan Liu, Thomas Li, Ge Li

Abstract: Generic object detection algorithms have proven their excellent performance in recent years. However, object detection on underwater datasets is still less explored. In contrast to generic datasets, underwater images usually have color shift and low contrast; sediment would cause blurring in underwater images. In addition, underwater creatures often appear closely to each other on images due to their living habits. To address these issues, our work investigates augmentation policies to simulate overlapping, occluded and blurred objects, and we construct a model capable of achieving better generalization. We propose an augmentation method called RoIMix, which characterizes interactions among images. Proposals extracted from different images are mixed together. Previous data augmentation methods operate on a single image while we apply RoIMix to multiple images to create enhanced samples as training data. Experiments show that our proposed method improves the performance of region-based object detectors on both Pascal VOC and URPC datasets.

Submitted 24 March, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

Comments: ICASSP 2020

arXiv:1909.12999 [pdf]

doi 10.1002/mrm.28872

Efficient T2 mapping with Blip-up/down EPI and gSlider-SMS (T2-BUDA-gSlider)

Authors: Xiaozhi Cao, Congyu Liao, Zijing Zhang, Siddharth Srinivasan Iyer, Kang Wang, Hongjian He, Huafeng Liu, Kawin Setsompop, Jianhui Zhong, Berkin Bilgic

Abstract: Purpose: To rapidly obtain high isotropic-resolution T2 maps with whole-brain coverage and high geometric fidelity. Methods: A T2 blip-up/down echo planar imaging (EPI) acquisition with generalized Slice-dithered enhanced resolution (T2-BUDA-gSlider) is proposed. A radiofrequency (RF)-encoded multi-slab spin-echo EPI acquisition with multiple echo times (TEs) was developed to obtain high SNR efficiency with reduced repetition time (TR). This was combined with an interleaved 2-shot EPI acquisition using blip-up/down phase encoding. An estimated field map was incorporated into the joint multi-shot EPI reconstruction with a structured low rank constraint to achieve distortion-free and robust reconstruction for each slab without navigation. A Bloch simulated subspace model was integrated into gSlider reconstruction and utilized for T2 quantification. Results: In vivo results demonstrated that the T2 values estimated by the proposed method were consistent with gold standard spin-echo acquisition. Compared to the reference 3D fast spin echo (FSE) images, distortion caused by off-resonance and eddy current effects were effectively mitigated. Conclusion: BUDA-gSlider SE-EPI acquisition and gSlider-subspace joint reconstruction enabled distortion-free whole-brain T2 mapping in 2 min at ~1 mm3 isotropic resolution, which could bring significant benefits to related clinical and neuroscience applications.

Submitted 20 September, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: 20 pages, 7 figures

Journal ref: Magnetic Resonance in Medicine (2020)

arXiv:1905.04144 [pdf]

doi 10.1063/1.5115448

Optical synthetic sampling imaging: concept and an example of microscopy

Authors: Junzheng Peng, Manhong Yao, Zixin Cai, Xue Qiu, Zibang Zhang, Shiping Li, Jingang Zhong

Abstract: Digital two-dimensional (2D) spatial sampling devices (such as charge-coupled device) have been widely used in various imaging systems, especially in computational imaging systems. However, the undersampling of digital sampling devices is a problem that limits the resolution of the acquired images. In this study, we present a synthetic sampling imaging (SSI) concept to solve the undersampling problem. It combines the structured illumination system and conventional 2D image detection system to simultaneously sample the specimen from the illumination and the detection sides. Then, we synthesize the illumination sampling rate and the detection sampling rate to reconstruct a high sampling rate image. The concept of the proposed SSI is demonstrated by an example of microscopy. Experimental results confirm that the proposed method can double the sampling resolution of the microscope. The synthetic sampling scheme, where the sampling task is shared by the illumination and detection sides, provides insight for resolving the undersampling problem of the digital imaging system.

Submitted 13 May, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

arXiv:1812.08067 [pdf]

doi 10.1002/mrm.27812

Ultrashort Echo Time Magnetic Resonance Fingerprinting (UTE-MRF) for Simultaneous Quantification of Long and Ultrashort T2 Tissues

Authors: Qing Li, Xiaozhi Cao, Huihui Ye, Congyu Liao, Hongjian He, Jianhui Zhong

Abstract: Purpose: To demonstrate an ultrashort echo time magnetic resonance fingerprinting (UTE-MRF) method that can simultaneously quantify tissue relaxometries for muscle and bone in musculoskeletal systems and tissue components in brain and therefore can synthesize pseudo-CT images. Methods: A FISP-MRF sequence with half pulse excitation and half spoke radial acquisition was designed to sample fast T2 decay signals. Sinusoidal echo time (TE) pattern was applied to enhance MRF sensitivity for tissues with short and ultrashort T2 values. The performance of UTE-MRF was evaluated via simulations, phantoms, and in vivo experiments. Results: A minimal TE of 0.05 ms was achieved in UTE-MRF. Simulations indicated that extension of TE sampling increased T2 quantification accuracy in cortical bone and tendon, and had little impact on long T2 muscle quantifications. For a rubber phantom, an average T1/T2 of 162/1.07 ms from UTE-MRF were compared well with gold standard T2 of 190 ms from IR-UTE and T2* of 1.03 ms from UTE sequence. For a long T2 agarose phantom, the linear regression slope between UTE-MRF and gold standard was 1.07 (R2=0.991) for T1 and 1.04 (R2=0.994) for T2. In vivo experiments showed the detection of cortical bone and Achilles tendon, where the averaged T2 was respectively 1.0 ms and 15 ms. Scalp images were in good agreement with CT. Conclusion: UTE-MRF with sinusoidal TE variations shows its capability to produce pseudo-CT images and simultaneously output T1, T2, proton density, and B0 maps for tissues with long T2 and short/ultrashort T2 in the brain and musculoskeletal system.

Submitted 27 March, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

Comments: 32 pages, 12 figures, 1 table

Journal ref: Magnetic Resonance in Medicine (2019)

Showing 1–41 of 41 results for author: Zhong, J