
Showing 1–50 of 131 results for author: Jiang, Z

Searching in archive eess.
  1. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Kai Yu, Aidi Lin, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Wei Chen, Yilong Luo, Yifan Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Yanda Meng, Yukun Zhou , et al. (11 additional authors not shown)

    Abstract: The current retinal artificial intelligence models were trained using data with a limited category of diseases and limited knowledge. In this paper, we present a retinal vision-language foundation model (RetiZero) with knowledge of over 400 fundus diseases. Specifically, we collected 341,896 fundus images paired with text descriptions from 29 publicly available datasets, 180 ophthalmic books, and…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.06979  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2406.01153  [pdf, other]

    eess.SY

    Safety-Critical Control of Euler-Lagrange Systems Subject to Multiple Obstacles and Velocity Constraints

    Authors: Zhi Liu, Si Wu, Tengfei Liu, Zhong-Ping Jiang

    Abstract: This paper studies the safety-critical control problem for Euler-Lagrange (EL) systems subject to multiple ball obstacles and velocity constraints in accordance with affordable velocity ranges. A key strategy is to exploit the underlying inner-outer-loop structure for the design of a new cascade controller for the class of EL systems. In particular, the outer-loop controller is developed based on…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2406.01058  [pdf, other]

    eess.SY

    Constructive Safety Control

    Authors: Si Wu, Tengfei Liu, Zhong-Ping Jiang

    Abstract: This paper proposes a constructive approach to safety control of nonlinear cascade systems subject to multiple state constraints. New design ingredients include a unified characterization of safety and stability for systematic designs of safety controllers, and a novel technique of reshaping the feasible sets of quadratically constrained quadratic programming induced from safety control. The propo…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2406.00753  [pdf, ps, other]

    math.OC eess.SY

    Singular Perturbation: When the Perturbation Parameter Becomes a State-Dependent Function

    Authors: Tengfei Liu, Zhong-Ping Jiang

    Abstract: This paper presents a new systematic framework for nonlinear singularly perturbed systems in which state-dependent perturbation functions are used instead of constant perturbation coefficients. Under this framework, general results are obtained for the global robust stability and input-to-state stability of nonlinear singularly perturbed systems. Interestingly, the proposed methodology provides in…

    Submitted 2 June, 2024; originally announced June 2024.

  7. arXiv:2405.09787  [pdf, other]

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie , et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  8. arXiv:2405.09569  [pdf, other]

    eess.SP cs.LG

    GaitMotion: A Multitask Dataset for Pathological Gait Forecasting

    Authors: Wenwen Zhang, Hao Zhang, Zenan Jiang, Jing Wang, Amir Servati, Peyman Servati

    Abstract: Gait benchmark empowers uncounted encouraging research fields such as gait recognition, humanoid locomotion, etc. Despite the growing focus on gait analysis, the research community is hindered by the limitations of the currently available databases, which mostly consist of videos or images with limited labeling. In this paper, we introduce GaitMotion, a multitask dataset leveraging wearable sensor…

    Submitted 9 May, 2024; originally announced May 2024.

  9. arXiv:2404.17917  [pdf, other]

    cs.CV eess.IV

    EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery

    Authors: Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou

    Abstract: Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectr…

    Submitted 12 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted at the International Joint Conference on Artificial Intelligence (IJCAI, 2024)

  10. arXiv:2404.15009  [pdf, other]

    cs.CV eess.IV

    The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar , et al. (51 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we pr…

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033

  11. arXiv:2404.13786  [pdf, other]

    eess.SY cs.AI cs.DC cs.LG

    Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

    Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

    Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components ca…

    Submitted 21 April, 2024; originally announced April 2024.

  12. arXiv:2404.00327  [pdf, other]

    eess.IV cs.CV cs.LG

    YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT)

    Authors: Wen Sheng, Zhong Zheng, Jiajun Liu, Han Lu, Hanyuan Zhang, Zhengyong Jiang, Zhihong Zhang, Dao** Zhu

    Abstract: Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors (PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation d…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 15 pages

  13. arXiv:2403.14523  [pdf, other]

    eess.IV cs.CV

    Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration

    Authors: Chenyang Li, Dianye Huang, Angelos Karlas, Nassir Navab, Zhongliang Jiang

    Abstract: In clinical applications that involve ultrasound-guided intervention, the visibility of the needle can be severely impeded due to steep insertion and strong distractors such as speckle noise and anatomical occlusion. To address this challenge, we propose VibNet, a learning-based framework tailored to enhance the robustness and accuracy of needle detection in ultrasound images, even when the target…

    Submitted 21 March, 2024; originally announced March 2024.

  14. arXiv:2403.08154  [pdf, other]

    cs.LG eess.SP

    The Effect of Different Optimization Strategies to Physics-Constrained Deep Learning for Soil Moisture Estimation

    Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

    Abstract: Soil moisture is a key hydrological parameter that has significant importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full sensor data potential depe…

    Submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2403.07228  [pdf, other]

    eess.SP

    Physics-constrained Active Learning for Soil Moisture Estimation and Optimal Sensor Placement

    Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

    Abstract: Soil moisture is a crucial hydrological state variable that has significant importance to the global environment and agriculture. Precise monitoring of soil moisture in crop fields is critical to reducing agricultural drought and improving crop yield. In-situ soil moisture sensors, which are buried at pre-determined depths and distributed across the field, are promising solutions for monitoring so…

    Submitted 11 March, 2024; originally announced March 2024.

  16. arXiv:2403.05989  [pdf, other]

    cs.SD eess.AS

    HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling

    Authors: Chunhui Wang, Chang Zeng, Bowen Zhang, Ziyang Ma, Yefan Zhu, Zifeng Cai, Jian Zhao, Zhonglin Jiang, Yong Chen

    Abstract: Token-based text-to-speech (TTS) models have emerged as a promising avenue for generating natural and realistic speech, yet they grapple with low pronunciation accuracy, speaking style and timbre inconsistency, and a substantial need for diverse training data. In response, we introduce a novel hierarchical acoustic modeling approach complemented by a tailored data augmentation strategy and train i…

    Submitted 9 March, 2024; originally announced March 2024.

  17. arXiv:2402.18070  [pdf, other]

    cs.AR eess.SP

    A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

    Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

    Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures, conference

  18. arXiv:2402.12208  [pdf, other]

    eess.AS cs.SD

    Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

    Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

    Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a…

    Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: We release a more powerful checkpoint in Language-Codec v3

  19. arXiv:2402.09378  [pdf, other]

    eess.AS cs.SD

    MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

    Authors: Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

    Abstract: Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of unseen speaker voice prompts. However, all previous work has been developed for cloud-based systems. Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, mod…

    Submitted 2 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Main Conference)

  20. arXiv:2402.07729  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

    Authors: Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of the ope…

    Submitted 12 February, 2024; originally announced February 2024.

  21. arXiv:2402.04426  [pdf, other]

    eess.IV cs.CV

    Quantitative Metrics for Benchmarking Medical Image Harmonization

    Authors: Abhijeet Parida, Zhifan Jiang, Roger J. Packer, Robert A. Avery, Syed M. Anwar, Marius G. Linguraru

    Abstract: Image harmonization is an important preprocessing strategy to address domain shifts arising from data acquired using different machines and scanning protocols in medical imaging. However, benchmarking the effectiveness of harmonization techniques has been a challenge due to the lack of widely available standardized datasets with ground truths. In this context, we propose three metrics: two intensi…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted for presentation at the ISBI 2024

  22. arXiv:2312.07561  [pdf, other]

    eess.SP cs.CV cs.CY cs.LG

    Annotating sleep states in children from wrist-worn accelerometer data using Machine Learning

    Authors: Ashwin Ram, Sundar Sripada V. S., Shuvam Keshari, Zizhe Jiang

    Abstract: Sleep detection and annotation are crucial for researchers to understand sleep patterns, especially in children. With modern wrist-worn watches comprising built-in accelerometers, sleep logs can be collected. However, the annotation of these logs into distinct sleep events: onset and wakeup, proves to be challenging. These annotations must be automated, precise, and scalable. We propose to model t…

    Submitted 9 December, 2023; originally announced December 2023.

  23. arXiv:2312.03324  [pdf]

    eess.AS

    Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion

    Authors: Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li

    Abstract: Although many efforts have been made on decreasing the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory result on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The transformation module consists of multiple simple but effec…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 12 pages, 5 figures, 6 tables; accepted for publication in IEEE-ACM TASLP

  24. arXiv:2311.13832  [pdf, other]

    eess.SY

    Dynamic Operating Envelopes Embedded Peer-to-Peer-to-Grid Energy Trading

    Authors: Zhisen Jiang, Ye Guo, Hongbin Sun, Jianxiao Wang

    Abstract: A novel decentralized peer-to-peer-to-grid (P2P2G) trading mechanism considering distribution network integrity is proposed. In order to direct prosumers' peer-to-peer (P2P) trading behavior to be grid-friendly, the proposed method incorporates Dynamic Operating Envelopes (DOEs) into the existing P2P2G trading. Moreover, DOEs are determined through negotiations between the distribution system oper…

    Submitted 23 November, 2023; originally announced November 2023.

  25. arXiv:2311.09462  [pdf, other]

    eess.SY

    Software-Defined Virtual Synchronous Condenser

    Authors: Zimin Jiang, Peng Zhang, Yifan Zhou, Łukasz Kocewiak, Divya Kurthakoti Chandrashekhara, Marie-Lou Picherit, Zefan Tang, Kenneth B. Bowes, Guangya Yang

    Abstract: Synchronous condensers (SCs) play important roles in integrating wind energy into relatively weak power grids. However, the design of SCs usually depends on specific application requirements and may not be adaptive enough to the frequently-changing grid conditions caused by the transition from conventional to renewable power generation. This paper devises a software-defined virtual synchronous con…

    Submitted 17 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

  26. arXiv:2311.07081  [pdf, other]

    cs.IT eess.SP

    Sensing Mutual Information with Random Signals in Gaussian Channels

    Authors: Lei Xie, Fan Liu, Zhanyuan Xie, Zheng Jiang, Shenghui Song

    Abstract: Sensing performance is typically evaluated by classical metrics, such as Cramer-Rao bound and signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework motivated the efforts to unify the metric for sensing and communication, where researchers have proposed to utilize mutual information (MI) to measure the sensing performance with determ…

    Submitted 13 November, 2023; originally announced November 2023.

  27. arXiv:2311.01919  [pdf]

    eess.SP

    Reconfigurable Intelligent Surface & Edge -- An Introduction of an EM manipulation structure on obstacles' edge

    Authors: Tianqi Xiang, Zhiwei Jiang, Weijun Hong, Xin Zhang, Yuehong Gao

    Abstract: Reconfigurable Intelligent Surface (RIS) or metasurface is one of the important enabling technologies in mobile cellular networks that can effectively enhance the signal coverage performance in obstructed regions, and it is generally deployed on surfaces different from obstacles to redirect electromagnetic (EM) waves by reflection, or covered on objects' surfaces to manipulate EM waves by refracti…

    Submitted 3 November, 2023; originally announced November 2023.

  28. arXiv:2310.13906  [pdf, other]

    cs.CV eess.IV

    Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer

    Authors: Junwei You, Ying Chen, Zhuoyu Jiang, Zhangchi Liu, Zilin Huang, Yifeng Ding, Bin Ran

    Abstract: Effective classification of autonomous vehicle (AV) driving behavior emerges as a critical area for diagnosing AV operation faults, enhancing autonomous driving algorithms, and reducing accident rates. This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze AV driving behavior. The proposed GAF-ViT model consists of three key components: GAF Transforme…

    Submitted 21 October, 2023; originally announced October 2023.

  29. arXiv:2310.06807  [pdf]

    eess.SP

    Longitudinal gOSNR Monitoring by Receiver-side Digital Signal Processing in Multi-Span Optical Transmission System

    Authors: Choloong Hahn, Junho Chang, Zhiping Jiang

    Abstract: We propose the world's first longitudinal gOSNR estimation using a correlation template method at the Rx, without any monitoring devices located in the middle of the link. The proposed method is experimentally demonstrated in a 12-span link with a commercial transceiver.

    Submitted 10 October, 2023; originally announced October 2023.

  30. Secondary frequency control of islanded microgrid considering wind and solar stochastics

    Authors: Cheng Zhong, Zhifu Jiang, Xiangyu Zhang, Jikai Chen, Yang Li

    Abstract: With the high penetration of wind and photovoltaic distributed generation (DG) in the microgrid, stochasticity and low inertia emerge, bringing more challenges, especially when the microgrid operates in isolated islands. Nevertheless, the reserve power of DGs in deloading control mode can be utilized for frequency regulation and mitigating frequency excursion. This paper proposes a model predictive…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by Acta Energiae Solaris Sinica [In Chinese]

    Journal ref: Acta Energiae Solaris Sinica 45 (2024) 523-533

  31. arXiv:2310.02930  [pdf, ps, other]

    math.OC eess.SY

    Small-Disturbance Input-to-State Stability of Perturbed Gradient Flows: Applications to LQR Problem

    Authors: Leilei Cui, Zhong-Ping Jiang, Eduardo D. Sontag

    Abstract: This paper studies the effect of perturbations on the gradient flow of a general nonlinear programming problem, where the perturbation may arise from inaccurate gradient estimation in the setting of data-driven optimization. Under suitable conditions on the objective function, the perturbed gradient flow is shown to be small-disturbance input-to-state stable (ISS), which implies that, in the prese…

    Submitted 16 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 20 pages

  32. arXiv:2309.11725  [pdf, other]

    cs.SD cs.AI eess.AS

    FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

    Authors: Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

    Abstract: Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, the current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and gl…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP'2024

  33. arXiv:2309.07293  [pdf]

    cs.CV eess.IV

    GAN-based Algorithm for Efficient Image Inpainting

    Authors: Zhengyang Han, Zehao Jiang, Yuan Ju

    Abstract: The global pandemic caused by the spread of COVID-19 has posed challenges in a new dimension for facial recognition, as people have started to wear masks. Under such conditions, the authors consider utilizing machine learning in image inpainting to tackle the problem, by completing the possible face that is originally covered by a mask. In particular, autoencoders have great potential for retaining important, general…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures

    MSC Class: 68U10

    Journal ref: The 3rd International Conference on Artificial Intelligence and Computer Engineering(ICAICE 2022)

  34. arXiv:2309.02232  [pdf, other]

    cs.SD cs.AI eess.AS

    FSD: An Initial Chinese Dataset for Fake Song Detection

    Authors: Yuankun Xie, Jingjing Zhou, Xiaolin Lu, Zhenghao Jiang, Yuxin Yang, Haonan Cheng, Long Ye

    Abstract: Singing voice synthesis and singing voice conversion have significantly advanced, revolutionizing musical experiences. However, the rise of "Deepfake Songs" generated by these technologies raises concerns about authenticity. Unlike Audio DeepFake Detection (ADD), the field of song deepfake detection lacks specialized datasets or methods for song authenticity verification. In this paper, we initial…

    Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  35. TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Recently, there has been a growing interest in the field of controllable Text-to-Speech (TTS). While previous studies have relied on users providing specific style factor values based on acoustic knowledge or selecting reference speeches that meet certain requirements, generating speech solely from natural text prompts has emerged as a new challenge for researchers. This challenge arises due to th…

    Submitted 28 August, 2023; originally announced August 2023.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  36. arXiv:2308.11047  [pdf, other]

    eess.IV cs.CV cs.LG

    Harmonization Across Imaging Locations (HAIL): One-Shot Learning for Brain MRI

    Authors: Abhijeet Parida, Zhifan Jiang, Syed Muhammad Anwar, Nicholas Foreman, Nicholas Stence, Michael J. Fisher, Roger J. Packer, Robert A. Avery, Marius George Linguraru

    Abstract: For machine learning-based prognosis and diagnosis of rare diseases, such as pediatric brain tumors, it is necessary to gather medical imaging data from multiple clinical sites that may use different devices and protocols. Deep learning-driven harmonization of radiologic images relies on generative adversarial networks (GANs). However, GANs notoriously generate pseudo structures that do not exist…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Under review

  37. arXiv:2308.03865  [pdf, other]

    eess.IV cs.CV

    DefCor-Net: Physics-Aware Ultrasound Deformation Correction

    Authors: Zhongliang Jiang, Yue Zhou, Dongliang Cao, Nassir Navab

    Abstract: The recovery of morphologically accurate anatomical images from deformed ones is challenging in ultrasound (US) image acquisition, but crucial to accurate and consistent diagnosis, particularly in the emerging field of computer-assisted diagnosis. This article presents a novel anatomy-aware deformation correction approach based on a coarse-to-fine, multi-scale deep neural network (DefCor-Net). To…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted by MedIA. Code is available

  38. arXiv:2307.07218  [pdf, other]

    eess.AS cs.SD

    Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

    Authors: Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

    Abstract: Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-shot TTS still face challenges in the following aspects: 1) previous works of zero-shot TTS are typically trained with single-sentence prompts, which si…

    Submitted 10 April, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted by ICLR 2024

  39. arXiv:2307.03800  [pdf, other]

    eess.IV cs.CV cs.RO

    Thoracic Cartilage Ultrasound-CT Registration using Dense Skeleton Graph

    Authors: Zhongliang Jiang, Chenyang Li, Xuesong Li, Nassir Navab

    Abstract: Autonomous ultrasound (US) imaging has gained increased interest recently, and it has been seen as a potential solution to overcome the limitations of free-hand US examinations, such as inter-operator variations. However, it is still challenging to accurately map planned paths from a generic atlas to individual patients, particularly for thoracic applications with high acoustic-impedance bone stru…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted by IROS23

  40. arXiv:2307.03698  [pdf, other]

    eess.IV cs.CV cs.RO

    Motion Magnification in Robotic Sonography: Enabling Pulsation-Aware Artery Segmentation

    Authors: Dianye Huang, Yuan Bi, Nassir Navab, Zhongliang Jiang

    Abstract: Ultrasound (US) imaging is widely used for diagnosing and monitoring arterial diseases, mainly due to the advantages of being non-invasive, radiation-free, and real-time. In order to provide additional information to assist clinicians in diagnosis, the tubular structures are often segmented from US images. To improve the artery segmentation accuracy and stability during scans, this work presents a…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted Paper IROS 2023

  41. arXiv:2306.03509  [pdf, other]

    eess.AS cs.AI cs.SD

    Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

    Authors: Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

    Abstract: Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS. However, previous works usually encode speech into latent using audio codec and use autoregressive language models or diffusion models to generate it, which ignores the intrinsic nature of speech and may lead to inferior or un…

    Submitted 6 June, 2023; originally announced June 2023.

  42. arXiv:2306.03504  [pdf, other]

    cs.CV cs.SD eess.AS

    Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

    Authors: Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

    Abstract: We are interested in a novel task, namely low-resource text-to-talking avatar. Given only a few-minute-long talking person video with the audio track as the training data and arbitrary texts as the driving input, we aim to synthesize high-quality talking portrait videos corresponding to the input text. This task has broad application prospects in the digital human industry but has not been technic…

    Submitted 2 August, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted by ICML 2023 Workshop, 6 pages, 3 figures

  43. arXiv:2306.00838  [pdf, other]

    q-bio.OT eess.IV

    The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI

    Authors: Ahmed W. Moawad, Anastasia Janas, Ujjwal Baid, Divya Ramakrishnan, Rachit Saluja, Nader Ashraf, Leon Jekel, Raisa Amiruddin, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Sanjay Aneja, Syed Muhammad Anwar, Timothy Bergquist, Evan Calabrese, Veronica Chiang, Verena Chung, Gian Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Juan Eugenio Iglesias, Zhifan Jiang , et al. (206 additional authors not shown)

    Abstract: The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated internationally compiled real-world datasets. This study presents the results of the segmentation challenge and chara…

    Submitted 17 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  44. arXiv:2306.00426  [pdf]

    eess.AS cs.SD

    Speaker verification using attentive multi-scale convolutional recurrent network

    Authors: Yanxiong Li, Zhongjie Jiang, Wenchang Cao, Qisheng Huang

    Abstract: In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker em…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 21 pages, 6 figures, 8 tables. Accepted for publication in Applied Soft Computing

  45. arXiv:2305.19369  [pdf]

    eess.IV cs.CV physics.med-ph

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Glioma Segmentation in Sub-Saharan Africa Patient Population (BraTS-Africa)

    Authors: Maruf Adewole, Jeffrey D. Rudie, Anu Gbadamosi, Oluyemisi Toyobo, Confidence Raymond, Dong Zhang, Olubukola Omidiji, Rachel Akinola, Mohammad Abba Suwaid, Adaobi Emegoakor, Nancy Ojo, Kenneth Aguh, Chinasa Kalaiwo, Gabriel Babatunde, Afolabi Ogunleye, Yewande Gbadamosi, Kator Iorpagher, Evan Calabrese, Mariam Aboian, Marius Linguraru, Jake Albrecht, Benedikt Wiestler, Florian Kofler, Anastasia Janas, Dominic LaBella , et al. (26 additional authors not shown)

    Abstract: Gliomas are the most common type of primary brain tumors. Although gliomas are relatively rare, they are among the deadliest types of cancer, with a survival rate of less than 2 years after diagnosis. Gliomas are challenging to diagnose, hard to treat and inherently resistant to conventional therapy. Years of extensive research to improve diagnosis and treatment of gliomas have decreased mortality…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2107.02314

  46. arXiv:2305.19269  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Make-A-Voice: Unified Voice Synthesis With Discrete Representation

    Authors: Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

    Abstract: Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common. In addition, the majority of voice synthesis models currently rely on annotated audio data, but it is crucial to scale them to self-supervised datasets in order to effectively capture the wide range of acoustic variations present in human voice, including speak…

    Submitted 30 May, 2023; originally announced May 2023.

  47. arXiv:2305.17033  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas , et al. (48 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. The MICCA…

    Submitted 23 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

  48. arXiv:2305.16333  [pdf, ps, other]

    cs.CL cs.AI cs.LG eess.AS

    Text Generation with Speech Synthesis for ASR Data Augmentation

    Authors: Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

    Abstract: Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data augmentation, its combination with text generation methods is considerably less explored. In this work, we explore text augmentation for ASR using large-scale pre-tr…

    Submitted 22 May, 2023; originally announced May 2023.

  49. arXiv:2305.13612  [pdf, other]

    cs.SD eess.AS

    FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

    Authors: Ziyue Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren, Zhou Zhao

    Abstract: Stutter removal is an essential scenario in the field of speech editing. However, when the speech recording contains stutters, the existing text-based speech editing approaches still suffer from: 1) the over-smoothing problem in the edited speech; 2) lack of robustness due to the noise introduced by stutter; 3) to remove the stutters, users are required to determine the edited region manually. To…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 (Findings)

  50. arXiv:2305.10763  [pdf, other]

    cs.SD eess.AS

    CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

    Authors: Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

    Abstract: Improving text representation has attracted much attention to achieve expressive text-to-speech (TTS). However, existing works only implicitly learn the prosody with masked token reconstruction tasks, which leads to low training efficiency and difficulty in prosody modeling. We propose CLAPSpeech, a cross-modal contrastive pre-training framework that explicitly learns the prosody variance of the s…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 (Main Conference)