
Showing 1–50 of 193 results for author: Tang, Y

Searching in archive eess.
  1. arXiv:2407.00596  [pdf, other]

    eess.IV cs.CV

    HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.10911  [pdf, other]

    cs.SD eess.AS

    SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

    Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin

    Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.08905  [pdf, other]

    cs.SD eess.AS

    SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

    Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin

    Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2406.08416  [pdf, other]

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.07725  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

    Authors: Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin

    Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This manuscript has been accepted by Interspeech 2024

  7. arXiv:2406.04001  [pdf, other]

    math.OC eess.SY math.DS

    Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting

    Authors: Yang Zheng, Chih-Fan Pai, Yujie Tang

    Abstract: Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and conv…

    Submitted 6 June, 2024; originally announced June 2024.

  8. arXiv:2406.02438  [pdf, other]

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.02262  [pdf, other]

    eess.SP

    A DAFT Based Unified Waveform Design Framework for High-Mobility Communications

    Authors: Xingyao Zhang, Haoran Yin, Yanqun Tang, Yu Zhou, Yuqing Liu, Jinming Du, Yipeng Ding

    Abstract: With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM),…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2405.18356  [pdf, other]

    eess.IV cs.CV

    Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

    Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Image Analysis

  11. arXiv:2405.05244  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

    Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

  12. arXiv:2405.00372  [pdf, other]

    eess.SP

    High-Precision Positioning with Continuous Delay and Doppler Shift using AFT-MC Waveforms

    Authors: Cong Yi, Haoran Yin, Xianjie Lu, Yanqun Tang

    Abstract: This paper explores a novel integrated localization and communication (ILAC) system using the affine Fourier transform multicarrier (AFT-MC) waveform. Specifically, we consider a multiple-input multiple-output (MIMO) AFT-MC system with ILAC and derive a continuous delay and Doppler shift channel matrix model. Based on the derived signal model, we develop a two-step algorithm with low complexity fo…

    Submitted 1 May, 2024; originally announced May 2024.

  13. arXiv:2404.12595  [pdf, other]

    eess.SP

    Deep Reinforcement Learning-aided Transmission Design for Energy-efficient Link Optimization in Vehicular Communications

    Authors: Zhengpeng Wang, Yanqun Tang, Yingzhe Mao, Tao Wang, Xiunan Huang

    Abstract: This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and Dueling deep Q-Network (S…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures

  14. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  15. arXiv:2404.04878  [pdf, other]

    eess.IV cs.CV

    CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

    Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

    Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR accepted paper

  16. arXiv:2404.01088  [pdf, other]

    eess.SP

    GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems

    Authors: Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan

    Abstract: The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel. Thus, accurate channel estimation is feasible by using just one pilot symbol. However, traditional AFDM channel estimation schemes necessitate the use of guard intervals (GI) to mitigate da…

    Submitted 1 April, 2024; originally announced April 2024.

  17. arXiv:2403.07453  [pdf, other]

    eess.SY

    Humans-in-the-Building: Getting Rid of Thermostats for Optimal Thermal Comfort Control in Energy Management Systems

    Authors: Jiali Wang, Yang Tang, Luca Schenato

    Abstract: Given the widespread attention to individual thermal comfort, coupled with significant energy-saving potential inherent in energy management systems for optimizing indoor environments, this paper aims to introduce advanced "Humans-in-the-building" control techniques to redefine the paradigm of indoor temperature design. Firstly, we innovatively redefine the role of individuals in the control loop,…

    Submitted 12 March, 2024; originally announced March 2024.

  18. Joint Sparsity Pattern Learning Based Channel Estimation for Massive MIMO-OTFS Systems

    Authors: Kuo Meng, Shaoshi Yang, Xiao-Yang Wang, Yan Bu, Yurong Tang, Jianhua Zhang, Lajos Hanzo

    Abstract: We propose a channel estimation scheme based on joint sparsity pattern learning (JSPL) for massive multi-input multi-output (MIMO) orthogonal time-frequency-space (OTFS) modulation aided systems. By exploiting the potential joint sparsity of the delay-Doppler-angle (DDA) domain channel, the channel estimation problem is transformed into a sparse recovery problem. To solve it, we first apply the sp…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 6 pages, 6 figures, accepted to appear on IEEE Transactions on Vehicular Technology, Mar. 2024

  19. arXiv:2402.19286  [pdf, other]

    eess.IV cs.CV

    PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intrica…

    Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

  20. arXiv:2402.04356  [pdf, other]

    cs.SD cs.CV eess.AS

    Bidirectional Autoregressive Diffusion Model for Dance Generation

    Authors: Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang

    Abstract: Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create…

    Submitted 22 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  21. arXiv:2401.17619  [pdf, ps, other]

    cs.SD eess.AS

    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

    Authors: Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe

    Abstract: In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatu…

    Submitted 12 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech 2024

  22. arXiv:2401.08837  [pdf]

    cs.CV eess.IV

    Image Fusion in Remote Sensing: An Overview and Meta Analysis

    Authors: Hessah Albanwan, Rongjun Qin, Yang Tang

    Abstract: Image fusion in Remote Sensing (RS) has been a consistent demand due to its ability to turn raw images of different resolutions, sources, and modalities into accurate, complete, and spatio-temporally coherent images. It greatly facilitates downstream applications such as pan-sharpening, change detection, land-cover classification, etc. Yet, image fusion solutions are highly disparate to various re…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 21 pages, 10 figures

  23. arXiv:2401.03060  [pdf]

    eess.IV cs.CV

    Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement

    Authors: Ho Hin Lee, Adam M. Saunders, Michael E. Kim, Samuel W. Remedios, Lucas W. Remedios, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Chloe Cho, Louise A. Mawn, Tonia S. Rex, Kevin L. Schey, Blake E. Dewey, Jeffrey M. Spraggins, Jerry L. Prince, Yuankai Huo, Bennett A. Landman

    Abstract: Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details…

    Submitted 14 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Revised for submission to SPIE Journal of Medical Imaging. 26 pages, 6 figures

  24. arXiv:2312.15380  [pdf, other]

    cs.NI eess.SP

    Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

    Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei

    Abstract: Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaste…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by WCNC 2024

  25. arXiv:2312.15332  [pdf, other]

    math.OC eess.SY math.DS

    Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

    Authors: Yang Zheng, Chih-fan Pai, Yujie Tang

    Abstract: Direct policy search has achieved great empirical success in reinforcement learning. Many recent studies have revisited its theoretical foundation for continuous control, which reveals elegant nonconvex geometry in various benchmark problems, especially in fully observable state-feedback cases. This paper considers two fundamental optimal and robust control problems with partial observability: the…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 79 pages, 12 figures

  26. arXiv:2312.11125  [pdf, other]

    eess.SP

    A Low-Complexity Range Estimation with Adjusted Affine Frequency Division Multiplexing Waveform

    Authors: Jiajun Zhu, Yanqun Tang, Xizhang Wei, Haoran Yin, Jinming Du, Zhengpeng Wang, Yuqing Liu

    Abstract: Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface…

    Submitted 29 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: The paper has been submitted to IEEE WCNC 2024 WS-13: Mobile Sensing-Communication-Computation Synergy for 6G Internet of Things

  27. arXiv:2312.09911  [pdf, other]

    cs.SD eess.AS

    Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

    Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

    Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, a…

    Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Amphion Website: https://github.com/open-mmlab/Amphion

  28. arXiv:2311.17631  [pdf, other]

    eess.SY cs.CR cs.LG

    Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks

    Authors: Xianlun Peng, Yang Tang, Fangfei Li, Yang Liu

    Abstract: In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obta…

    Submitted 29 November, 2023; originally announced November 2023.

  29. arXiv:2311.00372  [pdf, ps, other]

    eess.SY

    Zeroth-Order Feedback-Based Optimization for Distributed Demand Response

    Authors: Ruiyang Jin, Yujie Tang, Jie Song

    Abstract: Distributed demand response is a typical distributed optimization problem that requires coordination among multiple agents to satisfy demand response requirements. However, existing distributed algorithms for this problem still face challenges such as unknown system models, nonconvexity, privacy issues, etc. To address these challenges, we propose and analyze two distributed algorithms, in which t…

    Submitted 1 November, 2023; originally announced November 2023.

  30. arXiv:2310.12286  [pdf]

    eess.SY eess.IV

    System identification and closed-loop control of laser hot-wire directed energy deposition using the parameter-signature-property modeling scheme

    Authors: M. Rahmani Dehaghani, Atieh Sahraeidolatkhaneh, Morgan Nilsen, Fredrik Sikström, Pouyan Sajadi, Yifan Tang, G. Gary Wang

    Abstract: Hot-wire directed energy deposition using a laser beam (DED-LB/w) is a method of metal additive manufacturing (AM) that has benefits of high material utilization and deposition rate, but parts manufactured by DED-LB/w suffer from a substantial heat input and undesired surface finish. Hence, monitoring and controlling the process parameters and signatures during the deposition is crucial to ensure…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 28 pages, 14 figures, 4 tables

  31. arXiv:2310.07141  [pdf, ps, other]

    cs.IT eess.SP

    Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems

    Authors: Yuankun Tang, Anjie Zhang, Miaowen Wen, Yu Huang, Fei Ji, Jinming Wen

    Abstract: Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency o…

    Submitted 28 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: accepted by IEEE Wireless Communications and Networking Conference (WCNC) 2024

  32. arXiv:2310.05513  [pdf, other]

    cs.SD cs.CL eess.AS

    Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

    Authors: Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

    Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track w…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU

  33. arXiv:2309.09392  [pdf, other]

    eess.IV cs.CV

    Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

    Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different or…

    Submitted 17 September, 2023; originally announced September 2023.

  34. arXiv:2309.04071  [pdf, other]

    eess.IV cs.CV

    Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration

    Authors: Xin Yu, Yucheng Tang, Qi Yang, Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

    Abstract: Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potentia…

    Submitted 10 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  35. arXiv:2309.00721  [pdf, ps, other]

    eess.SY

    Geometric Tracking on $\mathcal{S}^{3}$ Based on Sliding Mode Control

    Authors: Eduardo Espindola, Yu Tang

    Abstract: Attitude tracking on the unit sphere of dimension $3$ based on sliding mode is considered in this paper. The tangent bundle of Lagrangian dynamics that describes the rotational motion of a rigid body is first shown to be a Lie group, and then a sliding surface that emerged on it is defined. Next, a sliding-mode controller is designed for attitude tracking that relies on an intrinsic error defined…

    Submitted 1 September, 2023; originally announced September 2023.

  36. arXiv:2308.05785  [pdf, other]

    eess.IV cs.CV

    Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning

    Authors: Xueyuan Li, Ruining Deng, Yucheng Tang, Shunxing Bao, Haichun Yang, Yuankai Huo

    Abstract: Precise identification of multiple cell classes in high-resolution Giga-pixel whole slide imaging (WSI) is critical for various clinical scenarios. Building an AI model for this purpose typically requires pixel-level annotations, which are often unscalable and must be done by skilled domain experts (e.g., pathologists). However, these annotations can be prone to errors, especially when distinguish…

    Submitted 10 August, 2023; originally announced August 2023.

  37. arXiv:2308.05784  [pdf, other]

    eess.IV cs.CV

    High-performance Data Management for Whole Slide Image Analysis in Digital Pathology

    Authors: Haoju Leng, Ruining Deng, Shunxing Bao, Dazheng Fang, Bryan A. Millis, Yucheng Tang, Haichun Yang, Xiao Wang, Yifan Peng, Lipeng Wan, Yuankai Huo

    Abstract: When dealing with giga-pixel digital pathology in whole-slide imaging, a notable proportion of data records holds relevance during each analysis operation. For instance, when deploying an image analysis algorithm on whole-slide images (WSI), the computational bottleneck often lies in the input-output (I/O) system. This is particularly notable as patch-level processing introduces a considerable I/O…

    Submitted 20 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

  38. arXiv:2308.03137  [pdf, other]

    eess.SP

    Digital Self-Interference Cancellation With Robust Multi-layered Total Least Mean Squares Adaptive Filters

    Authors: Shiyu Song, Yanqun Tang, Xizhang Wei, Yu Zhou, Xianjie Lu, Zhengpeng Wang, Songhu Ge

    Abstract: In simultaneous transmit and receive (STAR) wireless communications, digital self-interference (SI) cancellation is required before estimating the remote transmission (RT) channel. Considering the inherent connection between SI channel reconstruction and RT channel estimation, we propose a multi-layered M-estimate total least mean squares (m-MTLS) joint estimator to estimate both channels. In each…

    Submitted 6 August, 2023; originally announced August 2023.

  39. arXiv:2308.00507  [pdf, other]

    eess.IV cs.CV cs.LG

    Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

    Authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang

    Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that descr…

    Submitted 13 September, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: MICCAI 2023

  40. arXiv:2307.14076  [pdf, other]

    eess.SP

    A Phase-Coded Time-Domain Interleaved OTFS Waveform with Improved Ambiguity Function

    Authors: Jiajun Zhu, Yanqun Tang, Chao Yang, Chi Zhang, Haoran Yin, Jiaojiao Xiong, Yuhua Chen

    Abstract: Integrated sensing and communication (ISAC) is a significant application scenario in future wireless communication networks, and sensing capability of a waveform is always evaluated by the ambiguity function. To enhance the sensing performance of the orthogonal time frequency space (OTFS) waveform, we propose a novel time-domain interleaved cyclic-shifted P4-coded OTFS (TICP4-OTFS) with improved a…

    Submitted 23 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted by 2023 IEEE Globecom Workshops (GC Wkshps): Workshop on Integrated Sensing and Communications for Internet of Things

  41. arXiv:2307.10824  [pdf, other]

    eess.IV cs.CV

    Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

    Authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang

    Abstract: Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

  42. arXiv:2307.04827  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

    Authors: Siting Xu, Yunlong Tang, Feng Zheng

    Abstract: Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the la…

    Submitted 23 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted by International Computer Music Conference (ICMC) 2023

  43. arXiv:2306.04668  [pdf, other]

    eess.IV cs.CV cs.LG

    SMRVIS: Point cloud extraction from 3-D ultrasound for non-destructive testing

    Authors: Lisa Y. W. Tang

    Abstract: We propose to formulate point cloud extraction from ultrasound volumes as an image segmentation problem. Through this convenient formulation, a quick prototype exploring various variants of the Residual Network, U-Net, and the Squeeze and Excitation Network was developed and evaluated. This report documents the experimental results compiled using a training dataset of five labeled ultrasound volum…

    Submitted 15 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  44. arXiv:2306.02990  [pdf, other]

    cs.IT cs.LG eess.SP

    Integrated Sensing, Computation, and Communication for UAV-assisted Federated Edge Learning

    Authors: Yao Tang, Guangxu Zhu, Wei Xu, Man Hon Cheung, Tat-Ming Lok, Shuguang Cui

    Abstract: Federated edge learning (FEEL) enables privacy-preserving model training through periodic communication between edge devices and the server. Unmanned Aerial Vehicle (UAV)-mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection. In UAV-assisted FEEL, sensing, computation, and communication are coupled and compete for limited onb…

    Submitted 5 June, 2023; originally announced June 2023.

  45. arXiv:2306.02309  [pdf, other]

    eess.SY

    Synchronization of multiple rigid body systems: a survey

    Authors: X. Jin, Daniel W. C. Ho, Y. Tang

    Abstract: The multi-agent system has been a hot topic in the past few decades owing to its lower cost, higher robustness, and higher flexibility. As a particular multi-agent system, the multiple rigid body system received a growing interest for its wide applications in transportation, aerospace, and ocean exploration. Due to the non-Euclidean configuration space of attitudes and the inherent nonlinearity of…

    Submitted 27 August, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

  46. arXiv:2306.02132  [pdf, ps, other]

    math.OC eess.SY

    Formation Control with Unknown Directions and General Coupling Coefficients

    Authors: Zhen Li, Yang Tang, Yongqing Fan, Tingwen Huang

    Abstract: Generally, the normal displacement-based formation control has a sensing mode that requires the agent not only to have certain knowledge of its direction, but also to gather its local information characterized by nonnegative coupling coefficients. However, the direction may be unknown in the sensing processes, and the coupling coefficients may also involve negative ones due to some circumstances.…

    Submitted 3 June, 2023; originally announced June 2023.

  47. arXiv:2306.01853  [pdf, other]

    eess.IV cs.CV

    Multi-Contrast Computed Tomography Atlas of Healthy Pancreas

    Authors: Yinchi Zhou, Ho Hin Lee, Yucheng Tang, Xin Yu, Qi Yang, Shunxing Bao, Jeffrey M. Spraggins, Yuankai Huo, Bennett A. Landman

    Abstract: With the substantial diversity in population demographics, such as differences in age and body composition, the volumetric morphology of pancreas varies greatly, resulting in distinctive variations in shape and appearance. Such variations increase the difficulty at generalizing population-wide pancreas features. A volumetric spatial reference is needed to adapt the morphological variability for or…

    Submitted 2 June, 2023; originally announced June 2023.

  48. arXiv:2306.01084  [pdf, other]

    cs.SD eess.AS

    Exploration on HuBERT with Multiple Resolutions

    Authors: Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe

    Abstract: Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since their attributes (e.g., speaker characteristics and semantics) are based on different time scales. To address this limitation, we propose utilizing HuBERT repr…

    Submitted 22 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  49. arXiv:2306.00047  [pdf, other]

    eess.IV cs.CV

    Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

    Authors: Ruining Deng, Yanwei Li, Peize Li, Jiacheng Wang, Lucas W. Remedios, Saydolimkhon Agzamkhodjaev, Zuhayr Asad, Quan Liu, Can Cui, Yaohong Wang, Yihan Wang, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Multi-class cell segmentation in high-resolution Giga-pixel whole slide images (WSI) is critical for various clinical applications. Training such an AI model typically requires labor-intensive pixel-wise manual annotation from experienced domain experts (e.g., pathologists). Moreover, such annotation is error-prone when differentiating fine-grained cell types (e.g., podocyte and mesangial cells) v…

    Submitted 21 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  50. arXiv:2305.19530  [pdf, ps, other]

    math.OC cs.RO eess.SY

    Geometric sliding mode control of mechanical systems on Lie groups

    Authors: Eduardo Espindola, Yu Tang

    Abstract: This paper presents a generalization of conventional sliding mode control designs for systems in Euclidean spaces to fully actuated simple mechanical systems whose configuration space is a Lie group for the trajectory-tracking problem. A generic kinematic control is first devised in the underlying Lie algebra, which enables the construction of a Lie group on the tangent bundle where the system sta…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 13 pages, 1 figure