Search | arXiv e-print repository

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel… ▽ More Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, yet eliminating the need for manual prompt generation in conventional segment anything model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.12254 [pdf, other]

Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta… ▽ More 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10911 [pdf, other]

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin **

Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m… ▽ More In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction models are trained using annotations from previous speech-related challenges. However, compared to the speech domain, the singing domain faces data scarcity and stricter copyright protections, leading to a lack of high-quality MOS-annotated datasets for singing. To address this, we propose SingMOS, a high-quality and diverse MOS dataset for singing, covering a range of Chinese and Japanese datasets. These synthesized vocals are generated using state-of-the-art models in singing synthesis, conversion, or resynthesis tasks and are rated by professional annotators alongside real vocals. Data analysis demonstrates the diversity and reliability of our dataset. Additionally, we conduct further exploration on SingMOS, providing insights for singing MOS prediction and guidance for the continued expansion of SingMOS. △ Less

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.08905 [pdf, other]

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin **

Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th… ▽ More Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation than typical speech. To address these challenges, we introduce SingOMD, a novel method to extract singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL through a resynthesis task and incorporate multi-resolution modules based on resampling to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis. △ Less

Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.08416 [pdf, other]

TokSing: Singing Voice Synthesis based on Discrete Tokens

Authors: Yuning Wu, Chunlei zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin **

Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis(SVS), achieving higher levels of melody… ▽ More Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis(SVS), achieving higher levels of melody expression poses a great challenge for utilizing discrete tokens. In this paper, we introduce TokSing, a discrete-based SVS system equipped with a token formulator that offers flexible token blendings. We observe a melody degradation during discretization, prompting us to integrate a melody signal with the discrete token and incorporate a specially-designed melody enhancement strategy in the musical encoder. Extensive experiments demonstrate that our TokSing achieves better performance against the Mel spectrogram baselines while offering advantages in intermediate representation space cost and convergence speed. △ Less

Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.07725 [pdf, ps, other]

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Authors: Xuankai Chang, Jiatong Shi, **chuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin **

Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,… ▽ More Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge, which focuses on new speech processing benchmarks using discrete units. It encompasses three pivotal tasks, namely multilingual automatic speech recognition, text-to-speech, and singing voice synthesis, and aims to assess the potential applicability of discrete units in these tasks. This paper outlines the challenge designs and baseline descriptions. We also collate baseline and selected submission systems, along with preliminary findings, offering valuable contributions to future research in this evolving field. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: This manuscript has been accepted by Interspeech2024

arXiv:2406.04001 [pdf, other]

Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting

Authors: Yang Zheng, Chih-Fan Pai, Yujie Tang

Abstract: Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and conv… ▽ More Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and convex reformulations, enabling convex analysis for nonconvex problems. Despite non-convexity and non-smoothness, the existence of an ECL not only reveals that minimizing the original function is equivalent to a convex problem but also certifies a class of first-order non-degenerate stationary points to be globally optimal. Therefore, no spurious stationarity exists in the set of non-degenerate policies. This ECL framework can cover many benchmark control problems, including state feedback linear quadratic regulator (LQR), dynamic output feedback linear quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ robust control. ECL can also handle a class of distributed control problems when the notion of quadratic invariance (QI) holds. We further show that all static stabilizing policies are non-degenerate for state feedback LQR and $\mathcal{H}_\infty$ control under standard assumptions. We believe that the new ECL framework may be of independent interest for analyzing nonconvex problems beyond control. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02438 [pdf, other]

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, **g Guo, Tomoki Toda, Zhiyao Duan

Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesized using state-of-the-art methods from publicly accessible singing voice datasets. CtrSVDD includes 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals, spanning 14 deepfake methods and involving 164 singer identities. We also present a baseline system with flexible front-end features, evaluated against a structured train/dev/eval split. The experiments show the importance of feature selection and highlight a need for generalization towards deepfake methods that deviate further from training distribution. The CtrSVDD dataset and baselines are publicly accessible. △ Less

Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.02262 [pdf, other]

A DAFT Based Unified Waveform Design Framework for High-Mobility Communications

Authors: Xingyao Zhang, Haoran Yin, Yanqun Tang, Yu Zhou, Yuqing Liu, **ming Du, Yipeng Ding

Abstract: With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM),… ▽ More With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM), and affine frequency division multiplexing (AFDM). Among these, the AFDM is a strong candidate for its low implementation complexity and ability to achieve optimal diversity. This paper unifies the waveforms based on the discrete affine Fourier transform (DAFT) by using the chirp slope factor "k" in the time-frequency representation to construct a unified design framework for high-mobility communications. The design framework is employed to verify that the bit error rate performance of the DAFT-based waveform can be enhanced when the signal-to-noise ratio (SNR) is sufficiently high by adjusting the chirp slope factor "k". △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.18356 [pdf, other]

Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and… ▽ More The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train our Universal Model on 3,410 CT volumes assembled from 14 publicly available datasets and then test it on 6,173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6x faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and more importantly, facilitates the extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Codes, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted to Medical Image Analysis

arXiv:2405.05244 [pdf, other]

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ… ▽ More The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specialized field requiring focused attention. To promote SVDD research, we recently proposed the "SVDD Challenge," the very first research challenge focusing on SVDD for lab-controlled and in-the-wild bonafide and deepfake singing voice recordings. The challenge will be held in conjunction with the 2024 IEEE Spoken Language Technology Workshop (SLT 2024). △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

arXiv:2405.00372 [pdf, other]

High-Precision Positioning with Continuous Delay and Doppler Shift using AFT-MC Waveforms

Authors: Cong Yi, Haoran Yin, Xianjie Lu, Yanqun Tang

Abstract: This paper explores a novel integrated localization and communication (ILAC) system using the affine Fourier transform multicarrier (AFT-MC) waveform. Specifically, we consider a multiple-input multiple-output (MIMO) AFT-MC system with ILAC and derive a continuous delay and Doppler shift channel matrix model. Based on the derived signal model, we develop a two-step algorithm with low complexity fo… ▽ More This paper explores a novel integrated localization and communication (ILAC) system using the affine Fourier transform multicarrier (AFT-MC) waveform. Specifically, we consider a multiple-input multiple-output (MIMO) AFT-MC system with ILAC and derive a continuous delay and Doppler shift channel matrix model. Based on the derived signal model, we develop a two-step algorithm with low complexity for estimating channel parameters. Furthermore, we derive the Cramér-Rao lower bound (CRLB) of location estimation as the fundamental limit of localization. Finally, we provide some insights about the AFT-MC parameters by explaining the impact of the parameters on localization performance. Simulation results demonstrate that the AFT-MC waveform is able to provide significant localization performance improvement compared to orthogonal frequency division multiplexing (OFDM) while achieving the CRLB of location estimation. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.12595 [pdf, other]

Deep Reinforcement Learning-aided Transmission Design for Energy-efficient Link Optimization in Vehicular Communications

Authors: Zhengpeng Wang, Yanqun Tang, Yingzhe Mao, Tao Wang, Xiunan Huang

Abstract: This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and Dueling deep Q-Network (S… ▽ More This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and Dueling deep Q-Network (SI-D3QN), a DRL algorithm integrating both double deep Q-Network and Dueling deep Q-Network, for the joint design of modulation and coding scheme (MCS) selection and power control. To be more specific, we employ SI techique to enhance link performance and assit the D3QN agent in refining its decision-making processes. The experiment results demonstrate that, across various optimization tasks, our proposed SI-D3QN agent outperforms the benchmark algorithms in terms of the valid actions and link performance metrics. Particularly, while ensuring significant improvement in energy efficiency, the agent facilitates a 29.6% enhancement in the link throughput under the same energy consumption. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures

arXiv:2404.07989 [pdf, other]

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl… ▽ More Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantly, their frameworks are mainly designed for 2D models, lacking a general any-to-3D paradigm. In this paper, we introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding. Given a frozen transformer from any source modality, we propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality. This mechanism enables us to assign each 3D token with a positional encoding paired with the pre-trained model, which avoids 3D geometry loss caused by the true projection and better motivates the transformer for 3D learning with 1D/2D positional priors. Then, within each transformer block, we insert an any-to-3D guided adapter module for parameter-efficient fine-tuning. The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers. We conduct extensive experiments to showcase the effectiveness and efficiency of our method. Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point. △ Less

Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

arXiv:2404.04878 [pdf, other]

CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur… ▽ More In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to surmount these challenges, enhancing inter-slice resolution and overall 3D medical imaging quality. However, existing approaches confront inherent challenges: 1) often tailored to specific upsampling factors, lacking flexibility for diverse clinical scenarios; 2) newly generated slices frequently suffer from over-smoothing, degrading fine details, and leading to inter-slice inconsistency. In response, this study presents CycleINR, a novel enhanced Implicit Neural Representation model for 3D medical data volumetric super-resolution. Leveraging the continuity of the learned implicit function, the CycleINR model can achieve results with arbitrary up-sampling rates, eliminating the need for separate training. Additionally, we enhance the grid sampling in CycleINR with a local attention mechanism and mitigate over-smoothing by integrating cycle-consistent loss. We introduce a new metric, Slice-wise Noise Level Inconsistency (SNLI), to quantitatively assess inter-slice noise level inconsistency. The effectiveness of our approach is demonstrated through image quality evaluations on an in-house dataset and a downstream task analysis on the Medical Segmentation Decathlon liver tumor dataset. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: CVPR accepted paper

arXiv:2404.01088 [pdf, other]

GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems

Authors: Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan

Abstract: The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel. Thus, accurate channel estimation is feasible by using just one pilot symbol. However, traditional AFDM channel estimation schemes necessitate the use of guard intervals (GI) to mitigate da… ▽ More The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel. Thus, accurate channel estimation is feasible by using just one pilot symbol. However, traditional AFDM channel estimation schemes necessitate the use of guard intervals (GI) to mitigate data-pilot interference, leading to spectral efficiency degradation. In this paper, we propose a GI-free pilot-aided channel estimation algorithm for AFDM systems, which improves spectral efficiency significantly. To mitigate the interference between the pilot and data symbols caused by the absence of GI, we perform joint interference cancellation, channel estimation, and signal detection iterately. Simulation results show that the bit error rate (BER) performance of the proposed method can approach the ideal case with perfect channel estimation. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.07453 [pdf, other]

Humans-in-the-Building: Getting Rid of Thermostats for Optimal Thermal Comfort Control in Energy Management Systems

Authors: Jiali Wang, Yang Tang, Luca Schenato

Abstract: Given the widespread attention to individual thermal comfort, coupled with significant energy-saving potential inherent in energy management systems for optimizing indoor environments, this paper aims to introduce advanced "Humans-in-the-building" control techniques to redefine the paradigm of indoor temperature design. Firstly, we innovatively redefine the role of individuals in the control loop,… ▽ More Given the widespread attention to individual thermal comfort, coupled with significant energy-saving potential inherent in energy management systems for optimizing indoor environments, this paper aims to introduce advanced "Humans-in-the-building" control techniques to redefine the paradigm of indoor temperature design. Firstly, we innovatively redefine the role of individuals in the control loop, establishing a model for users' thermal comfort and constructing discomfort signals based on individual preferences. Unlike traditional temperature-centric approaches, "thermal comfort control" prioritizes personalized comfort. Then, considering the diversity among users, we propose a novel method to determine the optimal indoor temperature range, thus minimizing discomfort for various users and reducing building energy consumption. Finally, the efficacy of the "thermal comfort control" approach is substantiated through simulations conducted using Matlab. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.03771 [pdf, other]

doi 10.1109/TVT.2024.3375027

Joint Sparsity Pattern Learning Based Channel Estimation for Massive MIMO-OTFS Systems

Authors: Kuo Meng, Shaoshi Yang, Xiao-Yang Wang, Yan Bu, Yurong Tang, Jianhua Zhang, Lajos Hanzo

Abstract: We propose a channel estimation scheme based on joint sparsity pattern learning (JSPL) for massive multi-input multi-output (MIMO) orthogonal time-frequency-space (OTFS) modulation aided systems. By exploiting the potential joint sparsity of the delay-Doppler-angle (DDA) domain channel, the channel estimation problem is transformed into a sparse recovery problem. To solve it, we first apply the sp… ▽ More We propose a channel estimation scheme based on joint sparsity pattern learning (JSPL) for massive multi-input multi-output (MIMO) orthogonal time-frequency-space (OTFS) modulation aided systems. By exploiting the potential joint sparsity of the delay-Doppler-angle (DDA) domain channel, the channel estimation problem is transformed into a sparse recovery problem. To solve it, we first apply the spike and slab prior model to iteratively estimate the support set of the channel matrix, and a higher-accuracy parameter update rule relying on the identified support set is introduced into the iteration. Then the specific values of the channel elements corresponding to the support set are estimated by the orthogonal matching pursuit (OMP) method. Both our simulation results and analysis demonstrate that the proposed JSPL channel estimation scheme achieves an improved performance over the representative state-of-the-art baseline schemes, despite its reduced pilot overhead. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 6 pages, 6 figures, accepted to appear on IEEE Transactions on Vehicular Technology, Mar. 2024

arXiv:2402.19286 [pdf, other]

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intrica… ▽ More Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among objects from clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to segment comprehensively panoramic structures within kidney by integrating extensive knowledge of kidney anatomy. In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture, with the improvement of the partial label image segmentation and capability for future data enlargement; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney. △ Less

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

arXiv:2402.04356 [pdf, other]

Bidirectional Autoregressive Diffusion Model for Dance Generation

Authors: Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, **g Xiao, Song Wang

Abstract: Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create… ▽ More Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation. △ Less

Submitted 22 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.17619 [pdf, ps, other]

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

Authors: Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin **, Shinji Watanabe

Abstract: In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatu… ▽ More In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatural voice synthesis. This innovative method has led to the creation of two expansive singing voice datasets, ACE-Opencpop and ACE-KiSing, which are instrumental for large-scale, multi-singer voice synthesis. Through thorough experimentation, we establish that these datasets not only serve as new benchmarks for SVS but also enhance SVS performance on other singing voice datasets when used as supplementary resources. The corpora, pre-trained models, and their related training recipes are publicly available at ESPnet-Muskits (\url{https://github.com/espnet/espnet}) △ Less

Submitted 12 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: Accepted by Interspeech2024

arXiv:2401.08837 [pdf]

Image Fusion in Remote Sensing: An Overview and Meta Analysis

Authors: Hessah Albanwan, Rongjun Qin, Yang Tang

Abstract: Image fusion in Remote Sensing (RS) has been a consistent demand due to its ability to turn raw images of different resolutions, sources, and modalities into accurate, complete, and spatio-temporally coherent images. It greatly facilitates downstream applications such as pan-sharpening, change detection, land-cover classification, etc. Yet, image fusion solutions are highly disparate to various re… ▽ More Image fusion in Remote Sensing (RS) has been a consistent demand due to its ability to turn raw images of different resolutions, sources, and modalities into accurate, complete, and spatio-temporally coherent images. It greatly facilitates downstream applications such as pan-sharpening, change detection, land-cover classification, etc. Yet, image fusion solutions are highly disparate to various remote sensing problems and thus are often narrowly defined in existing reviews as topical applications, such as pan-sharpening, and spatial-temporal image fusion. Considering that image fusion can be theoretically applied to any gridded data through pixel-level operations, in this paper, we expanded its scope by comprehensively surveying relevant works with a simple taxonomy: 1) many-to-one image fusion; 2) many-to-many image fusion. This simple taxonomy defines image fusion as a map** problem that turns either a single or a set of images into another single or set of images, depending on the desired coherence, e.g., spectral, spatial/resolution coherence, etc. We show that this simple taxonomy, despite the significant modality difference it covers, can be presented by a conceptually easy framework. In addition, we provide a meta-analysis to review the major papers studying the various types of image fusion and their applications over the years (from the 1980s to date), covering 5,926 peer-reviewed papers. Finally, we discuss the main benefits and emerging challenges to provide open research directions and potential future works. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: 21pages, 10 figures

arXiv:2401.03060 [pdf]

Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement

Authors: Ho Hin Lee, Adam M. Saunders, Michael E. Kim, Samuel W. Remedios, Lucas W. Remedios, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Chloe Cho, Louise A. Mawn, Tonia S. Rex, Kevin L. Schey, Blake E. Dewey, Jeffrey M. Spraggins, Jerry L. Prince, Yuankai Huo, Bennett A. Landman

Abstract: Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details… ▽ More Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details from scans with a low through-plane resolution compared to a high in-plane resolution, we apply a deep learning-based super-resolution algorithm. Then, we generate an initial unbiased reference with an iterative metric-based registration using a small portion of subject scans. We register the remaining scans to this template and refine the template using an unsupervised deep probabilistic approach that generates a more expansive deformation field to enhance the organ boundary alignment. We demonstrate this framework using magnetic resonance images across four different tissue contrasts, generating four atlases in separate spatial alignments. Results: For each tissue contrast, we find a significant improvement using the Wilcoxon signed-rank test in the average Dice score across four labeled regions compared to a standard registration framework consisting of rigid, affine, and deformable transformations. These results highlight the effective alignment of eye organs and boundaries using our proposed process. Conclusions: By combining super-resolution preprocessing and deep probabilistic models, we address the challenge of generating an eye atlas to serve as a standardized reference across a largely variable population. △ Less

Submitted 14 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: Revised for submission to SPIE Journal of Medical Imaging. 26 pages, 6 figures

arXiv:2312.15380 [pdf, other]

Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei

Abstract: Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaste… ▽ More Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaster MEC environment with unstable 5G communication, where device-to-device (D2D) link communication and dynamic voltage and frequency scaling (DVFS) are adopted to balance each user's requirement for task delay and energy consumption. A battery degradation evaluation approach to prolong battery lifetime is also presented. The distributed optimization problem is formulated into a mixed cooperative-competitive (MCC) multi-agent Markov decision process (MAMDP) and is tackled with recurrent multi-agent Proximal Policy Optimization (rMAPPO). Extensive simulations and comprehensive comparisons with other representative algorithms clearly demonstrate the effectiveness of the proposed rMAPPO-based offloading scheme. △ Less

Submitted 23 December, 2023; originally announced December 2023.

Comments: accepted by wcnc2024

arXiv:2312.15332 [pdf, other]

Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

Authors: Yang Zheng, Chih-fan Pai, Yujie Tang

Abstract: Direct policy search has achieved great empirical success in reinforcement learning. Many recent studies have revisited its theoretical foundation for continuous control, which reveals elegant nonconvex geometry in various benchmark problems, especially in fully observable state-feedback cases. This paper considers two fundamental optimal and robust control problems with partial observability: the… ▽ More Direct policy search has achieved great empirical success in reinforcement learning. Many recent studies have revisited its theoretical foundation for continuous control, which reveals elegant nonconvex geometry in various benchmark problems, especially in fully observable state-feedback cases. This paper considers two fundamental optimal and robust control problems with partial observability: the Linear Quadratic Gaussian (LQG) control with stochastic noises, and $\mathcal{H}_\infty$ robust control with adversarial noises. In the policy space, the former problem is smooth but nonconvex, while the latter one is nonsmooth and nonconvex. We highlight some interesting and surprising ``discontinuity'' of LQG and $\mathcal{H}_\infty$ cost functions around the boundary of their domains. Despite the lack of convexity (and possibly smoothness), our main results show that for a class of non-degenerate policies, all Clarke stationary points are globally optimal and there is no spurious local minimum for both LQG and $\mathcal{H}_\infty$ control. Our proof techniques rely on a new and unified framework of Extended Convex Lifting (ECL), which reconciles the gap between nonconvex policy optimization and convex reformulations. This ECL framework is of independent interest, and we will discuss its details in Part II of this paper. △ Less

Submitted 23 December, 2023; originally announced December 2023.

Comments: 79 pages, 12 figures

arXiv:2312.11125 [pdf, other]

A Low-Complexity Range Estimation with Adjusted Affine Frequency Division Multiplexing Waveform

Authors: Jiajun Zhu, Yanqun Tang, Xizhang Wei, Haoran Yin, **ming Du, Zhengpeng Wang, Yuqinng Liu

Abstract: Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface… ▽ More Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface waveform in integrated sensing and communication (ISAC) applications. In this paper, we investigate an AFDM-based radar system and analyze the radar ambiguity function of AFDM with different built-in parameters, based on which we find an AFDM waveform with the specific parameter c2 owns the near-optimal time-domain ambiguity function. Then a low-complexity algorithm based on matched filtering for high-resolution target range estimation is proposed for this specific AFDM waveform. Through simulation and analysis, the specific AFDM waveform has near-optimal range estimation performance with the proposed low-complexity algorithm while having the same bit error rate (BER) performance as orthogonal time frequency space (OTFS) using simple linear minimum mean square error (LMMSE) equalizer. △ Less

Submitted 29 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The paper has been submitted to IEEE WCNC 2024 WS-13: Mobile Sensing-Communication-Computation Synergy for 6G Internet of Things

arXiv:2312.09911 [pdf, other]

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, a… ▽ More Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. Additionally, it provides interactive visualizations and demonstrations of classic models for educational purposes. The initial release of Amphion v0.1 supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components like data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion. △ Less

Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Amphion Website: https://github.com/open-mmlab/Amphion

arXiv:2311.17631 [pdf, other]

Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks

Authors: Xianlun Peng, Yang Tang, Fangfei Li, Yang Liu

Abstract: In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obta… ▽ More In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obtains optimal attack strategies for large-scale PBCNs that the standard QL algorithm cannot handle. Finally, we verify the effectiveness of our proposed approach by considering two attacked PBCNs, including a 10-node network and a 28-node network. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.00372 [pdf, ps, other]

Zeroth-Order Feedback-Based Optimization for Distributed Demand Response

Authors: Ruiyang **, Yujie Tang, Jie Song

Abstract: Distributed demand response is a typical distributed optimization problem that requires coordination among multiple agents to satisfy demand response requirements. However, existing distributed algorithms for this problem still face challenges such as unknown system models, nonconvexity, privacy issues, etc. To address these challenges, we propose and analyze two distributed algorithms, in which t… ▽ More Distributed demand response is a typical distributed optimization problem that requires coordination among multiple agents to satisfy demand response requirements. However, existing distributed algorithms for this problem still face challenges such as unknown system models, nonconvexity, privacy issues, etc. To address these challenges, we propose and analyze two distributed algorithms, in which the agents do not share their information and instead perform local updates using zeroth-order feedback information to estimate the gradient of the global objective function. One algorithm applies to problems with general convex and compact feasible sets but has higher oracle complexity bounded by $O(d/ε^2)$, while the other algorithm achieves lower complexity bound $O(d/ε)$ but is only applicable to problems with box constraints. We conduct empirical experiments to validate their performance. △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.12286 [pdf]

System identification and closed-loop control of laser hot-wire directed energy deposition using the parameter-signature-property modeling scheme

Authors: M. Rahmani Dehaghani, Atieh Sahraeidolatkhaneh, Morgan Nilsen, Fredrik Sikström, Pouyan Sajadi, Yifan Tang, G. Gary Wang

Abstract: Hot-wire directed energy deposition using a laser beam (DED-LB/w) is a method of metal additive manufacturing (AM) that has benefits of high material utilization and deposition rate, but parts manufactured by DED-LB/w suffer from a substantial heat input and undesired surface finish. Hence, monitoring and controlling the process parameters and signatures during the deposition is crucial to ensure… ▽ More Hot-wire directed energy deposition using a laser beam (DED-LB/w) is a method of metal additive manufacturing (AM) that has benefits of high material utilization and deposition rate, but parts manufactured by DED-LB/w suffer from a substantial heat input and undesired surface finish. Hence, monitoring and controlling the process parameters and signatures during the deposition is crucial to ensure the quality of final part properties and geometries. This paper explores the dynamic modeling of the DED-LB/w process and introduces a parameter-signature-property modeling and control approach to enhance the quality of modeling and control of part properties that cannot be measured in situ. The study investigates different process parameters that influence the melt pool width (signature) and bead width (property) in single and multi-layer beads. The proposed modeling approach utilizes a parameter-signature model as F_1 and a signature-property model as F_2. Linear and nonlinear modeling approaches are compared to describe a dynamic relationship between process parameters and a process signature, the melt pool width (F_1). A fully connected artificial neural network is employed to model and predict the final part property, i.e., bead width, based on melt pool signatures (F_2). Finally, the effectiveness and usefulness of the proposed parameter-signature-property modeling is tested and verified by integrating the parameter-signature (F_1) and signature-property (F_2) models in the closed-loop control of the width of the part. Compared with the control loop with only F_1, the proposed method shows clear advantages and bears potential to be applied to control other part properties that cannot be directly measured or monitored in situ. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 28 pages, 14 figures, 4 tables,

arXiv:2310.07141 [pdf, ps, other]

Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems

Authors: Yuankun Tang, Anjie Zhang, Miaowen Wen, Yu Huang, Fei Ji, **ming Wen

Abstract: Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency o… ▽ More Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency offset by comparing the correlations of samples. Moreover, we propose the other so-called stepwise ML estimator to reduce the complexity. Both proposed estimators exploit the redundant information contained within the chirp-periodic prefix inherent in AFDM symbols, thus dispensing with any additional pilots. To further mitigate the intercarrier interference resulting from the residual frequency offset, we design a mirror-map**-based scheme for AFDM systems. Numerical results verify the effectiveness of the proposed time and carrier frequency offset estimation criteria and the mirror-map**-based modulation for AFDM systems. △ Less

Submitted 28 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: accepted by IEEE Wireless Communications and Networking Conference (WCNC) 2024

arXiv:2310.05513 [pdf, other]

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Authors: Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-** Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track w… ▽ More The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and a variety of speech/voice types present significant challenges in multilingual speech processing. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted by ASRU

arXiv:2309.09392 [pdf, other]

Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leading to different or… ▽ More Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leading to different organs/tissues captured. To address this issue, we propose C-SliceGen, which takes an arbitrary axial slice in the abdominal region as a condition and generates a pre-defined vertebral level slice by estimating structural changes in the latent space. Our experiments on 2608 volumetric CT data from two in-house datasets and 50 subjects from the 2015 Multi-Atlas Abdomen Labeling Challenge dataset (BTCV) Challenge demonstrate that our model can generate high-quality images that are realistic and similar. We further evaluate our method's capability to harmonize longitudinal positional variation on 1033 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, and confirmed that our method can harmonize the slice positional variance in terms of visceral fat area. This approach provides a promising direction for map** slices from different vertebral levels to a target slice and reducing positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen. △ Less

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.04071 [pdf, other]

doi 10.1117/12.3009084

Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration

Authors: Xin Yu, Yucheng Tang, Qi Yang, Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

Abstract: Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potentia… ▽ More Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potential, the task of generalizing deep learning techniques for intracranial measurements faces data availability constraints due to limited manually annotated atlases encompassing whole brain and TICV/PFV labels. In this paper, we enhancing the hierarchical transformer UNesT for whole brain segmentation to achieve segmenting whole brain with 133 classes and TICV/PFV simultaneously. To address the problem of data scarcity, the model is first pretrained on 4859 T1-weighted (T1w) 3D volumes sourced from 8 different sites. These volumes are processed through a multi-atlas segmentation pipeline for label generation, while TICV/PFV labels are unavailable. Subsequently, the model is finetuned with 45 T1w 3D volumes from Open Access Series Imaging Studies (OASIS) where both 133 whole brain classes and TICV/PFV labels are available. We evaluate our method with Dice similarity coefficients(DSC). We show that our model is able to conduct precise TICV/PFV estimation while maintaining the 132 brain regions performance at a comparable level. Code and trained model are available at: https://github.com/MASILab/UNesT/tree/main/wholebrainSeg. △ Less

Submitted 10 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

arXiv:2309.00721 [pdf, ps, other]

Geometric Tracking on $\mathcal{S}^{3}$ Based on Sliding Mode Control

Authors: Eduardo Espindola, Yu Tang

Abstract: Attitude tracking on the unit sphere of dimension $3$ based on sliding mode is considered in this paper. The tangent bundle of Lagrangian dynamics that describes the rotational motion of a rigid body is first shown to be a Lie group, and then a sliding surface that emerged on it is defined. Next, a sliding-mode controller is designed for attitude tracking that relies on an intrinsic error defined… ▽ More Attitude tracking on the unit sphere of dimension $3$ based on sliding mode is considered in this paper. The tangent bundle of Lagrangian dynamics that describes the rotational motion of a rigid body is first shown to be a Lie group, and then a sliding surface that emerged on it is defined. Next, a sliding-mode controller is designed for attitude tracking that relies on an intrinsic error defined on the Lie group. Almost global asymptotic stability of the closed loop is demonstrated using the Lyapunov analysis. Numerical simulations are included to compare the performance of the sliding mode controller designed on the Lie group with that designed in the embedding Euclidean space. △ Less

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.05785 [pdf, other]

Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning

Authors: Xueyuan Li, Ruining Deng, Yucheng Tang, Shunxing Bao, Haichun Yang, Yuankai Huo

Abstract: Precise identification of multiple cell classes in high-resolution Giga-pixel whole slide imaging (WSI) is critical for various clinical scenarios. Building an AI model for this purpose typically requires pixel-level annotations, which are often unscalable and must be done by skilled domain experts (e.g., pathologists). However, these annotations can be prone to errors, especially when distinguish… ▽ More Precise identification of multiple cell classes in high-resolution Giga-pixel whole slide imaging (WSI) is critical for various clinical scenarios. Building an AI model for this purpose typically requires pixel-level annotations, which are often unscalable and must be done by skilled domain experts (e.g., pathologists). However, these annotations can be prone to errors, especially when distinguishing between intricate cell types (e.g., podocytes and mesangial cells) using only visual inspection. Interestingly, a recent study showed that lay annotators, when using extra immunofluorescence (IF) images for reference (referred to as molecular-empowered learning), can sometimes outperform domain experts in labeling. Despite this, the resource-intensive task of manual delineation remains a necessity during the annotation process. In this paper, we explore the potential of bypassing pixel-level delineation by employing the recent segment anything model (SAM) on weak box annotation in a zero-shot learning approach. Specifically, we harness SAM's ability to produce pixel-level annotations from box annotations and utilize these SAM-generated labels to train a segmentation model. Our findings show that the proposed SAM-assisted molecular-empowered learning (SAM-L) can diminish the labeling efforts for lay annotators by only requiring weak box annotations. This is achieved without compromising annotation accuracy or the performance of the deep learning-based segmentation. This research represents a significant advancement in democratizing the annotation process for training pathological image segmentation, relying solely on non-expert annotators. △ Less

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.05784 [pdf, other]

High-performance Data Management for Whole Slide Image Analysis in Digital Pathology

Authors: Haoju Leng, Ruining Deng, Shunxing Bao, Dazheng Fang, Bryan A. Millis, Yucheng Tang, Haichun Yang, Xiao Wang, Yifan Peng, Lipeng Wan, Yuankai Huo

Abstract: When dealing with giga-pixel digital pathology in whole-slide imaging, a notable proportion of data records holds relevance during each analysis operation. For instance, when deploying an image analysis algorithm on whole-slide images (WSI), the computational bottleneck often lies in the input-output (I/O) system. This is particularly notable as patch-level processing introduces a considerable I/O… ▽ More When dealing with giga-pixel digital pathology in whole-slide imaging, a notable proportion of data records holds relevance during each analysis operation. For instance, when deploying an image analysis algorithm on whole-slide images (WSI), the computational bottleneck often lies in the input-output (I/O) system. This is particularly notable as patch-level processing introduces a considerable I/O load onto the computer system. However, this data management process could be further paralleled, given the typical independence of patch-level image processes across different patches. This paper details our endeavors in tackling this data access challenge by implementing the Adaptable IO System version 2 (ADIOS2). Our focus has been constructing and releasing a digital pathology-centric pipeline using ADIOS2, which facilitates streamlined data management across WSIs. Additionally, we've developed strategies aimed at curtailing data retrieval times. The performance evaluation encompasses two key scenarios: (1) a pure CPU-based image analysis scenario ("CPU scenario"), and (2) a GPU-based deep learning framework scenario ("GPU scenario"). Our findings reveal noteworthy outcomes. Under the CPU scenario, ADIOS2 showcases an impressive two-fold speed-up compared to the brute-force approach. In the GPU scenario, its performance stands on par with the cutting-edge GPU I/O acceleration framework, NVIDIA Magnum IO GPU Direct Storage (GDS). From what we know, this appears to be among the initial instances, if any, of utilizing ADIOS2 within the field of digital pathology. The source code has been made publicly available at https://github.com/hrlblab/adios. △ Less

Submitted 20 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.03137 [pdf, other]

Digital Self-Interference Cancellation With Robust Multi-layered Total Least Mean Squares Adaptive Filters

Authors: Shiyu Song, Yanqun Tang, Xizhang Wei, Yu Zhou, Xianjie Lu, Zhengpeng Wang, Songhu Ge

Abstract: In simultaneous transmit and receive (STAR) wireless communications, digital self-interference (SI) cancellation is required before estimating the remote transmission (RT) channel. Considering the inherent connection between SI channel reconstruction and RT channel estimation, we propose a multi-layered M-estimate total least mean squares (m-MTLS) joint estimator to estimate both channels. In each… ▽ More In simultaneous transmit and receive (STAR) wireless communications, digital self-interference (SI) cancellation is required before estimating the remote transmission (RT) channel. Considering the inherent connection between SI channel reconstruction and RT channel estimation, we propose a multi-layered M-estimate total least mean squares (m-MTLS) joint estimator to estimate both channels. In each layer, our proposed m-MTLS estimator first employs an M-estimate total least mean squares (MTLS) algorithm to eliminate residual SI from the received signal and give a new estimation of the RT channel. Then, it gives the final RT channel estimation based on the weighted sum of the estimation values obtained from each layer. Compared to traditional minimum mean square error (MMSE) estimator and single-layered MTLS estimator, it demonstrates that the m-MTLS estimator has better performance of normalized mean squared difference (NMSD). Besides, the simulation results also show the robustness of m-MTLS estimator even in scenarios where the local reference signal is contaminated with noise, and the received signal is impacted by strong impulse noise. △ Less

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2308.00507 [pdf, other]

Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

Authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, **gren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that descr… ▽ More Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that describes the precise relationship between the tumor and vessels in CT images of different patients, adopting it as a major feature for prognosis prediction. Besides, different from existing models that used CNNs or LSTMs to exploit tumor enhancement patterns on dynamic contrast-enhanced CT imaging, we improved the extraction of dynamic tumor-related texture features in multi-phase contrast-enhanced CT by fusing local and global features using CNN and transformer modules, further enhancing the features extracted across multi-phase CT images. We extensively evaluated and compared the proposed method with existing methods in the multi-center (n=4) dataset with 1,070 patients with PDAC, and statistical analysis confirmed its clinical effectiveness in the external test set consisting of three centers. The developed risk marker was the strongest predictor of overall survival among preoperative factors and it has the potential to be combined with established clinical factors to select patients at higher risk who might benefit from neoadjuvant therapy. △ Less

Submitted 13 September, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: MICCAI 2023

arXiv:2307.14076 [pdf, other]

A Phase-Coded Time-Domain Interleaved OTFS Waveform with Improved Ambiguity Function

Authors: Jiajun Zhu, Yanqun Tang, Chao Yang, Chi Zhang, Haoran Yin, Jiaojiao Xiong, Yuhua Chen

Abstract: Integrated sensing and communication (ISAC) is a significant application scenario in future wireless communication networks, and sensing capability of a waveform is always evaluated by the ambiguity function. To enhance the sensing performance of the orthogonal time frequency space (OTFS) waveform, we propose a novel time-domain interleaved cyclic-shifted P4-coded OTFS (TICP4-OTFS) with improved a… ▽ More Integrated sensing and communication (ISAC) is a significant application scenario in future wireless communication networks, and sensing capability of a waveform is always evaluated by the ambiguity function. To enhance the sensing performance of the orthogonal time frequency space (OTFS) waveform, we propose a novel time-domain interleaved cyclic-shifted P4-coded OTFS (TICP4-OTFS) with improved ambiguity function. TICP4-OTFS can achieve superior autocorrelation features in both the time and frequency domains by exploiting the multicarrier-like form of OTFS after interleaved and the favorable autocorrelation attributes of the P4 code. Furthermore, we present the vectorized formulation of TICP4-OTFS modulation as well as its signal structure in each domain. Numerical simulations show that our proposed TICP4-OTFS waveform outperforms OTFS with a narrower mainlobe as well as lower and more distant sidelobes in terms of delay and Doppler-dimensional ambiguity functions, and an instance of range estimation using pulse compression is illustrated to exhibit the proposed waveform\u2019s greater resolution. Besides, TICP4-OTFS achieves better performance of bit error rate for communication in low signal-to-noise ratio (SNR) scenarios. △ Less

Submitted 23 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: This paper has been accepted by 2023 IEEE Globecom Workshops (GC Wkshps): Workshop on Integrated Sensing and Communications for Internet of Things

arXiv:2307.10824 [pdf, other]

Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

Authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, **gren Zhou, Le Lu, Ling Zhang

Abstract: Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as… ▽ More Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: MICCAI 2023

arXiv:2307.04827 [pdf, other]

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

Authors: Siting Xu, Yunlong Tang, Feng Zheng

Abstract: Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the la… ▽ More Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the language model with excellent generation ability, our proposed LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and corresponding video frame of Launchpad-playing as prompt-completion pairs, to train the language model. The experiment result shows the proposed method can create better music visualization than random generation methods and hold the potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/. △ Less

Submitted 23 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted by International Computer Music Conference (ICMC) 2023

arXiv:2306.04668 [pdf, other]

SMRVIS: Point cloud extraction from 3-D ultrasound for non-destructive testing

Authors: Lisa Y. W. Tang

Abstract: We propose to formulate point cloud extraction from ultrasound volumes as an image segmentation problem. Through this convenient formulation, a quick prototype exploring various variants of the Residual Network, U-Net, and the Squeeze and Excitation Network was developed and evaluated. This report documents the experimental results compiled using a training dataset of five labeled ultrasound volum… ▽ More We propose to formulate point cloud extraction from ultrasound volumes as an image segmentation problem. Through this convenient formulation, a quick prototype exploring various variants of the Residual Network, U-Net, and the Squeeze and Excitation Network was developed and evaluated. This report documents the experimental results compiled using a training dataset of five labeled ultrasound volumes and 84 unlabeled volumes that got completed in a two-week period as part of a submission to the open challenge "3D Surface Mesh Estimation for CVPR workshop on Deep Learning in Ultrasound Image Analysis". Based on external evaluation performed by the challenge's organizers, the framework came first place on the challenge's \href{https://www.cvpr2023-dl-ultrasound.com/}{Leaderboard}. Source code is shared with the research community at a \href{https://github.com/lisatwyw/smrvis}{public repository}. △ Less

Submitted 15 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.02990 [pdf, other]

Integrated Sensing, Computation, and Communication for UAV-assisted Federated Edge Learning

Authors: Yao Tang, Guangxu Zhu, Wei Xu, Man Hon Cheung, Tat-Ming Lok, Shuguang Cui

Abstract: Federated edge learning (FEEL) enables privacy-preserving model training through periodic communication between edge devices and the server. Unmanned Aerial Vehicle (UAV)-mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection. In UAV-assisted FEEL, sensing, computation, and communication are coupled and compete for limited onb… ▽ More Federated edge learning (FEEL) enables privacy-preserving model training through periodic communication between edge devices and the server. Unmanned Aerial Vehicle (UAV)-mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection. In UAV-assisted FEEL, sensing, computation, and communication are coupled and compete for limited onboard resources, and UAV deployment also affects sensing and communication performance. Therefore, the joint design of UAV deployment and resource allocation is crucial to achieving the optimal training performance. In this paper, we address the problem of joint UAV deployment design and resource allocation for FEEL via a concrete case study of human motion recognition based on wireless sensing. We first analyze the impact of UAV deployment on the sensing quality and identify a threshold value for the sensing elevation angle that guarantees a satisfactory quality of data samples. Due to the non-ideal sensing channels, we consider the probabilistic sensing model, where the successful sensing probability of each UAV is determined by its position. Then, we derive the upper bound of the FEEL training loss as a function of the sensing probability. Theoretical results suggest that the convergence rate can be improved if UAVs have a uniform successful sensing probability. Based on this analysis, we formulate a training time minimization problem by jointly optimizing UAV deployment, integrated sensing, computation, and communication (ISCC) resources under a desirable optimality gap constraint. To solve this challenging mixed-integer non-convex problem, we apply the alternating optimization technique, and propose the bandwidth, batch size, and position optimization (BBPO) scheme to optimize these three decision variables alternately. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.02309 [pdf, other]

Synchronization of multiple rigid body systems: a survey

Authors: X. **, Daniel W. C. Ho, Y. Tang

Abstract: The multi-agent system has been a hot topic in the past few decades owing to its lower cost, higher robustness, and higher flexibility. As a particular multi-agent system, the multiple rigid body system received a growing interest for its wide applications in transportation, aerospace, and ocean exploration. Due to the non-Euclidean configuration space of attitudes and the inherent nonlinearity of… ▽ More The multi-agent system has been a hot topic in the past few decades owing to its lower cost, higher robustness, and higher flexibility. As a particular multi-agent system, the multiple rigid body system received a growing interest for its wide applications in transportation, aerospace, and ocean exploration. Due to the non-Euclidean configuration space of attitudes and the inherent nonlinearity of the dynamics of rigid body systems, synchronization of multiple rigid body systems is quite challenging. This paper aims to present an overview of the recent progress in synchronization of multiple rigid body systems from the view of two fundamental problems. The first problem focuses on attitude synchronization, while the second one focuses on cooperative motion control in that rotation and translation dynamics are coupled. Finally, a summary and future directions are given in the conclusion. △ Less

Submitted 27 August, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

arXiv:2306.02132 [pdf, ps, other]

Formation Control with Unknown Directions and General Coupling Coefficients

Authors: Zhen Li, Yang Tang, Yongqing Fan, Tingwen Huang

Abstract: Generally, the normal displacement-based formation control has a sensing mode that requires the agent not only to have certain knowledge of its direction, but also to gather its local information characterized by nonnegative coupling coefficients. However, the direction may be unknown in the sensing processes, and the coupling coefficients may also involve negative ones due to some circumstances.… ▽ More Generally, the normal displacement-based formation control has a sensing mode that requires the agent not only to have certain knowledge of its direction, but also to gather its local information characterized by nonnegative coupling coefficients. However, the direction may be unknown in the sensing processes, and the coupling coefficients may also involve negative ones due to some circumstances. This paper introduces these phenomena into a class of displacement-based formation control problem. Then, a geometric approach have been employed to overcome the difficulty of analysis on the introduced phenomena. The purpose of this approach is to construct some convex polytopes for containing the effects caused by the unknown direction, and to analyze the non-convexity by admitting the negative coupling coefficients in a certain range. Under the actions of these phenomena, the constructed polytopes are shown to be invariant in view of the contractive set method. It means that the convergence of formation shape can be guaranteed. Subsequently, an example is given to examine the applicability of derived result. △ Less

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2306.01853 [pdf, other]

Multi-Contrast Computed Tomography Atlas of Healthy Pancreas

Authors: Yinchi Zhou, Ho Hin Lee, Yucheng Tang, Xin Yu, Qi Yang, Shunxing Bao, Jeffrey M. Spraggins, Yuankai Huo, Bennett A. Landman

Abstract: With the substantial diversity in population demographics, such as differences in age and body composition, the volumetric morphology of pancreas varies greatly, resulting in distinctive variations in shape and appearance. Such variations increase the difficulty at generalizing population-wide pancreas features. A volumetric spatial reference is needed to adapt the morphological variability for or… ▽ More With the substantial diversity in population demographics, such as differences in age and body composition, the volumetric morphology of pancreas varies greatly, resulting in distinctive variations in shape and appearance. Such variations increase the difficulty at generalizing population-wide pancreas features. A volumetric spatial reference is needed to adapt the morphological variability for organ-specific analysis. Here, we proposed a high-resolution computed tomography (CT) atlas framework specifically optimized for the pancreas organ across multi-contrast CT. We introduce a deep learning-based pre-processing technique to extract the abdominal region of interests (ROIs) and leverage a hierarchical registration pipeline to align the pancreas anatomy across populations. Briefly, DEEDs affine and non-rigid registration are performed to transfer patient abdominal volumes to a fixed high-resolution atlas template. To generate and evaluate the pancreas atlas template, multi-contrast modality CT scans of 443 subjects (without reported history of pancreatic disease, age: 15-50 years old) are processed. Comparing with different registration state-of-the-art tools, the combination of DEEDs affine and non-rigid registration achieves the best performance for the pancreas label transfer across all contrast phases. We further perform external evaluation with another research cohort of 100 de-identified portal venous scans with 13 organs labeled, having the best label transfer performance of 0.504 Dice score in unsupervised setting. The qualitative representation (e.g., average map**) of each phase creates a clear boundary of pancreas and its distinctive contrast appearance. The deformation surface renderings across scales (e.g., small to large volume) further illustrate the generalizability of the proposed atlas template. △ Less

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.01084 [pdf, other]

Exploration on HuBERT with Multiple Resolutions

Authors: Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu GOng, Juan Pino, Shinji Watanabe

Abstract: Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since their attributes (e.g., speaker characteristics and semantics) are based on different time scales. To address this limitation, we propose utilizing HuBERT repr… ▽ More Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since their attributes (e.g., speaker characteristics and semantics) are based on different time scales. To address this limitation, we propose utilizing HuBERT representations at multiple resolutions for downstream tasks. We explore two approaches, namely the parallel and hierarchical approaches, for integrating HuBERT features with different resolutions. Through experiments, we demonstrate that HuBERT with multiple resolutions outperforms the original model. This highlights the potential of utilizing multiple resolutions in SSL models like HuBERT to capture diverse information from speech signals. △ Less

Submitted 22 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to Interspeech2023

arXiv:2306.00047 [pdf, other]

Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

Authors: Ruining Deng, Yanwei Li, Peize Li, Jiacheng Wang, Lucas W. Remedios, Saydolimkhon Agzamkhodjaev, Zuhayr Asad, Quan Liu, Can Cui, Yaohong Wang, Yihan Wang, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Multi-class cell segmentation in high-resolution Giga-pixel whole slide images (WSI) is critical for various clinical applications. Training such an AI model typically requires labor-intensive pixel-wise manual annotation from experienced domain experts (e.g., pathologists). Moreover, such annotation is error-prone when differentiating fine-grained cell types (e.g., podocyte and mesangial cells) v… ▽ More Multi-class cell segmentation in high-resolution Giga-pixel whole slide images (WSI) is critical for various clinical applications. Training such an AI model typically requires labor-intensive pixel-wise manual annotation from experienced domain experts (e.g., pathologists). Moreover, such annotation is error-prone when differentiating fine-grained cell types (e.g., podocyte and mesangial cells) via the naked human eye. In this study, we assess the feasibility of democratizing pathological AI deployment by only using lay annotators (annotators without medical domain knowledge). The contribution of this paper is threefold: (1) We proposed a molecular-empowered learning scheme for multi-class cell segmentation using partial labels from lay annotators; (2) The proposed method integrated Giga-pixel level molecular-morphology cross-modality registration, molecular-informed annotation, and molecular-oriented segmentation model, so as to achieve significantly superior performance via 3 lay annotators as compared with 2 experienced pathologists; (3) A deep corrective learning (learning with imperfect label) method is proposed to further improve the segmentation performance using partially annotated noisy data. From the experimental results, our learning method achieved F1 = 0.8496 using molecular-informed annotations from lay annotators, which is better than conventional morphology-based annotations (F1 = 0.7015) from experienced pathologists. Our method democratizes the development of a pathological segmentation deep model to the lay annotator level, which consequently scales up the learning process similar to a non-medical computer vision task. The official implementation and cell annotations are publicly available at https://github.com/hrlblab/MolecularEL. △ Less

Submitted 21 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.19530 [pdf, ps, other]

Geometric sliding mode control of mechanical systems on Lie groups

Authors: Eduardo Espindola, Yu Tang

Abstract: This paper presents a generalization of conventional sliding mode control designs for systems in Euclidean spaces to fully actuated simple mechanical systems whose configuration space is a Lie group for the trajectory-tracking problem. A generic kinematic control is first devised in the underlying Lie algebra, which enables the construction of a Lie group on the tangent bundle where the system sta… ▽ More This paper presents a generalization of conventional sliding mode control designs for systems in Euclidean spaces to fully actuated simple mechanical systems whose configuration space is a Lie group for the trajectory-tracking problem. A generic kinematic control is first devised in the underlying Lie algebra, which enables the construction of a Lie group on the tangent bundle where the system state evolves. A sliding subgroup is then proposed on the tangent bundle with the desired sliding properties, and a control law is designed for the error dynamics trajectories to reach the sliding subgroup globally exponentially. Tracking control is then composed of the reaching law and sliding mode, and is applied for attitude tracking on the special orthogonal group SO(3) and the unit sphere S3. Numerical simulations show the performance of the proposed geometric sliding-mode controller (GSMC) in contrast with two control schemes of the literature. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: 13 pages, 1 figure

Showing 1–50 of 193 results for author: Tang, Y