Showing 1–50 of 201 results for author: Li, T

Searching in archive eess.
  1. arXiv:2406.07952  [pdf, other]

    eess.IV cs.CV

    Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

    Authors: Zhenhuan Zhou, Along He, Yanlin Wu, Rui Yao, Xueshuo Xie, Tao Li

    Abstract: In medical images, various types of lesions often manifest significant differences in their shape and texture. Accurate medical image segmentation demands deep learning models with robust capabilities in multi-scale and boundary feature learning. However, previous networks still have limitations in addressing the above issues. Firstly, previous networks simultaneously fuse multi-level features or…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages

  2. arXiv:2406.05681  [pdf, other]

    cs.SD eess.AS

    Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

    Authors: Yuepeng Jiang, Tao Li, Fengyu Yang, Lei Xie, Meng Meng, Yujun Wang

    Abstract: Recent research in zero-shot speech synthesis has made significant progress in speaker similarity. However, current efforts focus on timbre generalization rather than prosody modeling, which results in limited naturalness and expressiveness. To address this, we introduce a novel speech synthesis model trained on large-scale datasets, including both timbre and hierarchical prosody modeling. As timb…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, accepted by Interspeech2024

  3. arXiv:2405.19363  [pdf, other]

    eess.SP cs.AI cs.LG

    Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

    Authors: Yihe Wang, Nan Huang, Taida Li, Yujun Yan, Xiang Zhang

    Abstract: Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we int…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 20 pages (14 pages main paper + 6 pages supplementary materials)

  4. arXiv:2405.16760  [pdf, ps, other]

    eess.SY math.PR

    Graphon Particle Systems, Part I: Spatio-Temporal Approximation and Law of Large Numbers

    Authors: Yan Chen, Tao Li

    Abstract: We study a class of graphon particle systems with time-varying random coefficients. In a graphon particle system, the interactions among particles are characterized by the coupled mean field terms through an underlying graphon and the randomness of the coefficients comes from the stochastic processes associated with the particle labels. By constructing two-level approximated sequences converging i…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2405.11935  [pdf]

    eess.SY physics.app-ph physics.optics

    A Flat Dual-Polarized Millimeter-Wave Luneburg Lens Antenna Using Transformation Optics with Reduced Anisotropy and Impedance Mismatch

    Authors: Yuanyan Su, Teng Li, Wei Hong, Zhi Ning Chen, Anja K. Skrivervik

    Abstract: In this paper, a compact wideband dual-polarized Luneburg lens antenna (LLA) with reduced anisotropy and improved impedance matching is proposed in Ka band with a wide 2D beamscanning capability. Based on transformation optics, the spherical Luneburg lens is compressed into a cylindrical one, while the merits of high gain, broad band, wide scanning, and free polarization are preserved. A trigonome…

    Submitted 20 May, 2024; originally announced May 2024.

  6. arXiv:2405.11883  [pdf, other]

    cs.IT eess.SP

    Asynchronous MIMO-OFDM Massive Unsourced Random Access with Codeword Collisions

    Authors: Tianya Li, Yongpeng Wu, Junyuan Gao, Wenjun Zhang, Xiang-Gen Xia, Derrick Wing Kwan Ng, Chengshan Xiao

    Abstract: This paper investigates asynchronous MIMO massive unsourced random access in an orthogonal frequency division multiplexing (OFDM) system over frequency-selective fading channels, with the presence of both timing and carrier frequency offsets (TO and CFO) and non-negligible codeword collisions. The proposed coding framework segregates the data into two components, namely, preamble and coding parts,…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures, submitted to the IEEE for possible publication

  7. arXiv:2404.03211  [pdf, other]

    cs.LG eess.SY

    Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

    Authors: Xiwei Zhang, Tao Li

    Abstract: We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we study the mean square asymptotic stability of a class of random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous ones. Secondly, we introduce…

    Submitted 9 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  8. arXiv:2403.15636  [pdf, ps, other]

    cs.GT eess.SY

    On the Variational Interpretation of Mirror Play in Monotone Games

    Authors: Yunian Pan, Tao Li, Quanyan Zhu

    Abstract: Mirror play (MP) is a well-accepted primal-dual multi-agent learning algorithm where all agents simultaneously implement mirror descent in a distributed fashion. The advantage of MP over vanilla gradient play lies in its usage of mirror maps that better exploit the geometry of decision domains. Despite extensive literature dedicated to the asymptotic convergence of MP to equilibrium, the understan…

    Submitted 22 March, 2024; originally announced March 2024.

  9. arXiv:2403.12467  [pdf, other]

    eess.SP

    Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

    Authors: Heng Wang, Jianhua Zhang, Gaofeng Nie, Li Yu, Zhiqiang Yuan, Tongjie Li, Jialin Wang, Guangyi Liu

    Abstract: Digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, which is expected to provide significant performance enhancements for the sixth-generation (6G) air-interface design. In this work, we first define five evolution levels of channel twins with the progression of wireless communication. The fifth level, autonomous DTC, is elaborate…

    Submitted 31 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures, 15 references. It is submitted to IEEE journal

  10. arXiv:2403.00274  [pdf, other]

    cs.CV cs.SD eess.AS

    CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

    Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

    Abstract: Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion. The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but…

    Submitted 29 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  11. arXiv:2402.18781  [pdf, ps, other]

    cs.GT cs.LG eess.SY

    Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games

    Authors: Tao Li, Kim Hammar, Rolf Stadler, Quanyan Zhu

    Abstract: Asymmetric information stochastic games (AISGs) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for AISGs are primarily offline and cannot adapt to equilibrium deviations. Further, current methods are limited to special classes of AISGs to avoid belief hierarchies. To address these limi…

    Submitted 8 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  12. arXiv:2402.12499  [pdf, other]

    cs.GT cs.AI cs.CR cs.LG eess.SY

    Automated Security Response through Online Learning with Adaptive Conjectures

    Authors: Kim Hammar, Tao Li, Rolf Stadler, Quanyan Zhu

    Abstract: We study automated security response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed, non-stationary game. We relax the standard assumption that the game model is correctly specified and consider that each player has a probabilistic conjecture about the model, which may be misspecified in the sense that the true model has probabilit…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  13. arXiv:2402.09181  [pdf, other]

    eess.IV cs.CV

    OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

    Authors: Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this pape…

    Submitted 21 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  14. arXiv:2402.05642  [pdf, ps, other]

    eess.IV cs.CV

    An Optimization-based Baseline for Rigid 2D/3D Registration Applied to Spine Surgical Navigation Using CMA-ES

    Authors: Minheng Chen, Tonglong Li, Zhirun Zhang, Youyong Kong

    Abstract: A robust and efficient optimization-based 2D/3D registration framework is crucial for the navigation system of orthopedic surgical robots. It can provide precise position information of surgical instruments and implants during surgery. While artificial intelligence technology has advanced rapidly in recent years, traditional optimization-based registration methods remain indispensable in the field…

    Submitted 8 February, 2024; originally announced February 2024.

  15. arXiv:2402.03030  [pdf, other]

    cs.IT eess.SP

    Rejection-Sampled Universal Quantization for Smaller Quantization Errors

    Authors: Chih Wei Ling, Cheuk Ting Li

    Abstract: We construct a randomized vector quantizer which has a smaller maximum error compared to all known lattice quantizers with the same entropy for dimensions 5, 6, ..., 48, and also has a smaller mean squared error compared to known lattice quantizers with the same entropy for dimensions 35, ..., 48, in the high resolution limit. Moreover, our randomized quantizer has a desirable property that the qu…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 15 pages, 2 figures

  16. arXiv:2402.00080  [pdf, ps, other]

    eess.SY eess.SP

    Arithmetic Average Density Fusion -- Part IV: Distributed Heterogeneous Fusion of RFS and LRFS Filters via Variational Approximation

    Authors: Tiancheng Li, Haozhe Liang, Guchong Li, Jesús García Herrero, Quan Pan

    Abstract: This paper, the fourth part of a series of papers on the arithmetic average (AA) density fusion approach and its application for target tracking, addresses the intricate challenge of distributed heterogeneous multisensor multitarget tracking, where each inter-connected sensor operates a probability hypothesis density (PHD) filter, a multiple Bernoulli (MB) filter or a labeled MB (LMB) filter and t…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: 13 pages, 14 figures

  17. arXiv:2401.15111  [pdf, other]

    eess.IV cs.CV cs.LG

    Improving Fairness of Automated Chest X-ray Diagnosis by Contrastive Learning

    Authors: Mingquan Lin, Tianhao Li, Zhaoyi Sun, Gregory Holste, Ying Ding, Fei Wang, George Shih, Yifan Peng

    Abstract: Purpose: Limited studies exploring concrete methods or approaches to tackle and enhance model fairness in the radiology domain. Our proposed AI model utilizes supervised contrastive learning to minimize bias in CXR diagnosis. Materials and Methods: In this retrospective study, we evaluated our proposed method on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77,8…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 23 pages, 5 figures

    MSC Class: arms.org

  18. A Unified NOMA Framework in Beam-Hopping Satellite Communication Systems

    Authors: Xuyang Zhang, Xinwei Yue, Tian Li, Zhihao Han, Yafei Wang, Yong Ding, Rongke Liu

    Abstract: This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization prob…

    Submitted 16 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Aerospace and Electronic Systems, vol. 59, no. 5, pp. 5390-5404, Oct. 2023

  19. arXiv:2401.05363  [pdf, other]

    eess.SP cs.LG

    Generalizable Sleep Staging via Multi-Level Domain Alignment

    Authors: Jiquan Wang, Sha Zhao, Haiteng Jiang, Shijian Li, Tao Li, Gang Pan

    Abstract: Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset and are limited to be generalized to other unseen datasets, for which the training data and testing data are from the same dataset. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging wh…

    Submitted 27 January, 2024; v1 submitted 13 December, 2023; originally announced January 2024.

    Comments: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

  20. arXiv:2401.00283  [pdf, other]

    cs.IT eess.SP

    Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

    Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

    Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis…

    Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 28 pages, 8 figures, 2 tables

  21. arXiv:2312.12810  [pdf, other]

    eess.AS cs.SD

    Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection

    Authors: Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli

    Abstract: Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance of each aspect remains limited. In this work, we present an unconstrained dysfluency modeling (UDM) approach that addresses both transcription and dete…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 2023 ASRU

  22. arXiv:2312.10687  [pdf, other]

    eess.AS cs.SD

    MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

    Authors: Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

    Abstract: The style transfer task in Text-to-Speech refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide…

    Submitted 31 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI2024

  23. arXiv:2312.06462  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

    Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral…

    Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight. 13 pages, 10 figures

  24. arXiv:2312.05640  [pdf, other]

    cs.SD cs.AI cs.CL cs.HC eess.AS

    Keyword spotting -- Detecting commands in speech using deep learning

    Authors: Sumedha Rai, Tong Li, Bella Lyu

    Abstract: Speech recognition has become an important task in the development of machine learning and artificial intelligence. In this study, we explore the important task of keyword spotting using speech recognition machine learning and deep learning techniques. We implement feature engineering by converting raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which we use as inputs to our models.…

    Submitted 9 December, 2023; originally announced December 2023.

  25. arXiv:2311.14925  [pdf, other]

    cs.CV eess.IV

    Coordinate-based Neural Network for Fourier Phase Retrieval

    Authors: Tingyou Li, Zixin Xu, Yong S. Chu, Xiaojing Huang, Jizhou Li

    Abstract: Fourier phase retrieval is essential for high-definition imaging of nanoscale structures across diverse fields, notably coherent diffraction imaging. This study presents the Single impliCit neurAl Network (SCAN), a tool built upon coordinate neural networks meticulously designed for enhanced phase retrieval performance. Remedying the drawbacks of conventional iterative methods which are easily tr…

    Submitted 8 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  26. arXiv:2311.14295  [pdf, ps, other]

    cs.IT eess.SP

    Exploiting Active RIS in NOMA Networks with Hardware Impairments

    Authors: Xinwei Yue, Meiqi Song, Chongjun Ouyang, Yuanwei Liu, Tian Li, Tianwei Hou

    Abstract: Active reconfigurable intelligent surface (ARIS) is a promising way to compensate for multiplicative fading attenuation by amplifying and reflecting event signals to selected users. This paper investigates the performance of ARIS assisted non-orthogonal multiple access (NOMA) networks over cascaded Nakagami-m fading channels. The effects of hardware impairments (HIS) and reflection coefficients on…

    Submitted 12 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  27. arXiv:2311.12273  [pdf, other]

    cs.NI eess.SY

    How AI-driven Digital Twins Can Empower Mobile Networks

    Authors: Tong Li, Fenyu Jiang, Qiaohong Yu, Wenzhen Huang, Tao Jiang, Depeng Jin

    Abstract: The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which…

    Submitted 20 November, 2023; originally announced November 2023.

  28. arXiv:2311.11969  [pdf, other]

    eess.IV cs.CV

    SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

    Authors: Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

    Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowled…

    Submitted 20 November, 2023; originally announced November 2023.

  29. arXiv:2311.05941  [pdf, other]

    eess.SY cs.LG

    Learning-Augmented Scheduling for Solar-Powered Electric Vehicle Charging

    Authors: Tongxin Li

    Abstract: We tackle the complex challenge of scheduling the charging of electric vehicles (EVs) equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions. Traditional scheduling approaches, such as reinforcement learning (RL) and model predictive control (MPC), often fail to provide satisfactory results when faced with OOD data, struggling to balance robustness (worst…

    Submitted 10 November, 2023; originally announced November 2023.

  30. arXiv:2311.01003  [pdf, other]

    eess.SY cs.RO

    Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle

    Authors: Chen Qian, Rui Chen, Peiyao Shen, Yongchun Fang, Jifu Yan, Tiefeng Li

    Abstract: This paper presents both the trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed in a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and d…

    Submitted 2 November, 2023; originally announced November 2023.

  31. arXiv:2310.20242  [pdf, other]

    cs.NI eess.SP

    Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks

    Authors: Zhaolong Ning, Tengfeng Li, Yu Wu, Xiaojie Wang, Qingqing Wu, Fei Richard Yu, Song Guo

    Abstract: In 6th-Generation (6G) mobile networks, Intelligent Reflective Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) have emerged as promising technologies to address the coverage difficulties and resource constraints faced by terrestrial networks. UAVs, with their mobility and low costs, offer diverse connectivity options for mobile users and a novel deployment paradigm for 6G networks. However, th…

    Submitted 31 October, 2023; originally announced October 2023.

  32. arXiv:2310.17363  [pdf, ps, other]

    eess.SY

    Controllability of networked multiagent systems based on linearized Turing's model

    Authors: Tianhao Li, Ruichang Zhang, Zhixin Liu, Zhuo Zou, Xiaoming Hu

    Abstract: Turing's model has been widely used to explain how simple, uniform structures can give rise to complex, patterned structures during the development of organisms. However, it is very hard to establish rigorous theoretical results for the dynamic evolution behavior of Turing's model since it is described by nonlinear partial differential equations. We focus on controllability of Turing's model by li…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 13 pages, 4 figures, submitted to automatica

  33. arXiv:2310.08740  [pdf, other]

    cs.CL eess.SY

    A Zero-Shot Language Agent for Computer Control with Structured Reflection

    Authors: Tao Li, Gang Li, Zhiwei Deng, Bryan Wang, Yang Li

    Abstract: Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment (e.g. MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting. Without these trace examples, it remains a challenge how an agent can autonomously learn and…

    Submitted 23 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings of EMNLP 2023

  34. arXiv:2310.07246  [pdf, other]

    cs.SD eess.AS

    Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

    Authors: Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that resembles multiple speech generation tasks, generating e…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  35. arXiv:2310.04004  [pdf, other]

    cs.SD eess.AS

    U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

    Authors: Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still produce speech with undesirable naturalness and speaker similarity. Moreover, endowing the target speaker with arbitrary speaking styles in the zer…

    Submitted 6 October, 2023; originally announced October 2023.

  36. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  37. arXiv:2309.12953  [pdf]

    eess.IV cs.CV

    Inter-vendor harmonization of Computed Tomography (CT) reconstruction kernels using unpaired image translation

    Authors: Aravind R. Krishnan, Kaiwen Xu, Thomas Li, Chenyu Gao, Lucas W. Remedios, Praitayini Kanakaraj, Ho Hin Lee, Shunxing Bao, Kim L. Sandler, Fabien Maldonado, Ivana Isgum, Bennett A. Landman

    Abstract: The reconstruction kernel in computed tomography (CT) generation determines the texture of the image. Consistency in reconstruction kernels is important as the underlying CT texture can impact measurements during quantitative image analysis. Harmonization (i.e., kernel conversion) minimizes differences in measurements due to inconsistent reconstruction kernels. Existing methods investigate harmoni…

    Submitted 26 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: 10 pages, 6 figures, 1 table, Submitted to SPIE Medical Imaging : Image Processing. San Diego, CA. February 2024

  38. arXiv:2309.11783  [pdf, other]

    cs.HC cs.SD eess.AS

    Frame Pairwise Distance Loss for Weakly-supervised Sound Event Detection

    Authors: Rui Tao, Yuxing Huang, Xiangdong Wang, Long Yan, Lufeng Zhai, Kazushige Ouchi, Taihao Li

    Abstract: Weakly-supervised learning has emerged as a promising approach to leverage limited labeled data in various domains by bridging the gap between fully supervised methods and unsupervised techniques. Acquisition of strong annotations for detecting sound events is prohibitively expensive, making weakly supervised learning a more cost-effective and broadly applicable alternative. In order to enhance th…

    Submitted 7 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  39. arXiv:2309.09426  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG eess.SP

    Joint Demosaicing and Denoising with Double Deep Image Priors

    Authors: Taihui Li, Anish Lahiri, Yutong Dai, Owen Mayer

    Abstract: Demosaicing and denoising of RAW images are crucial steps in the processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of th…

    Submitted 17 September, 2023; originally announced September 2023.

  40. arXiv:2309.07133  [pdf]

    eess.SP cs.HC cs.LG

    Assessing cognitive function among older adults using machine learning and wearable device data: a feasibility study

    Authors: Collin Sakal, Tingyou Li, Juan Li, Xinyue Li

    Abstract: Timely implementation of interventions to slow cognitive decline among older adults requires accurate monitoring to detect changes in cognitive function. Data gathered using wearable devices that can continuously monitor factors known to be associated with cognition could be used to train machine learning models and develop wearable-based cognitive monitoring systems. Using data from over 2,400 ol…

    Submitted 24 March, 2024; v1 submitted 27 August, 2023; originally announced September 2023.

  41. arXiv:2309.03906  [pdf, other]

    eess.IV cs.CV

    A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation

    Authors: Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao

    Abstract: Although deep learning has revolutionized abdominal multi-organ segmentation, models often struggle with generalization due to training on small, specific datasets. With the recent emergence of large-scale datasets, some important questions arise: Can models trained on these datasets generalize well on different ones? If yes/no, how to further improve their generalizability? To address t…

    Submitted 7 September, 2023; originally announced September 2023.

  42. arXiv:2309.03462  [pdf, other]

    eess.SY

    Research on Damage Analysis of Key Parts of UAV Flight Control System

    Authors: Tianshun Li, Huaimin Chen, Ben Xiao, Hao Li, Shiyu Hao, Di Hai, Xuetong Wang

    Abstract: A set of hardware in the loop simulation methods based on the UAV model is proposed to create fault data, which is used to judge the parts where faults happen. Actual flight experimental data is utilized to prove the reliability of Simulink models. Then a series of typical faults with various amplitudes are injected into different channels of UAV parts in hardware in the loop simulation platform.…

    Submitted 6 September, 2023; originally announced September 2023.

  43. arXiv:2309.02609  [pdf, other]

    cs.RO eess.SY

    Directionality-Aware Mixture Model Parallel Sampling for Efficient Linear Parameter Varying Dynamical System Learning

    Authors: Sunan Sun, Haihui Gao, Tianyu Li, Nadia Figueroa

    Abstract: The Linear Parameter Varying Dynamical System (LPV-DS) is an effective approach that learns stable, time-invariant motion policies using statistical modeling and semi-definite optimization to encode complex motions for reactive robot control. Despite its strengths, the LPV-DS learning approach faces challenges in achieving a high model accuracy without compromising the computational efficiency. To…

    Submitted 24 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  44. arXiv:2309.01884  [pdf, other]

    cs.RO cs.LG eess.SY

    Task Generalization with Stability Guarantees via Elastic Dynamical System Motion Policies

    Authors: Tianyu Li, Nadia Figueroa

    Abstract: Dynamical System (DS) based Learning from Demonstration (LfD) allows learning of reactive motion policies with stability and convergence guarantees from a few trajectories. Yet, current DS learning techniques lack the flexibility to generalize to new task instances as they ignore explicit task parameters that inherently change the underlying trajectories. In this work, we propose Elastic-DS, a nov…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted to CoRL 2023

  45. arXiv:2309.01142  [pdf, other]

    eess.AS cs.SD

    MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

    Authors: Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yu** Wang

    Abstract: In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation. Previous work generally took explicit prosodic features or fixed-length style embeddin…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: This work was submitted on April 10, 2022 and accepted on August 29, 2023

  46. arXiv:2309.00883  [pdf, other]

    cs.SD eess.AS

    DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

    Authors: Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, **gbei Li, Qiao Tian, Yu** Wang, Lei Xie

    Abstract: While the performance of cross-lingual TTS based on monolingual corpora has been significantly improved recently, generating cross-lingual speech still suffers from the foreign accent problem, leading to limited naturalness. Besides, current cross-lingual methods ignore modeling emotion, which is indispensable paralinguistic information in speech delivery. In this paper, we propose DiCLET-TTS, a D…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: accepted by TASLP

  47. arXiv:2308.16021  [pdf, other]

    cs.SD eess.AS

    CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis

    Authors: Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng

    Abstract: To further improve the speaking styles of synthesized speeches, current text-to-speech (TTS) synthesis systems commonly employ reference speeches to stylize their outputs instead of just the input texts. These reference speeches are obtained by manual selection which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information, but…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by InterSpeech 2022

  48. arXiv:2308.13365  [pdf, ps, other]

    cs.SD eess.AS

    Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

    Authors: Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

    Abstract: Neural networks have been able to generate high-quality single-sentence speech. However, it remains a challenge concerning audio-book speech synthesis due to the intra-paragraph correlation of semantic and acoustic features as well as variable styles. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational autoencoder, called EP-MSTTS. EP-MSTTS…

    Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: accepted at Interspeech 2024

  49. arXiv:2308.04127  [pdf, ps, other]

    eess.SY

    Flexible Distributed Flocking Control for Multi-agent Unicycle Systems

    Authors: Tinghua Li, Bayu Jayawardhana

    Abstract: Currently, the general aim of flocking and formation control laws for multi-agent systems is to form and maintain a rigid configuration, such as, the alpha-lattices in flocking control methods, where the desired distance between each pair of connected agents is fixed. This introduces a scalability issue for large-scale deployment of agents due to unrealizable geometrical constraints and the consta…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 9 pages, 2 figures

  50. arXiv:2308.00186  [pdf, other]

    cs.RO eess.SY

    Learning Complex Motion Plans using Neural ODEs with Safety and Stability Guarantees

    Authors: Farhad Nawaz, Tianyu Li, Nikolai Matni, Nadia Figueroa

    Abstract: We propose a Dynamical System (DS) approach to learn complex, possibly periodic motion plans from kinesthetic demonstrations using Neural Ordinary Differential Equations (NODE). To ensure reactivity and robustness to disturbances, we propose a novel approach that selects a target point at each time step for the robot to follow, by combining tools from control theory and the target trajectory gener…

    Submitted 22 March, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: accepted to ICRA 2024