Showing 1–50 of 475 results for author: Liu, S

Searching in archive eess.
  1. arXiv:2406.19749  [pdf, other]

    eess.IV cs.CV

    SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

    Abstract: Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18373  [pdf, other]

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2406.16933  [pdf, other]

    eess.SP cs.AI

    SGSM: A Foundation-model-like Semi-generalist Sensing Model

    Authors: Tianjian Yang, Hao Zhou, Shuo Liu, Kaiwen Guo, Yiwen Hou, Haohua Du, Zhi Liu, Xiang-Yang Li

    Abstract: The significance of intelligent sensing systems is growing in the realm of smart services. These systems extract relevant signal features and generate informative representations for particular tasks. However, building the feature extraction component for such systems requires extensive domain-specific expertise or data. The exceptionally rapid development of foundation models is likely to usher i…

    Submitted 15 June, 2024; originally announced June 2024.

  4. arXiv:2406.16151  [pdf, other]

    cs.AI cs.LG eess.SY

    Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

    Authors: Larkin Liu, Shiqi Liu, Matej Jusup

    Abstract: In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this stru…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Working manuscript

    ACM Class: C.4

  5. arXiv:2406.14067  [pdf]

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  6. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  8. arXiv:2406.05370  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  9. arXiv:2406.04149  [pdf]

    eess.IV cs.AI

    Characterizing segregation in blast rock piles a deep-learning approach leveraging aerial image analysis

    Authors: Chengeng Liu, Sihong Liu, Chaomin Shen, Yupeng Gao, Yuxuan Liu

    Abstract: Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation-where particle sizes vary significantly along the gradient of a quarry pile-presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineati…

    Submitted 6 June, 2024; originally announced June 2024.

  10. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  11. arXiv:2406.01795  [pdf, other]

    eess.IV

    Video Coding with Cross-Component Sample Offset

    Authors: Han Gao, Xin Zhao, Tianqi Liu, Shan Liu

    Abstract: Beyond the exploration of traditional spatial, temporal and subjective visual signal redundancy in image and video compression, recent research has focused on leveraging cross-color component redundancy to enhance coding efficiency. Cross-component coding approaches are motivated by the statistical correlations among different color components, such as those in the Y'CbCr color space, where luma (…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  12. arXiv:2405.17809  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  13. arXiv:2405.15831  [pdf, other]

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  14. arXiv:2405.10561  [pdf, other]

    eess.IV cs.CV

    Infrared Image Super-Resolution via Lightweight Information Split Network

    Authors: Shijie Liu, Kang Yan, Feiwei Qin, Changmiao Wang, Ruiquan Ge, Kai Zhang, Jie Huang, Yong Peng, ** Cao

    Abstract: Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory…

    Submitted 27 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  15. arXiv:2405.10254  [pdf, other]

    eess.IV cs.CV cs.LG

    PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology

    Authors: George Shaikovski, Adam Casson, Kristen Severson, Eric Zimmermann, Yi Kan Wang, Jeremy D. Kunz, Juan A. Retamero, Gerard Oakley, David Klimstra, Christopher Kanan, Matthew Hanna, Michal Zelechowski, Julian Viret, Neil Tenenholtz, James Hall, Nicolo Fusi, Razik Yousfi, Peter Hamilton, William A. Moye, Eugene Vorontsov, Siqi Liu, Thomas J. Fuchs

    Abstract: Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately.…

    Submitted 22 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  16. arXiv:2405.09923  [pdf, other]

    cs.CV eess.IV

    NTIRE 2024 Restore Any Image Model (RAIM) in the Wild Challenge

    Authors: Jie Liang, Radu Timofte, Qiaosi Yi, Shuaizheng Liu, Lingchen Sun, Rongyuan Wu, Xindong Zhang, Hui Zeng, Lei Zhang

    Abstract: In this paper, we review the NTIRE 2024 challenge on Restore Any Image Model (RAIM) in the Wild. The RAIM challenge constructed a benchmark for image restoration in the wild, including real-world images with/without reference ground truth in various scenarios from real applications. The participants were required to restore the real-captured images from complex and unknown degradation, where gener…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  18. arXiv:2405.02801  [pdf, other]

    cs.SD cs.AI eess.AS

    Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

    Authors: Tianze Xu, Jiajun Li, Xuesong Chen, Xinrui Yao, Shuchang Liu

    Abstract: In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, researches on general multi-modal music generation model remain scarce. To fill this gap, we propose a multi-modal music generation framework Mozart's Touch. It could generate aligned music with the c…

    Submitted 7 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures, submitted to ACM MM 2024

  19. arXiv:2405.01000  [pdf, other]

    cs.IT eess.SP

    Low-Complexity Near-Field Localization with XL-MIMO Sectored Uniform Circular Arrays

    Authors: Shicong Liu, Xianghao Yu

    Abstract: Rapid advancement of antenna technology catalyses the popularization of extremely large-scale multiple-input multiple-output (XL-MIMO) antenna arrays, which pose unique challenges for localization with the inescapable near-field effect. In this paper, we propose an efficient near-field localization algorithm by leveraging a sectored uniform circular array (sUCA). In particular, we first customize…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures

  20. arXiv:2404.18058  [pdf, other]

    eess.IV cs.CV

    Joint Reference Frame Synthesis and Post Filter Enhancement for Versatile Video Coding

    Authors: Weijie Bao, Yuantong Zhang, Jianghao Jia, Zhenzhong Chen, Shan Liu

    Abstract: This paper presents the joint reference frame synthesis (RFS) and post-processing filter enhancement (PFE) for Versatile Video Coding (VVC), aiming to explore the combination of different neural network-based video coding (NNVC) tools to better utilize the hierarchical bi-directional coding structure of VVC. Both RFS and PFE utilize the Space-Time Enhancement Network (STENet), which receives two i…

    Submitted 27 April, 2024; originally announced April 2024.

  21. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  22. arXiv:2404.14946  [pdf, other]

    cs.SD cs.CL eess.AS

    StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

    Authors: Sen Liu, Yiwei Guo, Xie Chen, Kai Yu

    Abstract: While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the inherent expressiveness in text lacks sufficient attention, especially for ETTS of artistic works. In this paper, we introduce StoryTTS, a highly ETTS dataset that contains rich expressiveness both in acoustic and textual perspective, from the recording of a Mandarin storytelling show. A systematic and com…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 11521-11525

  23. arXiv:2404.14712  [pdf, other]

    physics.ao-ph cs.AI cs.DC eess.IV physics.geo-ph

    ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

    Authors: Xiao Wang, Aristeidis Tsaris, Siyan Liu, Jong-Youl Choi, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

    Abstract: Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitati…

    Submitted 22 April, 2024; originally announced April 2024.

  24. arXiv:2404.14709  [pdf, ps, other]

    cs.CV eess.IV

    SC-HVPPNet: Spatial and Channel Hybrid-Attention Video Post-Processing Network with CNN and Transformer

    Authors: Tong Zhang, Wenxue Cui, Shaohui Liu, Feng Jiang

    Abstract: Convolutional Neural Network (CNN) and Transformer have attracted much attention recently for video post-processing (VPP). However, the interaction between CNN and Transformer in existing VPP methods is not fully explored, leading to inefficient communication between the local and global extracted features. In this paper, we explore the interaction between CNN and Transformer in the task of VPP, a…

    Submitted 22 April, 2024; originally announced April 2024.

  25. GNSS Measurement-Based Context Recognition for Vehicle Navigation using Gated Recurrent Unit

    Authors: Sheng Liu, Zhiqiang Yao, Xuemeng Cao, Xiaowen Cai

    Abstract: Recent years, people have put forward higher and higher requirements for context-adaptive navigation (CAN). CAN system realizes seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context rec…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 9 pages, 9 figures, 5 tables

    Journal ref: Proceedings of the 36th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2023)

  26. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  27. arXiv:2404.11304  [pdf]

    eess.SY

    Dynamic Phasor Modeling of Single-Phase Grid-Forming Converters

    Authors: Wenjia Si, Chenming Liu, Steven Liu, Hongchang Li, Chenghui Zhang, Jingyang Fang

    Abstract: In modern power systems, grid-forming power converters (GFMCs) have emerged as an enabling technology. However, the modeling of single-phase GFMCs faces new challenges. In particular, the nonlinear orthogonal signal generation unit, crucial for power measurement, still lacks an accurate model. To overcome the challenges, this letter proposes a dynamic phasor model of single-phase GFMCs. Moreover,…

    Submitted 17 April, 2024; originally announced April 2024.

  28. arXiv:2404.08188  [pdf, other]

    cs.IT eess.SP

    Fundamental Limits of Communication-Assisted Sensing in ISAC Systems

    Authors: Fuwang Dong, Fan Liu, Shihang Liu, Yifeng Xiong, Weijie Yuan, Yuanhao Cui

    Abstract: In this paper, we introduce a novel communication-assisted sensing (CAS) framework that explores the potential coordination gains offered by the integrated sensing and communication technique. The CAS system endows users with beyond-line-of-the-sight sensing capabilities, supported by a dual-functional base station that enables simultaneous sensing and communication. To delve into the system's fun…

    Submitted 23 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ISIT. The updated version will be coming soon

  29. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  30. arXiv:2404.03204  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

    Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

    Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. Th…

    Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  31. arXiv:2404.02461  [pdf, other]

    cs.LG eess.SP

    On the Efficiency and Robustness of Vibration-based Foundation Models for IoT Sensing: A Case Study

    Authors: Tomoyoshi Kimura, Jinyang Li, Tianshi Wang, Denizhan Kara, Yizhuo Chen, Yigong Hu, Ruijie Wang, Maggie Wigness, Shengzhong Liu, Mani Srivastava, Suhas Diggavi, Tarek Abdelzaher

    Abstract: This paper demonstrates the potential of vibration-based Foundation Models (FMs), pre-trained with unlabeled sensing data, to improve the robustness of run-time inference in (a class of) IoT applications. A case study is presented featuring a vehicle classification application using acoustic and seismic sensing. The work is motivated by the success of foundation models in the areas of natural lang…

    Submitted 3 April, 2024; originally announced April 2024.

  32. arXiv:2404.00656  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    WavLLM: Towards Robust and Adaptive Speech Large Language Model

    Authors: Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

    Abstract: The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…

    Submitted 31 March, 2024; originally announced April 2024.

  33. arXiv:2404.00481  [pdf, other]

    stat.ML cs.LG eess.SY

    Convolutional Bayesian Filtering

    Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S. -T. Yau, Shengbo Eben Li

    Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence proba…

    Submitted 30 March, 2024; originally announced April 2024.

  34. arXiv:2404.00352  [pdf]

    eess.IV

    Dependability Evaluation of Stable Diffusion with Soft Errors on the Model Parameters

    Authors: Zhen Gao, Lini Yuan, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

    Abstract: Stable Diffusion is a popular Transformer-based model for image generation from text; it applies an image information creator to the input text and the visual knowledge is added in a step-by-step fashion to create an image that corresponds to the input text. However, this diffusion process can be corrupted by errors from the underlying hardware, which are especially relevant for implementations at…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 6 pages, 16 figures

  35. arXiv:2403.20058  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks

    Authors: Luoyu Wang, Yitian Tao, Qing Yang, Yan Liang, Siwei Liu, Hongcheng Shi, Dinggang Shen, Han Zhang

    Abstract: Simultaneous functional PET/MR (sf-PET/MR) presents a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Albeit high scientific/clinical values, short in hardware accessibility of P…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 11 pages

  36. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that th…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by TASLP 2024

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

  37. arXiv:2403.11809  [pdf, other]

    cs.IT eess.SP

    Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

    Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Jie Xu, Derrick Wing Kwan Ng, Shuguang Cui

    Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channe…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  38. arXiv:2403.07834  [pdf, other]

    eess.IV cs.CV

    When Eye-Tracking Meets Machine Learning: A Systematic Review on Applications in Medical Image Analysis

    Authors: Sahar Moradizeyveh, Mehnaz Tabassum, Sidong Liu, Robert Ahadizad Newport, Amin Beheshti, Antonio Di Ieva

    Abstract: Eye-gaze tracking research offers significant promise in enhancing various healthcare-related tasks, above all in medical image analysis and interpretation. Eye tracking, a technology that monitors and records the movement of the eyes, provides valuable insights into human visual attention patterns. This technology can transform how healthcare professionals and medical specialists engage with and…

    Submitted 12 March, 2024; originally announced March 2024.

  39. arXiv:2403.07622  [pdf, other]

    cs.CV cs.AI eess.IV

    Multiple Latent Space Mapping for Compressed Dark Image Enhancement

    Authors: Yi Zeng, Zhengning Wang, Yuxuan Liu, Tianjiao Zeng, Xuhang Liu, Xinglong Luo, Shuaicheng Liu, Shuyuan Zhu, Bing Zeng

    Abstract: Dark image enhancement aims at converting dark images to normal-light images. Existing dark image enhancement methods take uncompressed dark images as inputs and achieve great performance. However, in practice, dark images are often compressed before storage or transmission over the Internet. Current methods get poor performance when processing compressed dark images. Artifacts hidden in the dark…

    Submitted 12 March, 2024; originally announced March 2024.

  40. arXiv:2403.05912  [pdf, other]

    eess.IV cs.CV

    Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

    Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

    Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent st…

    Submitted 9 March, 2024; originally announced March 2024.

  41. arXiv:2402.19111  [pdf, other]

    eess.IV cs.CV

    Deep Network for Image Compressed Sensing Coding Using Local Structural Sampling

    Authors: Wenxue Cui, Xingtao Wang, Xiaopeng Fan, Shaohui Liu, Xinwei Gao, Debin Zhao

    Abstract: Existing image compressed sensing (CS) coding frameworks usually solve an inverse problem based on measurement coding and optimization-based image reconstruction, and they still face the following two challenges: 1) The widely used random sampling matrix, such as the Gaussian Random Matrix (GRM), usually leads to low measurement coding efficiency. 2) The optimization-based reconstruction methods gen…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted by ACM Transactions on Multimedia Computing Communications and Applications (TOMM)

  42. arXiv:2402.17585  [pdf, other]

    eess.SY

    Communication-Constrained STL Task Decomposition through Convex Optimization

    Authors: Gregorio Marchesini, Siyuan Liu, Lars Lindemann, Dimos V. Dimarogonas

    Abstract: In this work, we propose a method to decompose signal temporal logic (STL) tasks for multi-agent systems subject to constraints imposed by the communication graph. Specifically, we propose to decompose tasks defined over multiple agents, which require multi-hop communication, into a set of sub-tasks defined over the states of agents with 1-hop distance over the communication graph. To this end, we pa…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: This paper is accepted at 2024 American Control Conference (ACC)

  43. arXiv:2402.16908  [pdf]

    cs.ET cond-mat.mtrl-sci cs.LG eess.IV

    Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics

    Authors: Lekai Song, Pengyu Liu, Jingfang Pei, Yang Liu, Songwei Liu, Shengbo Wang, Leonard W. T. Ng, Tawfique Hasan, Kong-Pang Pun, Shuo Gao, Guohua Hu

    Abstract: The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing tech…

    Submitted 20 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  44. arXiv:2402.16371  [pdf, other]

    eess.IV

    Adaptive Online Learning of Separable Path Graph Transforms for Intra-prediction

    Authors: Wen-Yang Lu, Eduardo Pavez, Antonio Ortega, Xin Zhao, Shan Liu

    Abstract: Current video coding standards, including H.264/AVC, HEVC, and VVC, employ the discrete cosine transform (DCT), the discrete sine transform (DST), and secondary Karhunen-Loeve transforms (KLTs) to decorrelate the intra-prediction residuals. However, the efficiency of these transforms in decorrelation can be limited when the signal has a non-smooth and non-periodic structure, such as those occurring in tex…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages, 4 figures

  45. arXiv:2402.09430  [pdf, other]

    eess.SP cs.AI cs.CV cs.MM

    WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

    Authors: Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann

    Abstract: WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user se…

    Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

    Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi

  46. arXiv:2402.09424  [pdf, other]

    eess.SP cs.CV cs.LG cs.NE

    Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer

    Authors: Qinyu Chen, Congyi Sun, Chang Gao, Shih-Chii Liu

    Abstract: Epilepsy is a common disease of the nervous system. Timely prediction of seizures and intervention treatment can significantly reduce accidental injuries and protect the life and health of patients. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from scalp long-term electroencephalogram…

    Submitted 21 January, 2024; originally announced February 2024.

    Comments: To be published at the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

  47. arXiv:2402.06903  [pdf, other]

    eess.SY math.DS

    High-Performance Distributed Control for Large-Scale Linear Systems: A Partitioned Distributed Observer Approach

    Authors: Haotian Xu, Shuai Liu, Ling Shi

    Abstract: In recent years, the distributed-observer-based distributed control law has shown a powerful ability to approximate the centralized control performance arbitrarily closely. However, the traditional distributed observer requires each local observer to reconstruct the state information of the whole system, which is unrealistic for large-scale scenarios. To fill this gap, this paper develops a greedy-idea-base…

    Submitted 10 February, 2024; originally announced February 2024.

  48. Online Data-Driven Adaptive Control for Unknown Linear Time-Varying Systems

    Authors: Shenyu Liu, Kaiwen Chen, Jaap Eising

    Abstract: This paper proposes a novel online data-driven adaptive control for unknown linear time-varying systems. Initialized with an empirical feedback gain, the algorithm periodically updates this gain based on the data collected over a short time window before each update. Meanwhile, the stability of the closed-loop system is analyzed in detail, which shows that under some mild assumptions, the proposed…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Technical report for the conference paper in 62nd IEEE CDC

    Journal ref: 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore, Singapore, 2023, pp. 8775-8780

  49. arXiv:2401.11856  [pdf, other]

    eess.IV cs.CV

    MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Xiu-Ling Liu, Zeng-Guang Hou

    Abstract: Medical image segmentation plays an important role in various clinical applications. Deep learning has emerged as the predominant solution for automated segmentation of volumetric medical images. 2.5D-based segmentation models bridge the computational efficiency of 2D-based models and the spatial perception capabilities of 3D-based models. However, prevailing 2.5D-based models often treat each slice e…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Under Review

  50. arXiv:2401.07041  [pdf, other]

    eess.IV cs.CV

    An automated framework for brain vessel centerline extraction from CTA images

    Authors: Sijie Liu, Ruisheng Su, Jianghang Su, Jingmin Xin, Jiayi Wu, Wim van Zwam, Pieter Jan van Doormaal, Aad van der Lugt, Wiro J. Niessen, Nanning Zheng, Theo van Walsum

    Abstract: Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional an…

    Submitted 13 January, 2024; originally announced January 2024.