Showing 1–50 of 91 results for author: Jiao, J

Searching in archive eess.
  1. arXiv:2406.09569  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

    Authors: Frank Seide, Morrie Doulaty, Yangyang Shi, Yashesh Gaur, Junteng Jia, Chunyang Wu

    Abstract: We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio without explicit end-pointing. Speech ReaLLM is a special case of the more general ReaLLM ("real-time LLM") approach, also introduced here for the…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2405.20559  [pdf, other]

    physics.optics cs.CV cs.IT eess.IV physics.data-an

    Universal evaluation and design of imaging systems using information estimation

    Authors: Henry Pinkard, Leyla Kabuli, Eric Markley, Tiffany Chien, Jiantao Jiao, Laura Waller

    Abstract: Information theory, which describes the transmission of signals in the presence of noise, has enabled the development of reliable communication systems that underlie the modern world. Imaging systems can also be viewed as a form of communication, in which information about the object is "transmitted" through images. However, the application of information theory to imaging systems has been limited…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2405.16797  [pdf]

    cs.SD cs.AI eess.AS

    A Real-Time Voice Activity Detection Based On Lightweight Neural

    Authors: Jidong Jia, Pei Zhao, Di Wang

    Abstract: Voice activity detection (VAD) is the task of detecting speech in an audio stream, which is challenging due to numerous unseen noises and low signal-to-noise ratios in real environments. Recently, neural network-based VADs have alleviated the degradation of performance to some extent. However, the majority of existing studies have employed excessively large models and incorporated future context,…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ…

    Submitted 14 May, 2024; originally announced May 2024.

  5. arXiv:2404.18058  [pdf, other]

    eess.IV cs.CV

    Joint Reference Frame Synthesis and Post Filter Enhancement for Versatile Video Coding

    Authors: Weijie Bao, Yuantong Zhang, Jianghao Jia, Zhenzhong Chen, Shan Liu

    Abstract: This paper presents the joint reference frame synthesis (RFS) and post-processing filter enhancement (PFE) for Versatile Video Coding (VVC), aiming to explore the combination of different neural network-based video coding (NNVC) tools to better utilize the hierarchical bi-directional coding structure of VVC. Both RFS and PFE utilize the Space-Time Enhancement Network (STENet), which receives two i…

    Submitted 27 April, 2024; originally announced April 2024.

  6. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  7. arXiv:2404.10777  [pdf, other]

    eess.IV cs.GR physics.optics

    Divide-Conquer-and-Merge: Memory- and Time-Efficient Holographic Displays

    Authors: Zhenxing Dong, Jidong Jia, Yan Li, Yuye Ling

    Abstract: Recently, deep learning-based computer-generated holography (CGH) has demonstrated tremendous potential in three-dimensional (3D) displays and yielded impressive display quality. However, most existing deep learning-based CGH techniques can only generate holograms of 1080p resolution, which is far from the ultra-high resolution (16K+) required for practical virtual reality (VR) and augmented reali…

    Submitted 25 February, 2024; originally announced April 2024.

    Comments: This paper has been accepted as a conference paper at IEEE VR 2024

  8. arXiv:2404.00989  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    360+x: A Panoptic Multi-modal Scene Understanding Dataset

    Authors: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

    Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentri…

    Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/

    Journal ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

  9. arXiv:2401.16700  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

    Authors: Jianbin Jiao, Xina Cheng, Weijie Chen, Xiaoting Yin, Hao Shi, Kailun Yang

    Abstract: 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are pr…

    Submitted 25 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to IJCNN 2024. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose

  10. arXiv:2401.12707  [pdf, ps, other]

    eess.SY

    Localized Data-driven Consensus Control

    Authors: Zeze Chang, Junjie Jiao, Zhongkui Li

    Abstract: This paper considers a localized data-driven consensus problem for leader-follower multi-agent systems with unknown discrete-time agent dynamics, where each follower computes its local control gain using only its locally collected state and input data. Both noiseless and noisy data-driven consensus protocols are presented, which can handle the challenge of the heterogeneity in control gains caus…

    Submitted 23 January, 2024; originally announced January 2024.

  11. arXiv:2401.05711  [pdf, other]

    cs.LG eess.SP

    Dynamic Indoor Fingerprinting Localization based on Few-Shot Meta-Learning with CSI Images

    Authors: Jiyu Jiao, Xiaojun Wang, Chenpei Han, Yuhua Huang, Yizhuo Zhang

    Abstract: While fingerprinting localization is favored for its effectiveness, it is hindered by high data acquisition costs and the inaccuracy of static database-based estimates. Addressing these issues, this letter presents an innovative indoor localization method using a data-efficient meta-learning algorithm. This approach, grounded in the "Learning to Learn" paradigm of meta-learning, utilizes histori…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 5 pages, 7 figures

  12. arXiv:2401.00153  [pdf, other]

    eess.IV

    USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis

    Authors: Jing Jiao, Jin Zhou, Xiaokang Li, Menghua Xia, Yi Huang, Lihong Huang, Na Wang, Xiaofan Zhang, Shichong Zhou, Yuanyuan Wang, Yi Guo

    Abstract: Inadequate generality across different organs and tasks constrains the application of ultrasound (US) image analysis methods in smart healthcare. Building a universal US foundation model holds the potential to address these issues. Nevertheless, the development of such foundational models encounters intrinsic challenges in US analysis, i.e., insufficient databases, low quality, and ineffective fea…

    Submitted 2 January, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Submitted to MedIA, 17 pages, 11 figures

  13. arXiv:2312.15659  [pdf, other]

    eess.IV

    Perceptual Quality Assessment for Video Frame Interpolation

    Authors: Jinliang Han, Xiongkuo Min, Yixuan Gao, Jun Jia, Lei Sun, Zuowei Cao, Yonglin Luo, Guangtao Zhai

    Abstract: The quality of frames is significant for both research and application of video frame interpolation (VFI). In recent VFI studies, the methods of full-reference image quality assessment have generally been used to evaluate the quality of VFI frames. However, high frame rate reference videos, necessities for the full-reference methods, are difficult to obtain in most applications of VFI. To evaluate…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures

    ACM Class: I.4.0

  14. arXiv:2312.07784  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG eess.SP

    Robust MRI Reconstruction by Smoothed Unrolling (SMUG)

    Authors: Shijun Liang, Van Hoang Minh Nguyen, Jinghan Jia, Ismail Alkhouri, Sijia Liu, Saiprasad Ravishankar

    Abstract: As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case additive perturbations. This sensitivity often leads to unstable, aliased images. This raises the question of how to devise DL techniques for…

    Submitted 12 December, 2023; originally announced December 2023.

  15. arXiv:2311.15420  [pdf]

    eess.SY cs.CV

    Data-Driven Modelling for Harmonic Current Emission in Low-Voltage Grid Using MCReSANet with Interpretability Analysis

    Authors: Jieyu Yao, Hao Yu, Paul Judge, Jiabin Jia, Sasa Djokic, Verner Püvi, Matti Lehtonen, Jan Meyer

    Abstract: Even though power electronics (PE) loads offer enhanced electrical energy conversion efficiency and control, they remain the primary sources of harmonics in grids. When diverse loads are connected in the distribution system, their interactions complicate establishing analytical models for the relationship between harmonic voltages and currents. To solve this, our paper presents a data-dr…

    Submitted 19 January, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  16. arXiv:2311.12968  [pdf, ps, other]

    cs.IT eess.SP

    Bit Error Rate Performance and Diversity Analysis for Mediumband Wireless Communication

    Authors: Dushyantha A Basnayaka, Jiabin Jia

    Abstract: Mediumband wireless communication refers to wireless communication through a class of channels known as mediumband that exists on the TmTs-plane. This paper, through statistical analysis and computer simulations, studies the performance limits of this class of channels in terms of uncoded bit error rate (BER) and diversity order. We show that, owing mainly to the effect of the deep fading avoidanc…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 6 pages, 5 figures, Accepted for Publication in the Proceedings of IEEE VCC 2023, 28-30 Nov. 2023

    Journal ref: Proceedings of IEEE VCC 2023

  17. arXiv:2311.11337  [pdf, other]

    eess.SY math.OC

    H2 suboptimal containment control of homogeneous and heterogeneous multi-agent systems

    Authors: Yuan Gao, Junjie Jiao, Zhongkui Li, Sandra Hirche

    Abstract: This paper deals with the H2 suboptimal state containment control problem for homogeneous linear multi-agent systems and the H2 suboptimal output containment control problem for heterogeneous linear multi-agent systems. For both problems, given multiple autonomous leaders and a number of followers, we introduce suitable performance outputs and an associated H2 cost functional, respectively. The ai…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures

  18. Integrated Sensing and Communication enabled Doppler Frequency Shift Estimation and Compensation

    Authors: Jinzhu Jia, Zhiqing Wei, Ruiyun Zhang, Lin Wang

    Abstract: Although millimeter wave technology delivers low latency and high data transmission rates, it causes severe Doppler Frequency Shift (DFS) in high-speed vehicular networks, which tremendously damages the communication performance. In this paper, we propose an Integrated Sensing and Communication (ISAC) enabled DFS estimation and compensation algorithm. Firstly, the DFS is coarsely estimated an…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 6 pages, 8 figures, IEEE/CIC ICCC conference

    Journal ref: 2023 IEEE/CIC International Conference on Communications in China (ICCC). IEEE, 2023: 1-6

  19. arXiv:2309.13942  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

    Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-vi…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Published at the CVPR 2023 Sight and Sound workshop

  20. arXiv:2309.13018  [pdf, other]

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it requires several rounds of pruning and re-training to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in…

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  21. arXiv:2309.11849  [pdf, other]

    cs.SD cs.CL eess.AS

    A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis

    Authors: Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang

    Abstract: This paper explores predicting suitable prosodic features for fine-grained emotion analysis from the discourse-level text. To obtain fine-grained emotional prosodic features as predictive values for our model, we extract a phoneme-level Local Prosody Embedding sequence (LPEs) and a Global Style Embedding as prosodic speech features from the speech with the help of a style transfer model. We propos…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: ChinaMM 2023

  22. arXiv:2309.11714  [pdf, other]

    eess.SP cs.AI cs.LG

    A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification

    Authors: Jie Jiao, Meiyan Xu, Qingqing Chen, Hefan Zhou, Wangliang Zhou

    Abstract: There is a correlation between adjacent channels of electroencephalogram (EEG), and how to represent this correlation is an issue that is currently being explored. In addition, due to inter-individual differences in EEG signals, new subjects need to spend a considerable amount of calibration time for EEG-based motor imagery brain-computer interface. In order to solve the above problems…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 10 pages, 4 figures, journal

    MSC Class: 68T07 (Primary)

    ACM Class: I.2.4

  23. arXiv:2309.01947  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

    Authors: Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

    Abstract: Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficien…

    Submitted 27 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Meta AI; Submitted to ICASSP 2024

  24. arXiv:2308.10217  [pdf, other]

    eess.SY

    Fault Separation Based on An Excitation Operator with Application to a Quadrotor UAV

    Authors: Sicheng Zhou, Meng Wang, Jindou Jia, Kexin Guo, Xiang Yu, Youmin Zhang, Lei Guo

    Abstract: This paper presents an excitation operator based fault separation architecture for a quadrotor unmanned aerial vehicle (UAV) subject to loss of effectiveness (LoE) faults, actuator aging, and load uncertainty. The actuator fault dynamics are examined in depth, revealing the coupling among the actuator faults, the system states, and the control inputs. By explicitly considering the phys…

    Submitted 20 August, 2023; originally announced August 2023.

  25. StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability

    Authors: Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu

    Abstract: Video shakiness is an unpleasant distortion of User Generated Content (UGC) videos, which is usually caused by the unstable hold of cameras. In recent years, many video stabilization algorithms have been proposed, yet no specific and accurate metric enables comprehensively evaluating the stability of videos. Indeed, most existing quality assessment models evaluate video quality as a whole without…

    Submitted 27 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM'23

  26. arXiv:2307.15443  [pdf, other]

    eess.IV

    RAWIW: RAW Image Watermarking Robust to ISP Pipeline

    Authors: Kang Fu, Xiaohong Liu, Jun Jia, Zicheng Zhang, Yicong Peng, Jia Wang, Guangtao Zhai

    Abstract: Invisible image watermarking is essential for image copyright protection. Compared to RGB images, RAW format images use a higher dynamic range to capture the radiometric characteristics of the camera sensor, providing greater flexibility in post-processing and retouching. Similar to the master recording in the music industry, RAW images are considered the original format for distribution and image…

    Submitted 28 July, 2023; originally announced July 2023.

  27. arXiv:2307.11795  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Prompting Large Language Models with Speech Recognition Abilities

    Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing them to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings,…

    Submitted 21 July, 2023; originally announced July 2023.

  28. arXiv:2306.17634  [pdf, other]

    eess.SP

    Enhancing Feature Extraction for Indoor Fingerprint Localization Using Diversified Data

    Authors: Jiyu Jiao, Xiaojun Wang, Chenlin He

    Abstract: Given the rapid advancements in wireless communication and terminal devices, high-speed and convenient WiFi has permeated various aspects of people's lives, and attention has been drawn to the location services that WiFi can provide. Fingerprint-based methods, as an excellent approach for localization, have gradually become a hot research topic. However, in practical localization, fingerprint feat…

    Submitted 30 June, 2023; originally announced June 2023.

  29. arXiv:2306.02231  [pdf, other]

    cs.CL cs.AI cs.LG eess.SY

    Fine-Tuning Language Models with Advantage-Induced Policy Alignment

    Authors: Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be…

    Submitted 2 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  30. arXiv:2306.02003  [pdf, other]

    cs.LG cs.AI cs.PF eess.SY stat.ML

    On Optimal Caching and Model Multiplexing for Large Model Inference

    Authors: Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

    Abstract: Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to…

    Submitted 28 August, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  31. arXiv:2306.00265  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV stat.ML

    Doubly Robust Self-Training

    Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

    Abstract: Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provabl…

    Submitted 2 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  32. arXiv:2305.12498  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Multi-Head State Space Model for Speech Recognition

    Authors: Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

    Abstract: State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, where parallel heads are taught to learn local and global temporal dynamics on sequence data. As a drop-in…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  33. arXiv:2305.10747  [pdf, other]

    eess.SY

    Strong Structural Controllability of Structured Networks with MIMO node systems

    Authors: Yanting Ni, Xuyang Lou, Junjie Jiao, Jiajia Jia

    Abstract: The article addresses the problem of strong structural controllability of structured networks with multi-input multi-output (MIMO) node systems. The authors first present necessary and sufficient conditions for strong structural controllability, which involve both algebraic and graph-theoretic aspects. These conditions are computationally expensive, especially for large-scale networks with high-di…

    Submitted 18 May, 2023; originally announced May 2023.

  34. arXiv:2304.05131  [pdf, other]

    eess.SY

    Fast IMU-based Dual Estimation of Human Motion and Kinematic Parameters via Progressive In-Network Computing

    Authors: Xiaobing Dai, Huanzhuo Wu, Siyi Wang, Junjie Jiao, Giang T. Nguyen, Frank H. P. Fitzek, Sandra Hirche

    Abstract: Many applications involve humans in the loop, where continuous and accurate human motion monitoring provides valuable information for safe and intuitive human-machine interaction. Portable devices such as inertial measurement units (IMUs) can be used to monitor human motions, although in practice only limited computational power is often available locally. The human motion in task space coordinates req…

    Submitted 11 April, 2023; originally announced April 2023.

  35. arXiv:2303.12735  [pdf, other]

    eess.IV cs.CV cs.LG physics.med-ph

    SMUG: Towards robust MRI reconstruction by smoothed unrolling

    Authors: Hui Li, Jinghan Jia, Shijun Liang, Yuguang Yao, Saiprasad Ravishankar, Sijia Liu

    Abstract: Although deep learning (DL) has gained much popularity for accelerated magnetic resonance imaging (MRI), recent studies have shown that DL-based MRI reconstruction models could be oversensitive to tiny input perturbations (that are called 'adversarial perturbations'), which cause unstable, low-quality reconstructed images. This raises the question of how to design robust DL methods for MRI reconst…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  36. arXiv:2303.11413  [pdf, other]

    eess.SP cs.AI cs.LG

    Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN

    Authors: Youzhi Liang, Wen Liang, Jianguo Jia

    Abstract: Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field…

    Submitted 22 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 10 pages, 4 figures

  37. arXiv:2303.08050  [pdf, other]

    cs.CV eess.IV

    Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images

    Authors: Zicheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, Guangtao Zhai

    Abstract: Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc. In practice, the quality of CGIs consistently suffers from poor rendering during production, inevitable compression artifacts during the transmission of multimedia applications, and low aesthetic quality resulting from poor…

    Submitted 1 November, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  38. arXiv:2303.05914  [pdf, ps, other]

    cs.LG eess.SP

    On the Value of Stochastic Side Information in Online Learning

    Authors: Junzhang Jia, Xuetong Wu, Jingge Zhu, Jamie Evans

    Abstract: We study the effectiveness of stochastic side information in deterministic online learning scenarios. We propose a forecaster to predict a deterministic sequence where its performance is evaluated against an expert class. We assume that certain stochastic side information is available to the forecaster but not the experts. We define the minimax expected regret for evaluating the forecaster's perfor…

    Submitted 9 March, 2023; originally announced March 2023.

  39. arXiv:2211.04930  [pdf, other]

    eess.IV

    On the Robustness of deep learning-based MRI Reconstruction to image transformations

    Authors: Jinghan Jia, Mingyi Hong, Yimeng Zhang, Mehmet Akçakaya, Sijia Liu

    Abstract: Although deep learning (DL) has received much attention in accelerated magnetic resonance imaging (MRI), recent studies show that tiny input perturbations may lead to instabilities of DL-based MRI reconstruction models. However, the approaches of robustifying these models are underdeveloped. Compared to image classification, it could be much more challenging to achieve a robust MRI image reconstru…

    Submitted 21 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Accepted as TSRML'22 Paper

  40. arXiv:2211.04767  [pdf, other]

    eess.IV

    Multimodal Remote Sensing Image Registration Based on Adaptive Multi-scale PIIFD

    Authors: Ning Li, Yuxuan Li, Jichao Jiao

    Abstract: In recent years, due to the wide application of multi-sensor vision systems, multimodal image acquisition technology has continued to develop, and the registration problem based on multimodal images has gradually emerged. Most of the existing multimodal image registration methods are only suitable for two modalities, and cannot uniformly register multiple modal image data. Therefore, this paper pr…

    Submitted 9 November, 2022; originally announced November 2022.

  41. arXiv:2210.11588  [pdf, other]

    eess.AS cs.SD

    Anchored Speech Recognition with Neural Transducers

    Authors: Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

    Abstract: Neural transducers have achieved human level performance on standard speech recognition benchmarks. However, their performance significantly degrades in the presence of cross-talk, especially when the primary speaker has a low signal-to-noise ratio. Anchored speech recognition refers to a class of methods that use information from an anchor segment (e.g., wake-words) to recognize device-directed s…

    Submitted 29 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: To appear at IEEE ICASSP 2023

  42. arXiv:2208.10642  [pdf, other

    cs.CV eess.IV

    Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound

    Authors: Zeyu Fu, Jianbo Jiao, Robail Yasrab, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

    Abstract: Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics.…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: ECCV-MCV 2022

  43. arXiv:2208.05359  [pdf, other

    cs.SD cs.CL eess.AS

    Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

    Authors: Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng

    Abstract: Cross-speaker style transfer aims to extract the speech style of the given reference speech, which can be reproduced in the timbre of arbitrary target speakers. Existing methods on this topic have explored utilizing utterance-level style labels to perform style transfer via either global or local scale style representations. However, audiobook datasets are typically characterized by both the local…

    Submitted 19 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 5 pages, 3 figures, accepted to INTERSPEECH 2022, demo page at https://thuhcsi.github.io/is2022-cross-speaker-reading-style-transfer

  44. arXiv:2206.14964  [pdf, other

    eess.AS cs.MM cs.SD

    Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention

    Authors: Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Dejun Li

    Abstract: An audio-visual speech enhancement system is regarded as one of the promising solutions for isolating and enhancing the speech of a desired speaker. Typical methods focus on predicting the clean speech spectrum via a naive convolutional neural network-based encoder-decoder architecture, and these methods a) do not make full use of the data and b) cannot effectively balance audio-visual features. The proposed m…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022. arXiv admin note: substantial text overlap with arXiv:2101.06268

  45. arXiv:2206.14962  [pdf, other

    eess.AS cs.SD

    GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block

    Authors: Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Jianjun Hao

    Abstract: For monaural speech enhancement, contextual information is important for accurate speech estimation. However, commonly used convolutional neural networks (CNNs) are weak at capturing temporal contexts, since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from human auditory perception to introduce a two-stage trainable reasoning mechanism, refe…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  46. arXiv:2206.12512  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings

    Authors: Sophia Bano, Alessandro Casella, Francisco Vasconcelos, Abdul Qayyum, Abdesslam Benzinou, Moona Mazher, Fabrice Meriaudeau, Chiara Lena, Ilaria Anita Cintorrino, Gaia Romana De Paolis, Jessica Biagioli, Daria Grechishnikova, **g Jiao, Bizhe Bai, Yanyan Qiao, Binod Bhattarai, Rebati Raman Gaire, Ronast Subedi, Eduard Vazquez, Szymon Płotka, Aneta Lisowska, Arkadiusz Sitek, George Attilakos, Ruwan Wimalasundera, Anna L David, et al. (6 additional authors not shown)

    Abstract: Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulating pathological anastomoses to regulate blood exchange between the twins. The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination. These challe…

    Submitted 26 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted at MedIA (Medical Image Analysis)

  47. arXiv:2204.07041  [pdf, ps, other

    eess.SY math.OC

    Distributed Optimal Control with Recovered Robustness for Uncertain Network Systems: A Complementary Design Approach

    Authors: Zhongkui Li, Junjie Jiao, Xiang Chen

    Abstract: This paper considers the distributed robust suboptimal consensus control problem of linear multi-agent systems, with both H2 and H∞ performance requirements. A novel two-step complementary design approach is proposed. In the first step, a distributed control law is designed for the nominal multi-agent system to achieve consensus with a prescribed H2 performance. In the second step, an extra c…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 8 pages

  48. arXiv:2204.01257  [pdf, ps, other

    cs.IT eess.SY

    Age of Information with Hybrid-ARQ: A Unified Explicit Result

    Authors: Aimin Li, Shaohua Wu, Jian Jiao, Ning Zhang, Qinyu Zhang

    Abstract: Delivering timely status updates in a timeliness-critical communication system is of paramount importance for accurate and efficient decision-making. Therefore, the analysis of Age of Information (AoI) has attracted new research interest. This paper contributes new results in this area by systematically analyzing the AoI of two types of Hybrid Automatic Repeat reQuest (HARQ) techniques tha…

    Submitted 4 April, 2022; originally announced April 2022.

  49. arXiv:2203.15966  [pdf, other

    cs.SD cs.CL eess.AS

    Federated Domain Adaptation for ASR with Full Self-Supervision

    Authors: Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide

    Abstract: Cross-device federated learning (FL) protects user privacy by collaboratively training a model on user devices, therefore eliminating the need for collecting, storing, and manually labeling user data. While important topics such as the FL training algorithm, non-IID-ness, and Differential Privacy have been well studied in the literature, this paper focuses on two challenges of practical importance…

    Submitted 5 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  50. arXiv:2203.02321  [pdf, ps, other

    math.OC eess.SY

    Actuator Scheduling for Linear Systems: A Convex Relaxation Approach

    Authors: Junjie Jiao, Dipankar Maity, John S. Baras, Sandra Hirche

    Abstract: In this letter, we investigate the problem of actuator scheduling for networked control systems. Given a stochastic linear system with a number of actuators, we consider the case in which only one actuator is activated at each time step. This problem is combinatorial in nature and NP-hard to solve. We propose a convex relaxation to the actuator scheduling problem, and use its solution as a reference to design a…

    Submitted 20 May, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: 8 pages, 4 figures