
Showing 1–50 of 113 results for author: Yang, M

Searching in archive eess.
  1. arXiv:2406.16871

    eess.SY

    Neural network based model predictive control of voltage for a polymer electrolyte fuel cell system with constraints

    Authors: Xiufei Li, Miao Yang, Yuanxin Qi, Miao Zhang

    Abstract: A fuel cell system must output a steady voltage as a power source in practical use. A neural network (NN) based model predictive control (MPC) approach is developed in this work to regulate the fuel cell output voltage with safety constraints. The developed NN MPC controller stabilizes the polymer electrolyte fuel cell system's output voltage by controlling the hydrogen and air flow rates at the s…

    Submitted 24 March, 2024; originally announced June 2024.

  2. arXiv:2406.12596

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.10137

    cs.IT cs.LG eess.SP

    Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

    Authors: Yi-Jen Yang, Ming-Hsun Yang, Jwo-Yuh Wu, Y. -W. Peter Hong

    Abstract: This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data rec…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: v1 was submitted to IEEE Transactions on Signal Processing on Sept. 18, 2023

  4. arXiv:2405.10589

    cs.CV cs.AI eess.IV

    Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance

    Authors: I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo

    Abstract: Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points…

    Submitted 17 May, 2024; originally announced May 2024.

  5. arXiv:2405.07442

    cs.SD cs.AI eess.AS q-bio.QM

    Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

    Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

    Abstract: Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio sample…

    Submitted 6 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  6. arXiv:2405.01200

    eess.SY cs.LG

    Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

    Authors: Mei Yang, Gao Qiu, Junyong Liu, Kai Liu

    Abstract: This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for cons…

    Submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2404.17736

    eess.SP cs.CV cs.IT eess.IV

    Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission

    Authors: Mingyu Yang, Bowen Liu, Boyang Wang, Hun-Seok Kim

    Abstract: Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated as an effective approach for wireless image transmission. Nevertheless, current research has concentrated on minimizing a standard distortion metric such as Mean Squared Error (MSE), which does not necessarily improve the perceptual quality. To address this issue, we propose DiffJSCC, a novel framework that leverages…

    Submitted 26 April, 2024; originally announced April 2024.

  8. arXiv:2404.13153

    eess.IV cs.CV

    Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

    Authors: Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

    Abstract: Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing residual from blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  9. arXiv:2404.11836

    eess.SP

    AI-Empowered RIS-Assisted Networks: CV-Enabled RIS Selection and DNN-Enabled Transmission

    Authors: Conggang Hu, Yang Lu, Hongyang Du, Mi Yang, Bo Ai, Dusit Niyato

    Abstract: This paper investigates artificial intelligence (AI) empowered schemes for reconfigurable intelligent surface (RIS) assisted networks from the perspective of fast implementation. We formulate a weighted sum-rate maximization problem for a multi-RIS-assisted network. To avoid huge channel estimation overhead due to activating all RISs, we propose a computer vision (CV) enabled RIS selection scheme ba…

    Submitted 17 April, 2024; originally announced April 2024.

  10. arXiv:2404.11313

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  11. arXiv:2404.06265

    cs.CV eess.IV

    Spatial-Temporal Multi-level Association for Video Object Segmentation

    Authors: Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel processing simultaneously, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal m…

    Submitted 9 April, 2024; originally announced April 2024.

  12. arXiv:2403.16170

    eess.SY

    Voltage Regulation in Polymer Electrolyte Fuel Cell Systems Using Gaussian Process Model Predictive Control

    Authors: Xiufei Li, Miao Zhang, Yuanxin Qi, Miao Yang

    Abstract: This study introduces a novel approach utilizing Gaussian process model predictive control (MPC) to stabilize the output voltage of a polymer electrolyte fuel cell (PEFC) system by simultaneously regulating hydrogen and airflow rates. Two Gaussian process models are developed to capture PEFC dynamics, taking into account constraints including hydrogen pressure and input change rates, thereby aidin…

    Submitted 24 March, 2024; originally announced March 2024.

  13. arXiv:2403.00605

    eess.SP

    Channel Measurements and Modeling for Dynamic Vehicular ISAC Scenarios at 28 GHz

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan

    Abstract: Integrated sensing and communication (ISAC) is a promising technology for 6G, with the goal of providing end-to-end information processing and inherent perception capabilities for future communication systems. Within ISAC emerging application scenarios, vehicular ISAC technologies have the potential to enhance traffic efficiency and safety through integration of communication and synchronized perc…

    Submitted 1 March, 2024; originally announced March 2024.

  14. arXiv:2403.00569

    eess.SP

    Characterization of Wireless Channel Semantics: A New Paradigm

    Authors: Zhengyu Zhang, Ruisi He, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan, Bo Ai

    Abstract: Recently, deep learning enabled semantic communications have been developed to understand transmission content from semantic level, which realize effective and accurate information transfer. Aiming at the vision of sixth generation (6G) networks, wireless devices are expected to have native perception and intelligent capabilities, which associate wireless channel with surrounding environments from…

    Submitted 1 March, 2024; originally announced March 2024.

  15. arXiv:2403.00557

    eess.SP

    Non-stationarity Characteristics in Dynamic Vehicular ISAC Channels at 28 GHz

    Authors: Zhengyu Zhang, Ruisi He, Mi Yang, Xuejian Zhang, Ziyi Qi, Hang Mi, Guiqi Sun, Jingya Yang, Bo Ai

    Abstract: Integrated sensing and communications (ISAC) is a potential technology of 6G, aiming to enable end-to-end information processing ability and native perception capability for future communication systems. As an important part of the ISAC application scenarios, ISAC aided vehicle-to-everything (V2X) can improve the traffic efficiency and safety through intercommunication and synchronous perception.…

    Submitted 1 March, 2024; originally announced March 2024.

  16. arXiv:2403.00505

    eess.SP

    A Cluster-Based Statistical Channel Model for Integrated Sensing and Communication Channels

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Yong Niu, Zhangdui Zhong, Yujian Li, Xuejian Zhang, Jing Li

    Abstract: The emerging 6G network envisions integrated sensing and communication (ISAC) as a promising solution to meet growing demand for native perception ability. To optimize and evaluate ISAC systems and techniques, it is crucial to have an accurate and realistic wireless channel model. However, some important features of ISAC channels have not been well characterized, for example, most existing ISAC ch…

    Submitted 1 March, 2024; originally announced March 2024.

  17. arXiv:2402.10427

    cs.CL cs.AI cs.SD eess.AS

    Evaluating and Improving Continual Learning in Spoken Language Understanding

    Authors: Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

    Abstract: Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. The evaluation of continual learning algorithms typically involves assessing the model's stability, plasticity, and generalizability as fundamental aspects o…

    Submitted 15 February, 2024; originally announced February 2024.

  18. arXiv:2401.02046

    eess.AS cs.SD

    CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

    Authors: Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

    Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method th…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: accepted by ASRU 2023

  19. arXiv:2312.15873

    cs.NI eess.SY

    Investigating Inter-Satellite Link Spanning Patterns on Networking Performance in Mega-constellations

    Authors: Xiangtong Wang, Xiaodong Han, Menglong Yang, Chuan Xing, Yuqi Wang, Songchen Han, Wei Li

    Abstract: Low Earth orbit (LEO) mega-constellations rely on inter-satellite links (ISLs) to provide global connectivity. We note that in addition to the general constellation parameters, the ISL spanning patterns also greatly influence the final network structure and thus the network performance. In this work, we formulate the ISL spanning patterns, apply different patterns to mega-constellation and g…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages

  20. arXiv:2312.07858

    eess.SY

    Non-myopic Beam Scheduling for Multiple Smart Target Tracking in Phased Array Radar Network

    Authors: Yuhang Hao, Zengfu Wang, José Niño-Mora, Jing Fu, Min Yang, Quan Pan

    Abstract: A smart target, also referred to as a reactive target, can take maneuvering motions to hinder radar tracking. We address beam scheduling for tracking multiple smart targets in phased array radar networks. We aim to mitigate the performance degradation in previous myopic tracking methods and enhance the system performance, which is measured by a discounted cost objective related to the tracking err…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 14 pages

  21. arXiv:2310.10413

    eess.IV cs.CV

    Image super-resolution via dynamic network

    Authors: Chunwei Tian, Xuanyu Zhang, Qi Zhang, Mingming Yang, Zhaojie Ju

    Abstract: Convolutional neural networks (CNNs) depend on deep network architectures to extract accurate information for image super-resolution. However, obtained information of these CNNs cannot completely express predicted high-quality images for complex scenes. In this paper, we present a dynamic network for image super-resolution (DSRNet), which contains a residual enhancement block, wide enhancement blo…

    Submitted 22 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  22. arXiv:2310.02699

    eess.AS cs.AI

    Continual Contrastive Spoken Language Understanding

    Authors: Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

    Abstract: Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from sc…

    Submitted 4 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to ACL Findings 2024

  23. arXiv:2310.00900

    cs.SD cs.AI cs.CL eess.AS

    uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

    Authors: Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

    Abstract: Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by p…

    Submitted 2 October, 2023; originally announced October 2023.

  24. arXiv:2309.09028

    eess.AS cs.SD

    Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

    Authors: Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

    Abstract: Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep learning based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering listening experiences. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynth…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  25. arXiv:2309.08007

    eess.AS cs.CL cs.SD

    DiariST: Streaming Speech Translation with Speaker Diarization

    Authors: Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

    Abstract: End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we propose DiariST, the first streaming ST and SD solution. It is built upon a neural transducer-based streaming ST system and integrates token-level seri…

    Submitted 22 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  26. arXiv:2309.07432

    cs.SD eess.AS

    SpatialCodec: Neural Spatial Speech Coding

    Authors: Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu

    Abstract: In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio, leveraging single-channel neural sub-band codec and SpatialCodec. Our app…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Paper in Submission

  27. arXiv:2309.05908

    eess.SY

    Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems

    Authors: Han Su, Jiyu Zhu, Shenghua Feng, Yunjun Bai, Bin Gu, Jiang Liu, Mengfei Yang, Naijun Zhan

    Abstract: A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee that the system achieves its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid…

    Submitted 27 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 15 pages, 10 figures

  28. arXiv:2309.05906

    eess.SY

    Correct-by-Construction for Hybrid Systems by Synthesizing Reset Controller

    Authors: Jiang Liu, Han Su, Yunjun Bai, Bin Gu, Bai Xue, Mengfei Yang, Naijun Zhan

    Abstract: Controller synthesis, including reset controller, feedback controller, and switching logic controller, provides an essential mechanism to guarantee the correctness and reliability of hybrid systems in a correct-by-construction manner. Unfortunately, reset controller synthesis is still in an infant stage in the literature, although it is of theoretical and practical significance. In this paper, we…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 26 pages, 8 figures

  29. arXiv:2307.13948

    cs.CV cs.SD eess.AS

    Rethinking Voice-Face Correlation: A Geometry View

    Authors: Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

    Abstract: Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometry perspective without any semantic information. We propose a voice-anthropometric mea…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: ACM Multimedia 2023

  30. Siamese Learning-based Monarch Butterfly Localization

    Authors: Sara Shoouri, Mingyu Yang, Gordy Carichner, Yuyang Li, Ehab A. Hamed, Angela Deng, Delbert A. Green II, Inhee Lee, David Blaauw, Hun-Seok Kim

    Abstract: A new GPS-less, daily localization method is proposed with deep learning sensor fusion that uses daylight intensity and temperature sensor data for Monarch butterfly tracking. Prior methods suffer from the location-independent day length during the equinox, resulting in high localization errors around that date. This work proposes a new Siamese learning-based localization model that improves the a…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 2022 IEEE Data Science and Learning Workshop (DSLW)

  31. arXiv:2306.14097

    eess.IV cs.CV math.NA

    Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model

    Authors: Junying Meng, Weihong Guo, Jun Liu, Mingrui Yang

    Abstract: The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 25 pages, 9 figures, 6 tables

    MSC Class: 94A08; 68U10

  32. arXiv:2306.06524

    eess.AS cs.CL cs.SD

    What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model

    Authors: Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen

    Abstract: This study is focused on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model, brought by an accent identification (AID) fine-tuning task. This problem is addressed based on model probing. Specifically, we conduct a systematic layer-wise analysis of the representations of the Transformer layers on a phoneme correlation task…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by Interspeech 2023

  33. arXiv:2306.01209

    cs.CV cs.AI eess.IV

    Counting Crowds in Bad Weather

    Authors: Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang

    Abstract: Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding. Numerous methods have been proposed and achieved state-of-the-art performance for real-world tasks. However, existing approaches do not perform well under adverse weather such as haze, rain, and snow since the visual appearances of crowds in such scenes a…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: including supplemental material

  34. arXiv:2305.13899

    eess.AS cs.CL

    Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding

    Authors: Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti

    Abstract: The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acquired knowledge leads to the catastrophic forgetting issue. In this work we tackle the problem of Spoken Language Understanding applied to a continual learning set…

    Submitted 31 July, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023. Code (will be) available at https://github.com/umbertocappellazzo/SLURP-SeqKD

  35. arXiv:2305.06899

    eess.SP cs.IT

    Generalized signals on simplicial complexes

    Authors: Feng Ji, Xingchao Jian, Wee Peng Tay, Maosheng Yang

    Abstract: Topological signal processing (TSP) over simplicial complexes typically assumes observations associated with the simplicial complexes are real scalars. In this paper, we develop TSP theories for the case where observations belong to general abelian groups, including function spaces that are commonly used to represent time-varying signals. Our approach generalizes the Hodge decomposition and allows…

    Submitted 11 November, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  36. arXiv:2305.03997

    eess.IV cs.CV

    Dual Degradation Representation for Joint Deraining and Low-Light Enhancement in the Dark

    Authors: Xin Lin, Jingtong Yue, Sixian Ding, Chao Ren, Lu Qi, Ming-Hsuan Yang

    Abstract: Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches like "deraining followed by low-light enhancement" or the reverse often result in…

    Submitted 17 June, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  37. arXiv:2305.00216

    eess.SY cs.LG

    Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis

    Authors: Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding

    Abstract: The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). The tailored graph modelling of AC and DC grids is firstly advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data,…

    Submitted 29 April, 2023; originally announced May 2023.

  38. arXiv:2303.14701

    eess.SP

    Mathematical Characterization of Signal Semantics and Rethinking of the Mathematical Theory of Information

    Authors: Guangming Shi, Dahua Gao, Shuai Ma, Minxi Yang, Yong Xiao, Xuemei Xie

    Abstract: Shannon information theory is established based on probability and bits, and the communication technology based on this theory realizes the information age. The original goal of Shannon's information theory is to describe and transmit information content. However, because information is related to cognition, and cognition is considered to be subjective, Shannon information theory is to describe and…

    Submitted 26 March, 2023; originally announced March 2023.

  39. Vehicle Sequencing at Signal-Free Intersections: Analytical Performance Guarantees Based on PDMP Formulation

    Authors: Xiangchen Cheng, Wei Tang, Ming Yang, Li Jin

    Abstract: Signal-free intersections are a representative application of smart and connected vehicle technologies. Although extensive results have been developed for trajectory planning and autonomous driving, the formulation and evaluation of vehicle sequencing have not been well understood. In this paper, we consider theoretical guarantees of macroscopic performance (i.e., capacity and delay) of typical seq…

    Submitted 21 March, 2023; originally announced March 2023.

  40. arXiv:2303.09663

    cs.CV eess.IV

    Efficient Computation Sharing for Multi-Task Visual Scene Understanding

    Authors: Sara Shoouri, Mingyu Yang, Zichen Fan, Hun-Seok Kim

    Abstract: Solving multiple visual tasks using individual models can be resource-intensive, while multi-task learning can conserve resources by sharing knowledge across different tasks. Despite the benefits of multi-task learning, such techniques can struggle with balancing the loss for each task, leading to potential performance degradation. We present a novel computation- and parameter-sharing framework th…

    Submitted 14 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Camera-Ready version. Accepted to ICCV 2023

  41. arXiv:2303.02708

    cs.RO eess.SY

    Tac-VGNN: A Voronoi Graph Neural Network for Pose-Based Tactile Servoing

    Authors: Wen Fan, Max Yang, Yifan Xing, Nathan F. Lepora, Dandan Zhang

    Abstract: Tactile pose estimation and tactile servoing are fundamental capabilities of robot touch. Reliable and precise pose estimation can be provided by applying deep learning models to high-resolution optical tactile sensors. Given the recent successes of Graph Neural Network (GNN) and the effectiveness of Voronoi features, we developed a Tactile Voronoi Graph Neural Network (Tac-VGNN) to achieve reliab…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: 7 pages, 10 figures, accepted by 2023 IEEE International Conference on Robotics and Automation (ICRA)

  42. arXiv:2302.08095

    cs.SD cs.CL eess.AS

    PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

    Authors: Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

    Abstract: Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, shimmer, etc. -- that are non…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  43. arXiv:2302.08088

    cs.CL cs.SD eess.AS

    TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

    Authors: Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

    Abstract: Speech enhancement models have greatly progressed in recent years, but still show limits in perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  44. arXiv:2301.08660  [pdf]

    cs.CY eess.SP

    A Big-Data Driven Framework to Estimating Vehicle Volume based on Mobile Device Location Data

    Authors: Mofeng Yang, Weiyu Luo, Mohammad Ashoori, Jina Mahmoudi, Chenfeng Xiong, Jiawei Lu, Guangchen Zhao, Saeed Saleh Namadi, Songhua Hu, Aliakbar Kabiri

    Abstract: Vehicle volume serves as a critical metric and the fundamental basis for traffic signal control, transportation project prioritization, road maintenance plans and more. Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations. These efforts require significant labor and cost for expansions. Researchers and private…

    Submitted 24 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  45. arXiv:2212.04054  [pdf, other]

    cs.CL cs.SD eess.AS

    Learning to Dub Movies via Hierarchical Prosody Models

    Authors: Gaoxiang Cong, Liang Li, Yuankai Qi, Zhengjun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

    Abstract: Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice cloning, V2C) task aims to generate speech that matches the speaker's emotion presented in the video using the desired speaker voice as reference. V2C is more challenging than conventional text-to-speech tasks as it additionally requires the generated speech to exactly match the varying emotions a…

    Submitted 4 April, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: accepted to CVPR 2023

  46. arXiv:2211.09404  [pdf, other]

    eess.IV cs.CV

    Hard Exudate Segmentation Supplemented by Super-Resolution with Multi-scale Attention Fusion Module

    Authors: Jiayi Zhang, Xiaoshan Chen, Zhongxi Qiu, Mingming Yang, Yan Hu, Jiang Liu

    Abstract: Hard exudates (HE) are the most specific biomarker for retinal edema. Precise HE segmentation is vital for disease diagnosis and treatment, but automatic segmentation is challenged by the large variation in HE characteristics, including size, shape and position, which makes it difficult to detect tiny lesions and lesion boundaries. Considering the complementary features between segmentation and super-re…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by IEEE BIBM 2022

  47. arXiv:2211.06891  [pdf, other]

    eess.IV cs.CV

    Residual Degradation Learning Unfolding Framework with Mixing Priors across Spectral and Spatial for Compressive Spectral Imaging

    Authors: Yubo Dong, Dahua Gao, Tian Qiu, Yuyan Li, Minxi Yang, Guangming Shi

    Abstract: Coded aperture snapshot spectral imaging (CASSI) has been proposed to acquire a snapshot spectral image. A core problem of the CASSI system is to recover a reliable and fine underlying 3D spectral cube from the 2D measurement. By alternately solving a data subproblem and a prior subproblem, deep unfolding methods achieve good performance. However, in the data subproblem, the sensing matrix used is il…

    Submitted 15 November, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  48. arXiv:2210.15715  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Simulating realistic speech overlaps improves multi-talker ASR

    Authors: Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

    Abstract: Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions, a naïve simulation of multi-talker speech by randomly mixing multiple utterances was conventionally used for model training. In this wo…

    Submitted 17 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: v2: fix minor typo

  49. arXiv:2209.05735  [pdf, other]

    eess.AS cs.CL

    Learning ASR pathways: A sparse multilingual ASR model

    Authors: Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli

    Abstract: Neural network pruning compresses automatic speech recognition (ASR) models effectively. However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages because language-agnostic pruning masks may not fit all languages and discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activa…

    Submitted 28 September, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted by ICASSP 2023

  50. arXiv:2208.04940  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Depth Boundary-Aware Left Atrial Scar Segmentation Network

    Authors: Mengjun Wu, Wangbin Ding, Ming** Yang, Liqin Huang

    Abstract: Automatic segmentation of left atrial (LA) scars from late gadolinium enhanced CMR images is a crucial step for atrial fibrillation (AF) recurrence analysis. However, delineating LA scars is tedious and error-prone due to the variation of scar shapes. In this work, we propose a boundary-aware LA scar segmentation network, which is composed of two branches to segment LA and LA scars, respectively.…

    Submitted 7 August, 2022; originally announced August 2022.