Showing 1–50 of 108 results for author: Chen, F

Searching in archive eess.

  1. arXiv:2407.01469  [pdf, other]

    eess.IV

    Unrolling Plug-and-Play Gradient Graph Laplacian Regularizer for Image Restoration

    Authors: Jianghe Cai, Gene Cheung, Fei Chen

    Abstract: Generic deep learning (DL) networks for image restoration like denoising and interpolation lack mathematical interpretability, require voluminous training data to tune a large parameter set, and are fragile during covariate shift. To address these shortcomings, for a general linear image formation model, we first formulate a convex optimization problem with a new graph smoothness prior called gra…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.16020  [pdf, other]

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  3. arXiv:2406.02963  [pdf, other]

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  4. arXiv:2405.16398  [pdf, other]

    eess.SP

    Networked Integrated Sensing and Communications for 6G Wireless Systems

    Authors: Jiapeng Li, Xiaodan Shao, Feng Chen, Shaohua Wan, Chang Liu, Zhiqiang Wei, Derrick Wing Kwan Ng

    Abstract: Integrated sensing and communication (ISAC) is envisioned as a key pillar for enabling the upcoming sixth generation (6G) communication systems, requiring not only reliable communication functionalities but also highly accurate environmental sensing capabilities. In this paper, we design a novel networked ISAC framework to explore the collaboration among multiple users for environmental sensing. S…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Received by IEEE Internet of Things Journal

  5. arXiv:2405.09787  [pdf, other]

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  6. arXiv:2403.09188  [pdf]

    cs.LG eess.SP

    Design of an basis-projected layer for sparse datasets in deep learning training using gc-ms spectra as a case study

    Authors: Yu Tang Chang, Shih Fang Chen

    Abstract: Deep learning (DL) models encompass millions or even billions of parameters and learn complex patterns from big data. However, not all data are initially stored in a suitable formation to effectively train a DL model, e.g., gas chromatography-mass spectrometry (GC-MS) spectra and DNA sequence. These datasets commonly contain many zero values, and the sparse data formation causes difficulties in op…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures, 2 tables, conference

    MSC Class: 68-06 ACM Class: I.2.4; J.2

  7. arXiv:2402.01665  [pdf, other]

    cs.NI cs.LG eess.SP

    Knowledge-Driven Deep Learning Paradigms for Wireless Network Optimization in 6G

    Authors: Ruijin Sun, Nan Cheng, Changle Li, Fangjiong Chen, Wen Chen

    Abstract: In the sixth-generation (6G) networks, newly emerging diversified services of massive users in dynamic network environments are required to be satisfied by multi-dimensional heterogeneous resources. The resulting large-scale complicated network optimization problems are beyond the capability of model-based theoretical methods due to the overwhelming computational complexity and the long processing…

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures

  8. arXiv:2401.05819  [pdf]

    eess.SP cs.LG

    TAnet: A New Temporal Attention Network for EEG-based Auditory Spatial Attention Decoding with a Short Decision Window

    Authors: Yuting Ding, Fei Chen

    Abstract: Auditory spatial attention detection (ASAD) is used to determine the direction of a listener's attention to a speaker by analyzing her/his electroencephalographic (EEG) signals. This study aimed to further improve the performance of ASAD with a short decision window (i.e., <1 s) rather than with long decision windows ranging from 1 to 5 seconds in previous studies. An end-to-end temporal attention…

    Submitted 14 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  9. arXiv:2401.04953  [pdf, other]

    eess.IV eess.SP

    Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

    Authors: Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu, Shunsi Zhang

    Abstract: Traditional vision transformer consists of two parts: transformer encoder and multi-layer perceptron (MLP). The former plays the role of feature learning to obtain better representation, while the latter plays the role of classification. Here, the MLP is constituted of two fully connected (FC) layers, average value computing, FC layer and softmax layer. However, due to the use of average value com…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted for Publication in IEEE ICASSP 2024

  10. arXiv:2401.00605  [pdf, other]

    cs.MA eess.SP

    Distributed Multi-Object Tracking Under Limited Field of View Heterogeneous Sensors with Density Clustering

    Authors: Fei Chen, Hoa Van Nguyen, Alex S. Leong, Sabita Panicker, Robin Baker, Damith C. Ranasinghe

    Abstract: We consider the problem of tracking multiple, unknown, and time-varying numbers of objects using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor field-of-views (FoVs), sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracki…

    Submitted 31 December, 2023; originally announced January 2024.

  11. arXiv:2312.12824  [pdf, other]

    eess.IV cs.CV

    FedSODA: Federated Cross-assessment and Dynamic Aggregation for Histopathology Segmentation

    Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Lotfi Senhadji, Yongyue Wei, Feng Chen, Guanyu Yang

    Abstract: Federated learning (FL) for histopathology image segmentation involving multiple medical sites plays a crucial role in advancing the field of accurate disease diagnosis and treatment. However, it is still a task of great challenges due to the sample imbalance across clients and large data heterogeneity from disparate organs, variable segmentation tasks, and diverse distribution. Thus, we propose a…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP2024

  12. arXiv:2312.12153  [pdf, other]

    cs.SD eess.AS

    Noise robust distillation of self-supervised speech models via correlation metrics

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages

  13. arXiv:2312.10979  [pdf, ps, other]

    cs.SD eess.AS

    3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

    Authors: Shulin He, Jinjiang Liu, Hao Li, Yang Yang, Fei Chen, Xueliang Zhang

    Abstract: Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters which are computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE tas…

    Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  14. arXiv:2312.10741  [pdf, other]

    eess.AS cs.CL cs.SD

    StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

    Authors: Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expr…

    Submitted 2 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  15. arXiv:2312.03246  [pdf, ps, other]

    eess.SY cs.MA

    On Topological Conditions for Enabling Transient Control in Leader-follower Networks

    Authors: Fei Chen, Dimos V. Dimarogonas

    Abstract: We derive necessary and sufficient conditions for leader-follower multi-agent systems such that we can further apply prescribed performance control to achieve the desired formation while satisfying certain transient constraints. A leader-follower framework is considered in the sense that a group of agents with external inputs are selected as leaders in order to drive the group of followers in a wa…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: under review at Automatica

  16. arXiv:2312.03001  [pdf]

    eess.IV cs.CV cs.LG

    Computer Vision for Increased Operative Efficiency via Identification of Instruments in the Neurosurgical Operating Room: A Proof-of-Concept Study

    Authors: Tanner J. Zachem, Sully F. Chen, Vishal Venkatraman, David AW Sykes, Ravi Prakash, Koumani W. Ntowe, Mikhail A. Bethell, Samantha Spellicy, Alexander D Suarez, Weston Ross, Patrick J. Codd

    Abstract: Objectives: Computer vision (CV) is a field of artificial intelligence that enables machines to interpret and understand images and videos. CV has the potential to be of assistance in the operating room (OR) to track surgical instruments. We built a CV algorithm for identifying surgical instruments in the neurosurgical operating room as a potential solution for surgical instrument tracking and mana…

    Submitted 29 April, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: Data is openly available through The Open Science Framework: https://doi.org/10.17605/OSF.IO/BCQK2

  17. arXiv:2311.08415  [pdf]

    eess.IV physics.optics

    Scanning phase imaging without accurate positioning system

    Authors: Tao Liu, Bingyang Wang, JiangTao Zhao, Fu rong Chen, Fucai Zhang

    Abstract: Ptychography, a high-resolution phase imaging technique using precise in-plane translation information, has been widely applied in modern synchrotron radiation sources across the globe. A key requirement for successful ptychographic reconstruction is the precise knowledge of the scanning positions, which are typically obtained by a physical interferometric positioning system. Whereas high-throughp…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: 9 pages, 4 figures

  18. arXiv:2311.03046  [pdf, other]

    cs.IT eess.SP

    Antenna Positioning and Beamforming Design for Fluid-Antenna Enabled Multi-user Downlink Communications

    Authors: Haoran Qin, Wen Chen, Zhendong Li, Qingqing Wu, Nan Cheng, Fangjiong Chen

    Abstract: This paper investigates a multiple input single output (MISO) downlink communication system in which users are equipped with fluid antennas (FAs). First, we adopt a field-response based channel model to characterize the downlink channel with respect to FAs' positions. Then, we aim to minimize the total transmit power by jointly optimizing the FAs' positions and beamforming matrix. To solve the res…

    Submitted 13 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  19. arXiv:2311.00271  [pdf, other]

    cs.DC eess.SY

    EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing

    Authors: Bo Li, Qiang He, Feifei Chen, Lingjuan Lyu, Athman Bouguettaya, Yun Yang

    Abstract: Mobile edge computing (MEC) enables web data caching in close geographic proximity to end users. Popular data can be cached on edge servers located less than hundreds of meters away from end users. This ensures bounded latency guarantees for various latency-sensitive web applications. However, transmitting a large volume of data out of the cloud onto many geographically-distributed web servers ind…

    Submitted 31 October, 2023; originally announced November 2023.

  20. arXiv:2310.05072  [pdf, other]

    cs.IT eess.SP

    Performance Analysis of RIS-Aided Double Spatial Scattering Modulation for mmWave MIMO Systems

    Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Jun Li, Nan Cheng, Fangjiong Chen, Changle Li

    Abstract: In this paper, we investigate a practical structure of reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) for millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. A suboptimal detector is proposed, in which the beam direction is first demodulated according to the received beam strength, and then the remaining information is demodulated by…

    Submitted 8 October, 2023; originally announced October 2023.

  21. arXiv:2309.09548  [pdf, other]

    eess.AS cs.LG cs.SD

    Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

    Authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper's embeddings to create cross-domain acoustic features and includes m…

    Submitted 13 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to Interspeech 2024

  22. arXiv:2309.08323  [pdf]

    cs.RO eess.SY

    MLP Based Continuous Gait Recognition of a Powered Ankle Prosthesis with Serial Elastic Actuator

    Authors: Yanze Li, Feixing Chen, Jingqi Cao, Ruoqi Zhao, Xuan Yang, Xingbang Yang, Yubo Fan

    Abstract: Powered ankle prostheses effectively assist people with lower limb amputation to perform daily activities. High performance prostheses with adjustable compliance and capability to predict and implement amputee's intent are crucial for them to be comparable to or better than a real limb. However, current designs fail to provide simple yet effective compliance of the joint with full potential of mod…

    Submitted 30 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to IROS 2024

  23. TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Recently, there has been a growing interest in the field of controllable Text-to-Speech (TTS). While previous studies have relied on users providing specific style factor values based on acoustic knowledge or selecting reference speeches that meet certain requirements, generating speech solely from natural text prompts has emerged as a new challenge for researchers. This challenge arises due to th…

    Submitted 28 August, 2023; originally announced August 2023.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  24. arXiv:2307.15280  [pdf, other]

    cs.IT eess.SP

    Active RIS-Assisted MIMO-OFDM System: Analyses and Prototype Measurements

    Authors: De-Ming Chian, Feng-Ji Chen, Yu-Chen Chang, Chao-Kai Wen, Chi-Hung Wu, Fu-Kang Wang, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: In this study, we develop an active reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) prototype compliant with the 5G New Radio standard at 3.5 GHz. The experimental results clearly indicate that active RIS plays a vital role in enhancing MIMO performance, surpassing passive RIS. Furthermore, when considering fac…

    Submitted 14 November, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 5 pages, 5 figures, 1 table, accepted by IEEE Communications Letters, for demo video see: https://www.youtube.com/watch?v=3R6eZXizwns

  25. arXiv:2307.06547  [pdf]

    eess.IV cs.CV cs.LG

    Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder Networks

    Authors: Michael James Horry, Subrata Chakraborty, Biswajeet Pradhan, Manoranjan Paul, Jing Zhu, Prabal Datta Barua, U. Rajendra Acharya, Fang Chen, Jianlong Zhou

    Abstract: Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task, however, leading studies us…

    Submitted 13 July, 2023; originally announced July 2023.

  26. arXiv:2307.05385  [pdf, other]

    eess.SP cs.AI cs.LG

    Learned Kernels for Sparse, Interpretable, and Efficient Medical Time Series Processing

    Authors: Sully F. Chen, Zhicheng Guo, Cheng Ding, Xiao Hu, Cynthia Rudin

    Abstract: Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability. Methods: We propose Sparse Mixture of…

    Submitted 2 April, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 26 pages, 9 figures

  27. arXiv:2306.07505  [pdf]

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  28. arXiv:2306.02719  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Multiple output samples per input in a single-output Gaussian process

    Authors: Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

    Abstract: The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty…

    Submitted 25 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: This paper is presented in the "Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond", which is a satellite event of the ASRU workshop, on 20 December 2023. https://bayesian40.github.io/

  29. arXiv:2305.19972  [pdf, other]

    eess.AS cs.AI cs.CL

    VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

    Authors: Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

    Abstract: Enhancing automatic speech recognition (ASR) performance by leveraging additional multimodal information has shown promising results in previous studies. However, most of these works have primarily focused on utilizing visual cues derived from human lip motions. In fact, context-dependent visual and linguistic cues can also benefit in many scenarios. In this paper, we first propose ViLaS (Vision a…

    Submitted 18 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2024

  30. arXiv:2305.16342  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition

    Authors: Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, Xinyuan Qian, Li-Fang Wei, Song-Lu Chen, Feng Chen, Xu-Cheng Yin

    Abstract: The local and global features are both essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can further promote ASR performance. However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these…

    Submitted 29 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  31. arXiv:2305.14049  [pdf, other]

    cs.CL cs.SD eess.AS

    Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

    Authors: Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

    Abstract: Attention-based encoder-decoder (AED) models have shown impressive performance in ASR. However, most existing AED methods neglect to simultaneously leverage both acoustic and semantic features in decoder, which is crucial for generating more accurate and informative semantic states. In this paper, we propose an Acoustic and Semantic Cooperative Decoder (ASCD) for ASR. In particular, unlike vanilla…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  32. arXiv:2305.05107  [pdf, other]

    cs.SI eess.SP

    Modeling Viral Information Spreading via Directed Acyclic Graph Diffusion

    Authors: Chinthaka Dinesh, Gene Cheung, Fei Chen, Yuejiang Li, H. Vicky Zhao

    Abstract: Viral information like rumors or fake news is spread over a communication network like a virus infection in a unidirectional manner: entity $i$ conveys information to a neighbor $j$, resulting in two equally informed (infected) parties. Existing graph diffusion works focus only on bidirectional diffusion on an undirected graph. Instead, we propose a new directed acyclic graph (DAG) diffusion model…

    Submitted 22 December, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  33. arXiv:2305.04160  [pdf, other]

    cs.CL cs.AI cs.CV eess.AS

    X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    Authors: Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

    Abstract: Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4, based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous visual language models. We attribute this to the use of more advanced LLMs compared with previous multimodal models. Unfortunately, the model architecture and training strategies of GPT-4 are unknown. To endow LLMs with multimod…

    Submitted 21 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

  34. arXiv:2301.13003  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

    Authors: Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

    Abstract: Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems,…

    Submitted 28 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted by INTERSPEECH 2023

  35. arXiv:2211.12911  [pdf, ps, other]

    math.OC eess.SY

    Data-driven approximation of control invariant set for linear system based on convex piecewise linear fitting

    Authors: Jun Xu, Fanglin Chen

    Abstract: Control invariant set is critical for guaranteeing safe control and the problem of computing control invariant set for linear discrete-time system is revisited in this paper by using a data-driven approach. Specifically, sample points on convergent trajectories of linear MPC are recorded, of which the convex hull formulates a control invariant set for the linear system. To approximate the convex h…

    Submitted 23 November, 2022; originally announced November 2022.

  36. arXiv:2211.07283  [pdf, other]

    eess.AS cs.SD

    SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to acc…

    Submitted 1 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

  37. Tunable Dynamic Walking via Soft Twisted Beam Vibration

    Authors: Yuhao Jiang, Fuchen Chen, Daniel M. Aukes

    Abstract: We propose a novel mechanism that propagates vibration through soft twisted beams, taking advantage of dynamically-coupled anisotropic stiffness to simplify the actuation of walking robots. Using dynamic simulation and experimental approaches, we show that the coupled stiffness of twisted beams with terrain contact can be controlled to generate a variety of complex trajectories by changing the fre…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 8 pages, 5 figures, this paper has been submitted to IEEE Robotics and Automation Letters, copyright may be transferred without notice, after which this version may no longer be accessible, the supplemental video is available at: https://youtu.be/HpvOvaIC1Z4

    Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 4, pp. 1967-1974, April 2023

  38. arXiv:2210.14481  [pdf]

    eess.SP

    Calibrationless Reconstruction of Uniformly-Undersampled Multi-Channel MR Data with Deep Learning Estimated ESPIRiT Maps

    Authors: Junhao Zhang, Zheyuan Yi, Yujiao Zhao, Linfang Xiao, Jiahao Hu, Christopher Man, Vick Lau, Shi Su, Fei Chen, Alex T. L. Leong, Ed X. Wu

    Abstract: Purpose: To develop a truly calibrationless reconstruction method that derives ESPIRiT maps from uniformly-undersampled multi-channel MR data by deep learning. Methods: ESPIRiT, one commonly used parallel imaging reconstruction technique, forms the images from undersampled MR k-space data using ESPIRiT maps that effectively represent coil sensitivity information. Accurate ESPIRiT map estimation r…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

  39. arXiv:2209.13112  [pdf, other]

    eess.AS cs.SD

    Automated Sex Classification of Children's Voices and Changes in Differentiating Factors with Age

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Weiting Tan

    Abstract: Sex classification of children's voices allows for an investigation of the development of secondary sex characteristics which has been a key interest in the field of speech analysis. This research investigated a broad range of acoustic features from scripted and spontaneous speech and applied a hierarchical clustering-based machine learning model to distinguish the sex of children aged between 5 a…

    Submitted 26 September, 2022; originally announced September 2022.

  40. EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman

    Abstract: Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: what are the characteristics of selected sparse techniques on the performance and model…

    Submitted 22 September, 2022; originally announced September 2022.

    Journal ref: Interspeech 2022, 823-827 (2022)

  41. arXiv:2208.11439  [pdf, other

    eess.SY cs.MA

    A Consistency Constraint-Based Approach to Coupled State Constraints in Distributed Model Predictive Control

    Authors: Adrian Wiltz, Fei Chen, Dimos V. Dimarogonas

    Abstract: In this paper, we present a distributed model predictive control (DMPC) scheme for dynamically decoupled systems which are subject to state constraints, coupling state constraints and input constraints. In the proposed control scheme, neighbor-to-neighbor communication suffices and all subsystems solve their local optimization problem in parallel. The approach relies on consistency constraints whi…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: accepted for presentation at the 61st IEEE Conference on Decision and Control 2022

  42. arXiv:2208.08131  [pdf, other

    cs.SD eess.AS

    Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation

    Authors: Fang-Ching Chen, Kuan-Dar Chen, Yi-Wen Liu

    Abstract: Semi-supervised learning and domain adaptation techniques have drawn increasing attention in the field of domestic sound event detection thanks to the availability of large amounts of unlabeled data and the relative ease of generating synthetic strongly-labeled data. In a previous work, several semi-supervised learning strategies were designed to boost the performance of a mean-teacher model. Namely…

    Submitted 17 August, 2022; originally announced August 2022.

  43. arXiv:2208.00840  [pdf, other

    q-bio.NC cs.LG eess.IV

    A Transformer-based Neural Language Model that Synthesizes Brain Activation Maps from Free-Form Text Queries

    Authors: Gia H. Ngo, Minh Nguyen, Nancy F. Chen, Mert R. Sabuncu

    Abstract: Neuroimaging studies are often limited by the number of subjects and cognitive processes that can be feasibly interrogated. However, a rapidly growing number of neuroscientific studies have collectively accumulated an extensive wealth of results. Digesting this growing literature and obtaining novel insights remains a major challenge, since existing meta-analytic tools are constrained to key…

    Submitted 24 July, 2022; originally announced August 2022.

    Comments: arXiv admin note: text overlap with arXiv:2109.13814

    Journal ref: Medical Image Analysis. 2022 Jul 19:102540

  44. arXiv:2207.12941  [pdf, other

    cs.CV eess.IV

    Learning Generalizable Latent Representations for Novel Degradations in Super Resolution

    Authors: Fengjun Li, Xin Feng, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily…

    Submitted 25 July, 2022; originally announced July 2022.

  45. arXiv:2206.04245  [pdf, other

    eess.SP

    Manifold Graph Signal Restoration using Gradient Graph Laplacian Regularizer

    Authors: Fei Chen, Gene Cheung, Xue Zhang

    Abstract: In the graph signal processing (GSP) literature, graph Laplacian regularizer (GLR) was used for signal restoration to promote piecewise smooth / constant reconstruction with respect to an underlying graph. However, for signals slowly varying across graph kernels, GLR suffers from an undesirable "staircase" effect. In this paper, focusing on manifold graphs -- collections of uniform discrete sample…

    Submitted 4 April, 2024; v1 submitted 8 June, 2022; originally announced June 2022.

  46. arXiv:2205.07108  [pdf, other

    cs.HC eess.SP

    Formalizing PQRST Complex in Accelerometer-based Gait Cycle for Authentication

    Authors: Frank Sicong Chen, Amith K. Belman, Vir V. Phoha

    Abstract: Accelerometer signals generated through gait present a new frontier of human interface with mobile devices. Gait cycle detection based on these signals has applications in various areas, including authentication, health monitoring, and activity detection. Template-based studies focus on how the entire gait cycle represents walking patterns, but these are compute-intensive. Aggregate feature-based…

    Submitted 14 May, 2022; originally announced May 2022.

  47. arXiv:2204.11448  [pdf, other

    eess.IV cs.CV

    High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation

    Authors: Ming Lu, Fangdong Chen, Shiliang Pu, Zhan Ma

    Abstract: Questing for learned lossy image coding (LIC) with superior compression performance and computation throughput is challenging. The vital factor behind it is how to intelligently explore Adaptive Neighborhood Information Aggregation (ANIA) in transform and entropy coding modules. To this end, an Integrated Convolution and Self-Attention (ICSA) unit is first proposed to form a content-adaptive transfor…

    Submitted 12 October, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

  48. arXiv:2204.03310  [pdf, other

    eess.AS cs.LG cs.SD

    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

    Authors: Ryandhimas E. Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility sco…

    Submitted 30 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  49. arXiv:2204.03305  [pdf, other

    eess.AS cs.LG cs.SD

    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

    Authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale lis…

    Submitted 30 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  50. arXiv:2203.16032  [pdf, other

    cs.SD eess.AS

    ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

    Authors: Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

    Abstract: With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laborato…

    Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.