Skip to main content

Showing 1–50 of 198 results for author: Wu, C

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.00129  [pdf

    eess.IV cs.AI cs.HC

    Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction

    Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen

    Abstract: Predicting human gaze behavior within computer vision is integral for develo** interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Submitted to the Journal

  2. arXiv:2406.19686  [pdf

    eess.IV cs.AI cs.CV cs.HC

    Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction

    Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Carol C. Wu, Hien Van Nguyen

    Abstract: Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CX… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Under Review in Journal

  3. arXiv:2406.11519  [pdf, other

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao **, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, **g Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  4. arXiv:2406.10873  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies

    Authors: Chung-Wen Wu, Berlin Chen

    Abstract: Automatic Speech Assessment (ASA) has seen notable advancements with the utilization of self-supervised features (SSL) in recent research. However, a key challenge in ASA lies in the imbalanced distribution of data, particularly evident in English test datasets. To address this challenge, we approach ASA as an ordinal classification task, introducing Weighted Vectors Ranking Similarity (W-RankSim)… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  5. arXiv:2406.09569  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

    Authors: Frank Seide, Morrie Doulaty, Yangyang Shi, Yashesh Gaur, Junteng Jia, Chunyang Wu

    Abstract: We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio without explicit end-pointing. Speech ReaLLM is a special case of the more general ReaLLM ("real-time LLM") approach, also introduced here for the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.04262  [pdf, other

    eess.SP

    Near-field Beam Training with Sparse DFT Codebook

    Authors: Cong Zhou, Chenyu Wu, Changsheng You, Shuo Shi

    Abstract: Extremely large-scale array (XL-array) has emerged as one promising technology to improve the spectral efficiency and spatial resolution of future sixth generation (6G) wireless systems.The upsurge in the antenna number antennas renders communication users more likely to be located in the near-field region, which requires a more accurate spherical (instead of planar) wavefront propagation modeling… ▽ More

    Submitted 18 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: In this paper, we propose a novel sparse DFT codebook to reduce near-field beam training overhead, which is equivalent to sparsely activating the dense array

  7. arXiv:2405.19356  [pdf, other

    eess.SP cs.AI cs.LG cs.RO

    An LSTM Feature Imitation Network for Hand Movement Recognition from sEMG Signals

    Authors: Chuheng Wu, S. Farokh Atashzar, Mohammad M. Ghassemi, Tuka Alhanai

    Abstract: Surface Electromyography (sEMG) is a non-invasive signal that is used in the recognition of hand movement patterns, the diagnosis of diseases, and the robust control of prostheses. Despite the remarkable success of recent end-to-end Deep Learning approaches, they are still limited by the need for large amounts of labeled data. To alleviate the requirement for big data, researchers utilize Feature… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to RA-L, and under review

  8. arXiv:2405.18526  [pdf, other

    eess.SY physics.soc-ph

    Unlocking the Potential of Renewable Energy Through Curtailment Prediction

    Authors: Bilge Acun, Brent Morgan, Henry Richardson, Nat Steinsultz, Carole-Jean Wu

    Abstract: A significant fraction (5-15%) of renewable energy generated goes into waste in the grids around the world today due to oversupply issues and transmission constraints. Being able to predict when and where renewable curtailment occurs would improve renewable utilization. The core of this work is to enable the machine learning community to help decarbonize electricity grids by unlocking the potentia… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: The work was presented as a part of the Climate Change AI workshop at NeurIPS 2023

  9. arXiv:2405.17100  [pdf, other

    cs.CR cs.SD eess.AS

    Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

    Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, **g Chen, Kun He, Ruiying Du

    Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.11541  [pdf, other

    cs.IT eess.SP

    R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

    Authors: Huiying Yang, Zihan **, Chenhao Wu, Ru**g Xiong, Robert Caiming Qiu, Zenan Ling

    Abstract: Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  11. arXiv:2405.11155  [pdf, other

    eess.SY cs.CC

    Inner-approximate Reachability Computation via Zonotopic Boundary Analysis

    Authors: De** Ren, Zhen Liang, Chenyu Wu, Jianqiang Ding, Taoran Wu, Bai Xue

    Abstract: Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability… ▽ More

    Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: the extended version of the paper accepted by CAV 2024

  12. arXiv:2405.09539  [pdf, ps, other

    eess.IV cs.CV cs.MM

    MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

    Authors: Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

    Abstract: Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-… ▽ More

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Early accepted to MICCAI 2024 (6/6/5)

  13. arXiv:2405.07717  [pdf, other

    eess.IV

    On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

    Authors: Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

    Abstract: Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. Adversarial samples in these studies are designed to attack only one dimension of either bitrate or distortion, targeting a submodel with a specific compression ratio. However, adversaries in real-world sce… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2405.03141  [pdf, other

    eess.IV cs.AI cs.CV physics.med-ph

    Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

    Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-** Lam, Yong-** Zheng

    Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea… ▽ More

    Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  15. arXiv:2404.15854  [pdf, other

    cs.CR cs.LG cs.SD eess.AS

    CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

    Authors: Haolin Wu, **g Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

    Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TDSC

  16. arXiv:2404.11959  [pdf, other

    eess.SY

    Segmented Model-Based Hydrogen Delivery Control for PEM Fuel Cells: a Port-Hamiltonian Approach

    Authors: Lalitesh Kumar, Jian Chen, Chengshuai Wu, Yuzhu Chen, Arjan van der Schaft

    Abstract: This paper proposes an extended interconnection and dam** assignment passivity-based control technique (IDA-PBC) to control the pressure dynamics in the fuel delivery subsystem (FDS) of proton exchange membrane fuel cells. The fuel cell stack is a distributed parameter model which can be modeled by partial differential equations PDEs). In this paper, the segmentation concept is used to approxima… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 Figures

  17. Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  18. arXiv:2404.09140  [pdf, other

    cs.LG cs.IT eess.SP

    RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion

    Authors: Guoxuan Chi, Zheng Yang, Chenshu Wu, **gao Xu, Yuchong Gao, Yunhao Liu, Tony Xiao Han

    Abstract: Along with AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Dif… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by MobiCom 2024

    ACM Class: I.2.0

  19. arXiv:2404.01716  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Effective internal language model training and fusion for factorized transducer model

    Authors: **xi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various of factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to ICASSP 2024

  20. arXiv:2403.15528  [pdf, other

    eess.IV cs.AI cs.CV

    Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs

    Authors: Yiliang Zhou, Hanley Ong, Patrick Kennedy, Carol Wu, Jacob Kazam, Keith Hentel, Adam Flanders, George Shih, Yifan Peng

    Abstract: The study examines the application of GPT-4V, a multi-modal large language model equipped with visual recognition, in detecting radiological findings from a set of 100 chest radiographs and suggests that GPT-4V is currently not ready for real-world diagnostic usage in interpreting chest radiographs.

    Submitted 12 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  21. Multi-modal Heart Failure Risk Estimation based on Short ECG and Sampled Long-Term HRV

    Authors: Sergio González, Abel Ko-Chun Yi, Wan-Ting Hsieh, Wei-Chao Chen, Chun-Li Wang, Victor Chien-Chia Wu, Shang-Hung Chang

    Abstract: Cardiovascular diseases, including Heart Failure (HF), remain a leading global cause of mortality, often evading early detection. In this context, accessible and effective risk assessment is indispensable. Traditional approaches rely on resource-intensive diagnostic tests, typically administered after the onset of symptoms. The widespread availability of electrocardiogram (ECG) technology and the… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

    Journal ref: S. González, A. K.-C. Yi, W.-T. Hsieh, W.-C. Chen, C.-L. Wang, V. C.-C. Wu, S.-H. Chang, Multi-modal heart failure risk estimation based on short ECG and sampled long-term HRV, Information Fusion 107 (2024) 102337

  22. arXiv:2403.04232  [pdf, other

    cs.RO cs.AI cs.LG cs.MA eess.SY

    Generalizing Cooperative Eco-driving via Multi-residual Task Learning

    Authors: Vindula Jayawardana, Sirui Li, Cathy Wu, Yashar Farid, Kentaro Oguchi

    Abstract: Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control poli… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted for publication at ICRA 2024

  23. arXiv:2402.17167  [pdf, ps, other

    eess.SY

    Converse Barrier Certificates for Finite-time Safety Verification of Continuous-time Perturbed Deterministic Systems

    Authors: Yonghan Li, Chenyu Wu, Taoran Wu, Shijie Wang, Bai Xue

    Abstract: In this paper, we investigate the problem of verifying the finite-time safety of continuous-time perturbed deterministic systems represented by ordinary differential equations in the presence of measurable disturbances. Given a finite time horizon, if the system is safe, it, starting from a compact initial set, will remain within an open and bounded safe region throughout the specified time horizo… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  24. arXiv:2402.09097  [pdf, other

    cs.RO cs.AI eess.SY

    A Digital Twin prototype for traffic sign recognition of a learning-enabled autonomous vehicle

    Authors: Mohamed AbdElSalam, Loai Ali, Saddek Bensalem, Weicheng He, Panagiotis Katsaros, Nikolaos Kekatos, Doron Peled, Anastasios Temperekidis, Changshun Wu

    Abstract: In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane kee**. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehi… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

  25. arXiv:2402.04584  [pdf, other

    eess.IV cs.CV

    Troublemaker Learning for Low-Light Image Enhancement

    Authors: Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

    Abstract: Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for… ▽ More

    Submitted 2 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  26. arXiv:2401.08913  [pdf, other

    cs.CV eess.IV

    Efficient Image Super-Resolution via Symmetric Visual Attention Network

    Authors: Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, **g Hu

    Abstract: An important development direction in the Single-Image Super-Resolution (SISR) algorithms is to improve the efficiency of the algorithms. Recently, efficient Super-Resolution (SR) research focuses on reducing model complexity and improving efficiency through improved deep small kernel convolution, leading to a small receptive field. The large receptive field obtained by large kernel convolution ca… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages,4 figures

  27. arXiv:2401.04650  [pdf, other

    cs.RO eess.SY

    Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots

    Authors: Immanuel Ampomah Mensah, Jessica Healey, Celina Wu, Andrea Lacunza, Nathaniel Hanson, Kristen L. Dorsey

    Abstract: An underdeveloped capability in soft robotics is proprioceptive feedback control, where soft actuators can be sensed and controlled using only sensors on the robot's body. Additionally, soft actuators are often unable to support human-scale loads due to the extremely compliant materials in use. Develo** both feedback control and the ability to actuate under large loads (e.g. 500 N) are key capac… ▽ More

    Submitted 18 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  28. arXiv:2401.03150  [pdf, other

    eess.IV

    O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision

    Authors: Kaiyan Li, **gyuan Yang, Wenxuan Liang, Xingde Li, Chenxi Zhang, Lulu Chen, Chan Wu, Xiao Zhang, Zhiyan Xu, Yuelin Wang, Lihui Meng, Yue Zhang, Youxin Chen, S. Kevin Zhou

    Abstract: Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and requires a substantial cost. We… ▽ More

    Submitted 6 January, 2024; originally announced January 2024.

  29. arXiv:2401.00197  [pdf, other

    eess.AS

    ODAQ: Open Dataset of Audio Quality

    Authors: Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

    Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international labora… ▽ More

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

  30. arXiv:2312.17183  [pdf, other

    eess.IV cs.CV

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we focus on building up a model that aims to Segment Anything in medical scenarios, driven by Text prompts, termed as SAT. Our main contributions are three folds: (i) for dataset construction, we combine multiple knowledge sources to construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; Then we build up the largest and most compreh… ▽ More

    Submitted 1 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 53 pages

  31. arXiv:2312.10343  [pdf, other

    eess.SP cs.AR cs.LG cs.NE

    In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar

    Authors: Yang Sui, Minning Zhu, Lingyi Huang, Chung-Tse Michael Wu, Bo Yuan

    Abstract: Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank dec… ▽ More

    Submitted 16 December, 2023; originally announced December 2023.

  32. arXiv:2312.09436  [pdf, other

    cs.RO cs.AI cs.LG eess.SY

    Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

    Authors: Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu

    Abstract: The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic. This paper considers advisory autonomy, in which real-time driving advisories are issued to drivers, thus blending the CAV and the human driver. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to leveraging deep reinfor… ▽ More

    Submitted 27 November, 2023; originally announced December 2023.

    Comments: 18 pages, 12 figures

  33. arXiv:2312.09429  [pdf

    eess.SP cs.LG

    Deep Learning-Enabled Swallowing Monitoring and Postoperative Recovery Biosensing System

    Authors: Chih-Ning Tsai, Pei-Wen Yang, Tzu-Yen Huang, Jung-Chih Chen, Hsin-Yi Tseng, Che-Wei Wu, Amrit Sarmah, Tzu-En Lin

    Abstract: This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material.

    Submitted 24 November, 2023; originally announced December 2023.

    Comments: the abstract can't uploaded fully

    MSC Class: NA ACM Class: A.0

  34. arXiv:2312.08343  [pdf

    eess.IV cs.CV q-bio.QM

    Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

    Authors: Zhuoyao Xin, Christopher Wu, Dong Liu, Chunming Gu, Jia Guo, Jun Hua

    Abstract: Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR… ▽ More

    Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, 2 tables

  35. arXiv:2311.12581  [pdf, other

    eess.IV cs.CV

    A Region of Interest Focused Triple UNet Architecture for Skin Lesion Segmentation

    Authors: Guoqing Liu, Yu Guo, Caiying Wu, Guoqing Chen, Barintag Saheya, Qiyu **

    Abstract: Skin lesion segmentation is of great significance for skin lesion analysis and subsequent treatment. It is still a challenging task due to the irregular and fuzzy lesion borders, and diversity of skin lesions. In this paper, we propose Triple-UNet to automatically segment skin lesions. It is an organic combination of three UNet architectures with suitable modules. In order to concatenate the first… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 15 pages, 5 figures

  36. arXiv:2311.09850  [pdf, other

    cs.IT eess.SP

    Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation

    Authors: Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, Kaibin Huang

    Abstract: Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challen… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop

  37. arXiv:2311.03682  [pdf, ps, other

    eess.SY cs.SI math.OC

    Incentive Design for Eco-driving in Urban Transportation Networks

    Authors: M. Umar B. Niazi, Jung-Hoon Cho, Munther A. Dahleh, Roy Dong, Cathy Wu

    Abstract: Eco-driving emerges as a cost-effective and efficient strategy to mitigate greenhouse gas emissions in urban transportation networks. Acknowledging the persuasive influence of incentives in sha** driver behavior, this paper presents the `eco-planner,' a digital platform devised to promote eco-driving practices in urban transportation. At the outset of their trips, users provide the platform with… ▽ More

    Submitted 16 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  38. arXiv:2310.08960  [pdf, other

    eess.SP

    A unified framework for STAR-RIS coefficients optimization

    Authors: Hancheng Zhu, Yuanwei Liu, Yik Chung Wu, Vincent K. N. Lau

    Abstract: Simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which serves users located on both sides of the surface, has recently emerged as a promising enhancement to the traditional reflective only RIS. Due to the lack of a unified comparison of communication systems equipped with different modes of STAR-RIS and the performance degradation caused by the constraint… ▽ More

    Submitted 13 October, 2023; originally announced October 2023.

  39. arXiv:2310.07689  [pdf, other

    eess.SY

    Hybrid System Stability Analysis of Multi-Lane Mixed-Autonomy Traffic

    Authors: Sirui Li, Roy Dong, Cathy Wu

    Abstract: Autonomous vehicles (AVs) hold vast potential to enhance transportation systems by reducing congestion, improving safety, and lowering emissions. AV controls lead to emergent traffic phenomena; one such intriguing phenomenon is traffic breaks (rolling roadblocks), where a single AV efficiently stabilizes multiple lanes through frequent lane switching, similar to the highway patrolling officers wea… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

  40. arXiv:2309.15697  [pdf, other

    cs.CV eess.IV

    Physics Inspired Hybrid Attention for SAR Target Recognition

    Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han

    Abstract: There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly dep… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

  41. arXiv:2309.13018  [pdf, other

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, **xi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in… ▽ More

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  42. arXiv:2309.10917  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    End-to-End Speech Recognition Contextualization with Large Language Models

    Authors: Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

    Abstract: In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for contextualizing speech recognition models incorporating LLMs. Our approach casts speech recognition as a mixed-modal language modeling task based on a pretrained LLM. We pro… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

  43. arXiv:2309.02539  [pdf, other

    eess.AS cs.LG cs.SD eess.SP

    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

    Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions whic… ▽ More

    Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

  44. arXiv:2309.01947  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

    Authors: Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

    Abstract: Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficien… ▽ More

    Submitted 27 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Meta AI; Submitted to ICASSP 2024

  45. arXiv:2308.08197  [pdf, other

    eess.IV cs.CV

    Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

    Authors: Jianyu Wen, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

    Abstract: In this paper, we propose a 2-stage low-light image enhancement method called Self-Reference Deep Adaptive Curve Estimation (Self-DACE). In the first stage, we present an intuitive, lightweight, fast, and unsupervised luminance enhancement algorithm. The algorithm is based on a novel low-light enhancement curve that can be used to locally boost image brightness. We also propose a new loss function… ▽ More

    Submitted 10 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  46. arXiv:2308.02356  [pdf, other

    cs.CV eess.IV

    T-UNet: Triplet UNet for Change Detection in High-Resolution Remote Sensing Images

    Authors: Huan Zhong, Chen Wu

    Abstract: Remote sensing image change detection aims to identify the differences between images acquired at different times in the same area. It is widely used in land management, environmental monitoring, disaster assessment and other fields. Currently, most change detection methods are based on Siamese network structure or early fusion structure. Siamese structure focuses on extracting object features at… ▽ More

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 21 pages, 11 figures, 6 tables

  47. arXiv:2307.15280  [pdf, other

    cs.IT eess.SP

    Active RIS-Assisted MIMO-OFDM System: Analyses and Prototype Measurements

    Authors: De-Ming Chian, Feng-Ji Chen, Yu-Chen Chang, Chao-Kai Wen, Chi-Hung Wu, Fu-Kang Wang, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: In this study, we develop an active reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) prototype compliant with the 5G New Radio standard at 3.5~GHz. The experimental results clearly indicate that active RIS plays a vital role in enhancing MIMO performance, surpassing passive RIS. Furthermore, when considering fac… ▽ More

    Submitted 14 November, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 5 pages, 5 figures, 1 table, accepted by IEEE Communications Letters, for demo video see: https://www.youtube.com/watch?v=3R6eZXizwns

  48. arXiv:2307.11795  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Prompting Large Language Models with Speech Recognition Abilities

    Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, **xi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings,… ▽ More

    Submitted 21 July, 2023; originally announced July 2023.

  49. arXiv:2307.11784  [pdf, other

    cs.LG cs.AI cs.SE eess.SY

    What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

    Authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao

    Abstract: Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verificati… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

  50. arXiv:2306.16206  [pdf, other

    eess.SP cs.IT

    Near-Field Beam Management for Extremely Large-Scale Array Communications

    Authors: Changsheng You, Yunpu Zhang, Chenyu Wu, Yong Zeng, Beixiong Zheng, Li Chen, Linglong Dai, A. Lee Swindlehurst

    Abstract: Extremely large-scale arrays (XL-arrays) have emerged as a promising technology to achieve super-high spectral efficiency and spatial resolution in future wireless systems. The large aperture of XL-arrays means that spherical rather than planar wavefronts must be considered, and a paradigm shift from far-field to near-field communications is necessary. Unlike existing works that have mainly consid… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: We studied the new near-field beam management for XL-arrays. This paper has been submitted to IEEE for possible publication