Showing 1–50 of 238 results for author: Chen, B

Searching in archive eess.
  1. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.17932  [pdf, other]

    cs.RO cs.MM cs.SD eess.AS

    SonicSense: Object Perception from In-Hand Acoustic Vibration

    Authors: Jiaxun Liu, Boyuan Chen

    Abstract: We introduce SonicSense, a holistic design of hardware and software to enable rich robot object perception through in-hand acoustic vibration sensing. While previous studies have shown promising results with acoustic sensing for object perception, current solutions are constrained to a handful of objects with simple geometries and homogeneous materials, single-finger sensing, and mixing training a…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/SonicSense

  3. arXiv:2406.17661  [pdf, other]

    eess.SY

    Neuro-Modeling Infused EMT Analytics

    Authors: Qing Shen, Yifan Zhou, Peng Zhang, Yacov A. Shamash, Xiaochuan Luo, Bin Wang, Huanfeng Zhao, Roshan Sharma, Bo Chen

    Abstract: The paper presents a systematic approach to developing Physics-Informed neuro-Models (PIM) for the transient analysis of power grids interconnected with renewables. PIM verifies itself as an adequate digital twin of power components, taking full advantage of physical constraints while requiring only a small fraction of data for training. Three new contributions are presented: 1) A PINN-enabled ne…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17577  [pdf, other]

    eess.IV cs.CV

    Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images

    Authors: Boyu Chen, Ameenat L. Solebo, Paul Taylor

    Abstract: Anterior uveitis, a common form of eye inflammation, can lead to permanent vision loss if not promptly diagnosed. Monitoring this condition involves quantifying inflammatory cells in the anterior chamber (AC) of the eye, which can be captured using Anterior Segment Optical Coherence Tomography (AS-OCT). However, manually identifying cells in AS-OCT images is time-consuming and subjective. Moreover…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.12292  [pdf, other]

    cs.SD cs.AI eess.AS

    JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

    Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

    Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper…

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.10873  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies

    Authors: Chung-Wen Wu, Berlin Chen

    Abstract: Automatic Speech Assessment (ASA) has seen notable advancements with the utilization of self-supervised features (SSL) in recent research. However, a key challenge in ASA lies in the imbalanced distribution of data, particularly evident in English test datasets. To address this challenge, we approach ASA as an ordinal classification task, introducing Weighted Vectors Ranking Similarity (W-RankSim)…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  7. arXiv:2406.10677  [pdf, ps, other]

    eess.SY

    Intermittent Encryption Strategies for Anti-Eavesdropping Estimation

    Authors: Zhongyao Hu, Bo Chen, Pindi Weng, Jianzheng Wang, Li Yu

    Abstract: In this paper, an anti-eavesdropping estimation problem is investigated. A linear encryption scheme is utilized, which first linearly transforms innovation via an encryption matrix and then encrypts some components of the transformed innovation. To reduce the computation and energy resources consumed by the linear encryption scheme, both stochastic and deterministic intermittent strategies which p…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

    MSC Class: 93E-xx

  8. arXiv:2406.09082  [pdf]

    eess.SY cs.AI

    Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles

    Authors: Hao Zhang, Nuo Lei, Boli Chen, Bingbing Li, Rulong Li, Zhi Wang

    Abstract: Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven mo…

    Submitted 13 June, 2024; originally announced June 2024.

  9. arXiv:2406.02859   

    eess.AS cs.SD

    ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

    Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrasti…

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

  10. arXiv:2405.19298  [pdf, other]

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score, an all-around LMM-based no-reference IQA (NR-IQA)…

    Submitted 29 May, 2024; originally announced May 2024.

  11. arXiv:2405.18205  [pdf, other]

    eess.SY eess.SP

    Joint Radar Sensing, Location, and Communication Resources Optimization in 6G Network

    Authors: Haijun Zhang, Bowen Chen, Xiangnan Liu, Chao Ren

    Abstract: The possibility of jointly optimizing location sensing and communication resources, facilitated by the existence of communication and sensing spectrum sharing, is what promotes the system performance to a higher level. However, the rapid mobility of user equipment (UE) can result in inaccurate location estimation, which can severely degrade system performance. Therefore, the precise UE location se…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures and 4 charts. This paper has been accepted for publication in the IEEE Journal on Selected Areas in Communications

  12. arXiv:2405.15413  [pdf, other]

    eess.IV cs.CV cs.IT

    MambaVC: Learned Visual Compression with Selective State Spaces

    Authors: Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei Wang

    Abstract: Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity a…

    Submitted 28 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages, 15 figures

  13. arXiv:2405.09814  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

    Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

    Abstract: In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging fo…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024 (Journal Track); Project page: https://pku-mocca.github.io/Semantic-Gesticulator-Page

  14. arXiv:2404.16825  [pdf, other]

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content…

    Submitted 25 April, 2024; originally announced April 2024.

  15. arXiv:2404.15309  [pdf, other]

    eess.SP cs.LG q-bio.NC

    Sparse Bayesian Correntropy Learning for Robust Muscle Activity Reconstruction from Noisy Brain Recordings

    Authors: Yuanhao Li, Badong Chen, Natsue Yoshimura, Yasuharu Koike, Okito Yamashita

    Abstract: Sparse Bayesian learning has promoted many effective frameworks for brain activity decoding, especially for the reconstruction of muscle activity. However, existing sparse Bayesian learning mainly employs Gaussian distribution as error assumption in the reconstruction task, which is not necessarily the truth in the real-world application. On the other hand, brain recording is known to be highly no…

    Submitted 1 April, 2024; originally announced April 2024.

  16. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  17. arXiv:2404.07575  [pdf]

    cs.SD cs.AI eess.AS

    An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

    Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distri…

    Submitted 11 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Findings

  18. arXiv:2403.14268  [pdf]

    eess.AS cs.SD

    Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

    Authors: PeiYing Lee, HauYun Guo, Berlin Chen

    Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

    Report number: TAAI2023-Domestic-131

  19. arXiv:2403.10962  [pdf, other]

    cs.CV eess.IV

    Exploiting Topological Priors for Boosting Point Cloud Generation

    Authors: Baiyuan Chen

    Abstract: This paper presents an innovative enhancement to the Sphere as Prior Generative Adversarial Network (SP-GAN) model, a state-of-the-art GAN designed for point cloud generation. A novel method is introduced for point cloud generation that elevates the structural integrity and overall quality of the generated point clouds by incorporating topological priors into the training process of the generator.…

    Submitted 26 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: 7 pages, 3 figures

  20. arXiv:2403.08654  [pdf, other]

    eess.AS cs.SD

    An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods "in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimen…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Under review on IEEE Transactions on Audio, Speech, and Language Processing (2024)

  21. arXiv:2403.01792  [pdf, other]

    cs.SD eess.AS

    ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

    Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

    Abstract: Speech separation has recently made significant progress thanks to the fine-grained vision used in time-domain methods. However, several studies have shown that adopting Short-Time Fourier Transform (STFT) for feature extraction could be beneficial when encountering harsher conditions, such as noise or reverberation. Therefore, we propose a magnitude-conditioned time-domain framework, ConSep, to i…

    Submitted 4 March, 2024; originally announced March 2024.

  22. arXiv:2403.01785  [pdf, other]

    cs.SD eess.AS

    What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

    Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

    Abstract: This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sinc-conv is evaluated in conjunction with various SE…

    Submitted 4 March, 2024; originally announced March 2024.

  23. arXiv:2402.17189  [pdf]

    cs.CL cs.AI cs.SD eess.AS

    An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

    Authors: Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen

    Abstract: With the massive developments of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR). However, the codeswitching phenomenon remains a major obstacle that hinders ASR from perfection, as the lack of labeled data and the variations between languages often lead to degradation of ASR performance. In this paper, we focus exclus…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: ICASSP 2024

  24. arXiv:2402.15738  [pdf, other]

    cs.CR eess.SY

    Privacy-Preserving State Estimation in the Presence of Eavesdroppers: A Survey

    Authors: Xinhao Yan, Guanzhong Zhou, Daniel E. Quevedo, Carlos Murguia, Bo Chen, Hailong Huang

    Abstract: Networked systems are increasingly the target of cyberattacks that exploit vulnerabilities within digital communications, embedded hardware, and software. Arguably, the simplest class of attacks -- and often the first type before launching destructive integrity attacks -- are eavesdropping attacks, which aim to infer information by collecting system data and exploiting it for malicious purposes. A…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, 4 tables

  25. arXiv:2402.02140  [pdf, other]

    cs.CV eess.IV

    Generative Visual Compression: A Review

    Authors: Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye

    Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promi…

    Submitted 3 February, 2024; originally announced February 2024.

  26. arXiv:2401.12587  [pdf, other]

    eess.IV cs.CV

    An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding

    Authors: Xiang Liu, Jiahong Chen, Bin Chen, Zimo Liu, Baoyi An, Shu-Tao Xia, Zhi Wang

    Abstract: Displaying high-quality images on edge devices, such as augmented reality devices, is essential for enhancing the user experience. However, these devices often face power consumption and computing resource limitations, making it challenging to apply many deep learning-based image compression algorithms in this field. Implicit Neural Representation (INR) for image compression is an emerging technol…

    Submitted 7 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  27. arXiv:2401.11095  [pdf, other]

    cs.HC cs.SD eess.AS

    SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness

    Authors: Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, Anhong Guo

    Abstract: Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum…

    Submitted 26 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: DIS 2024

  28. arXiv:2312.16963  [pdf, other]

    eess.IV cs.CV

    FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information

    Authors: Yichong Xia, Yujun Huang, Bin Chen, Haoqian Wang, Yaowei Wang

    Abstract: Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications. Interestingly, the Distributed Source Coding (DSC) theory suggests that efficient data compression of correlated sources can be achieved through independent encoding and joint decoding. This motivates the rapidly developed deep-distributed SIC metho…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

  29. arXiv:2312.09583  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

    Authors: Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen

    Abstract: In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model. It is possible to achieve human-like recognition without the need to build a pronunciation dictionary in advance. However, due to the relative scarcity of training data on code-switchi…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

  30. arXiv:2312.07290  [pdf, other]

    cs.RO eess.SY

    Underwater motions analysis and control of a coupling-tiltable unmanned aerial-aquatic quadrotor

    Authors: Dongyue Huang, Chenggang Wang, Minghao Dou, Xuchen Liu, Zixuan Liu, Biao Wang, Ben M. Chen

    Abstract: This paper proposes a method for analyzing a series of potential motions in a coupling-tiltable aerial-aquatic quadrotor based on its nonlinear dynamics. Some characteristics and constraints derived by this method are specified as Singular Thrust Tilt Angles (STTAs), which are used to generate motions including planar motions. A switch-based control scheme addresses issues of control direction uncertai…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Unmanned Aerial-Aquatic Vehicle

  31. arXiv:2312.06668  [pdf]

    cs.CL cs.SD eess.AS

    Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

    Authors: Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

    Abstract: Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to ASRU 2023

  32. arXiv:2312.00727  [pdf, other]

    cs.LG cs.AI eess.SY

    Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

    Authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu

    Abstract: This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states…

    Submitted 1 December, 2023; originally announced December 2023.

  33. arXiv:2311.13847  [pdf, other]

    cs.CV cs.IT eess.IV

    Perceptual Image Compression with Cooperative Cross-Modal Side Information

    Authors: Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia

    Abstract: The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have…

    Submitted 28 November, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

  34. arXiv:2310.19180  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

    Authors: Yao Yao, Peike Li, Boyu Chen, Alex Wang

    Abstract: With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch. However, finer-grained control over multi-track generation remains an open challenge. Existing models exhibit strong raw generation capability but lack the flexibility to compose separate tracks and combine them in a controllable manner…

    Submitted 2 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: Preprints

  35. arXiv:2310.18780  [pdf, other]

    cs.LG cs.AI eess.SP

    Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

    Authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Re, Stefano Ermon, Yoshua Bengio

    Abstract: Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input se…

    Submitted 28 October, 2023; originally announced October 2023.

  36. arXiv:2310.01839  [pdf]

    eess.AS cs.CL cs.SD

    Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment

    Authors: Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to quantify the pronunciation proficiency of a second language (L2) learner in a language. Prevailing approaches to APA normally leverage neural models trained with a regression loss function, such as the mean-squared error (MSE) loss, for proficiency level prediction. Although most regression models can effectively capture the ordinality of proficie…

    Submitted 4 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU 2023

  37. arXiv:2309.13753  [pdf, other]

    cs.RO eess.SY

    Policy Stitching: Learning Transferable Robot Policies

    Authors: Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

    Abstract: Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: CoRL 2023

  38. arXiv:2309.12476  [pdf, other]

    eess.SY

    Differentially Private Reward Functions for Markov Decision Processes

    Authors: Alexander Benvenuti, Calvin Hawkins, Brandon Fallin, Bo Chen, Brendan Bialy, Miriam Dennis, Matthew Hale

    Abstract: Markov decision processes often seek to maximize a reward function, but onlookers may infer reward functions by observing agents, which can reveal sensitive information. Therefore, in this paper we introduce and compare two methods for privatizing reward functions in policy synthesis for multi-agent Markov decision processes, which generalize Markov decision processes. Reward functions are privati…

    Submitted 4 February, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 11 Pages, 7 figures

  39. arXiv:2308.13777  [pdf, other]

    eess.SP cs.CV cs.LG

    Self-Supervised Scalable Deep Compressed Sensing

    Authors: Bin Chen, Xuanyu Zhang, Shuai Liu, Yongbing Zhang, Jian Zhang

    Abstract: Compressed sensing (CS) is a promising tool for reducing sampling costs. Current deep neural network (NN)-based CS methods face challenges in collecting labeled measurement-ground truth (GT) data and generalizing to real applications. This paper proposes a novel Self-supervised sCalable deep CS method, comprising a Learning scheme called SCL and a family…

    Submitted 26 August, 2023; originally announced August 2023.

  40. arXiv:2308.12615  [pdf, other]

    cs.SD eess.AS

    Naaloss: Rethinking the objective of speech enhancement

    Authors: Kuan-Hsun Ho, En-Lun Yu, Jeih-weih Hung, Berlin Chen

    Abstract: Reducing noise interference is crucial for automatic speech recognition (ASR) in a real-world scenario. However, most single-channel speech enhancement (SE) generates "processing artifacts" that negatively affect ASR performance. Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. NAaLoss considers…

    Submitted 24 August, 2023; originally announced August 2023.

  41. arXiv:2308.08968  [pdf, other]

    eess.SP cs.IT

    On the Performance of Multidimensional Constellation Shaping for Linear and Nonlinear Optical Fiber Channel

    Authors: Bin Chen, Zhiwei Liang, Shen Li, Yi Lei, Gabriele Liga, Alex Alvarado

    Abstract: Multidimensional constellation shaping of up to 32 dimensions with different spectral efficiencies are compared through AWGN and fiber-optic simulations. The results show that no constellation is universal and the balance of required and effective SNRs should be jointly considered for the specific optical transmission scenario.

    Submitted 18 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: The paper has been accepted at ECOC 2023.
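The abstract of entry 41 compares constellations through AWGN simulations. A minimal sketch of that kind of experiment, assuming minimum-distance detection and using plain square 16-QAM as a stand-in (the paper's multidimensional constellations are not reproduced here), estimates the symbol error rate (SER) by Monte Carlo. The function names, the sample size, and the SNR convention stated in the docstring are assumptions for illustration; `simulate_ser` accepts points of any dimension.

```python
import math
import random


def qam16():
    """Square 16-QAM as 2-D points; levels are unnormalised."""
    levels = (-3.0, -1.0, 1.0, 3.0)
    return [(i, q) for i in levels for q in levels]


def normalize(points):
    """Scale a constellation to unit average symbol energy."""
    es = sum(sum(c * c for c in p) for p in points) / len(points)
    scale = 1.0 / math.sqrt(es)
    return [tuple(c * scale for c in p) for p in points]


def simulate_ser(points, snr_db, n_symbols=20000, seed=0):
    """Monte Carlo symbol error rate with minimum-distance detection over AWGN.

    SNR is the average symbol energy divided by the total noise variance over
    the D real dimensions, which equals Es/N0 for a 2-D (complex) constellation.
    """
    rng = random.Random(seed)
    pts = normalize(points)
    dim = len(pts[0])
    snr = 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(1.0 / (dim * snr))  # noise standard deviation per real dimension
    errors = 0
    for _ in range(n_symbols):
        sent = rng.randrange(len(pts))
        rx = [c + rng.gauss(0.0, sigma) for c in pts[sent]]
        # decide on the nearest constellation point (Euclidean distance)
        decided = min(range(len(pts)),
                      key=lambda j: sum((r - c) ** 2 for r, c in zip(rx, pts[j])))
        errors += decided != sent
    return errors / n_symbols


for snr_db in (10.0, 14.0, 18.0):
    print(f"{snr_db:4.1f} dB  SER = {simulate_ser(qam16(), snr_db):.4f}")
```

For square 16-QAM the estimate can be checked against the closed-form SER, 1 - (1 - 1.5 Q(sqrt(3 SNR / 15)))^2.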

  42. arXiv:2308.04729  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

    Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

    Abstract: Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational ef…

    Submitted 9 August, 2023; originally announced August 2023.

  43. arXiv:2308.01201  [pdf, ps, other]

    eess.SY

    A Real-Time Robust Ecological-Adaptive Cruise Control Strategy for Battery Electric Vehicles

    Authors: Sheng Yu, Xiao Pan, Anastasis Georgiou, Boli Chen, Imad M. Jaimoukha, Simos A. Evangelou

    Abstract: This work addresses the ecological-adaptive cruise control problem for connected electric vehicles by a computationally efficient robust control strategy. The problem is formulated in the space-domain with a realistic description of the nonlinear electric powertrain model and motion dynamics to yield a convex optimal control problem (OCP). The OCP is approached by a novel robust model predictive c…

    Submitted 15 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 15 pages, 12 figures and 2 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2307.14907  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples

    Authors: Andrew H. Song, Mane Williams, Drew F. K. Williamson, Guillaume Jaume, Andrew Zhang, Bowen Chen, Robert Serafin, Jonathan T. C. Liu, Alex Baras, Anil V. Parwani, Faisal Mahmood

    Abstract: Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have yet had little translation…

    Submitted 27 July, 2023; originally announced July 2023.

  45. arXiv:2307.12709  [pdf]

    eess.SY

    A Dynamic Equivalent Energy Storage Model of Natural Gas Networks for Joint Optimal Dispatch of Electricity-Gas Systems

    Authors: Siyuan Wang, Wenchuan Wu, Chenhui Lin, Binbin Chen

    Abstract: The development of energy conversion techniques enhances the coupling between the gas network and power system. However, challenges remain in the joint optimal dispatch of electricity-gas systems. The dynamic model of the gas network, described by partial differential equations, is complex and computationally demanding for power system operators. Furthermore, information privacy concerns and limit…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 12 pages, 8 figures

  46. arXiv:2307.10495  [pdf, other]

    cs.LG cs.CV eess.SP

    Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

    Authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi

    Abstract: Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning s…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 16 pages, 7 figures, Preprint

    ACM Class: I.2.6; I.2.10; I.4.0; I.4.9

    Journal ref: Proc. SPIE. Algorithms for Synthetic Aperture Radar Imagery XXX (Vol. 12520, pp. 96-111). 13 June 2023
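The abstract of entry 46 contrasts sequential active learning, which queries labels one point per iteration, with querying a batch at a time. A minimal sketch of a generic batch query step, assuming a classifier that outputs class probabilities and using the smallest-margin uncertainty heuristic (a common criterion, not the selection rule proposed in the paper); `margin` and `select_batch` are hypothetical names. With `batch_size=1` it reduces to sequential selection.

```python
def margin(probs):
    """Gap between the two largest class probabilities; a small gap means an uncertain prediction."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_batch(prob_table, labeled, batch_size):
    """Pick the batch_size unlabeled points whose predictions have the smallest margin.

    prob_table[i] holds the classifier's class probabilities for point i;
    labeled is a set of indices that already have labels.
    """
    candidates = [i for i in range(len(prob_table)) if i not in labeled]
    candidates.sort(key=lambda i: margin(prob_table[i]))
    return candidates[:batch_size]


probs = [[0.90, 0.10], [0.55, 0.45], [0.50, 0.50], [0.70, 0.30], [0.51, 0.49]]
print(select_batch(probs, labeled={2}, batch_size=2))  # [4, 1]
```

Taking the top-k most uncertain points can select near-duplicates, which is why practical batch methods usually add a diversity or coverage criterion on top of an uncertainty score.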

  47. arXiv:2307.08950  [pdf, other]

    cs.CV eess.IV

    Deep Physics-Guided Unrolling Generalization for Compressed Sensing

    Authors: Bin Chen, Jiechong Song, **gfen Xie, Jian Zhang

    Abstract: By absorbing the merits of both the model- and data-driven methods, deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction. It has attracted growing attention and become the mainstream for inverse imaging tasks. Focusing on the image compressed sensing (CS) problem, we find the intrinsic defect of this emerging paradigm, widely implemented by deep algori…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted by International Journal of Computer Vision (IJCV) 2023

  48. arXiv:2307.05179  [pdf, other]

    cs.IT eess.SP

    A Simplified Method for Optimising Geometrically Shaped Constellations of Higher Dimensionality

    Authors: Kadir Gümüş, Bin Chen, Thomas Bradley, Chigo Okonkwo

    Abstract: We introduce a simplified method for calculating the loss function for use in geometric shaping, allowing for the optimisation of high dimensional constellations. We design constellations up to 12D with 4096 points, with gains up to 0.31 dB compared to the state-of-the-art.

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: This paper was accepted for the European Conference on Optical Communications (ECOC) 2023; this version is a preprint.

  49. Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing

    Authors: Jiechong Song, Bin Chen, Jian Zhang

    Abstract: Deep unfolding network (DUN) that unfolds the optimization algorithm into a deep neural network has achieved great success in compressive sensing (CS) due to its good interpretability and high performance. Each stage in DUN corresponds to one iteration in optimization. At the test time, all the sampling images generally need to be processed by all stages, which comes at a price of computation burd…

    Submitted 19 February, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: TIP, 2023
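The abstract of entry 49 describes a deep unfolding network in which each stage corresponds to one iteration of an optimization algorithm, and notes the cost of pushing every input through all stages. A minimal sketch of that control flow, assuming ISTA as the unfolded algorithm and a hand-set convergence test as the stop rule (the paper's learned path-control mechanism is not reproduced); `make_stage`, `run_unfolded`, `step`, `thresh`, and `tol` are hypothetical names, and the per-stage parameters that a trained network would learn are fixed numbers here.

```python
import math


def soft_threshold(v, t):
    """Element-wise shrinkage, the proximal operator of t * ||v||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def make_stage(A, y, step, thresh):
    """One unfolded ISTA iteration with its own step size and threshold.

    In a trained unfolding network these per-stage values would be learned;
    here they are plain numbers supplied by the caller.
    """
    def stage(x):
        residual = [sum(a * xi for a, xi in zip(row, x)) - yi for row, yi in zip(A, y)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(x))]
        return soft_threshold([xi - step * g for xi, g in zip(x, grad)], thresh)
    return stage


def run_unfolded(stages, x0, tol=1e-4):
    """Run the stages in order, exiting early once a stage changes x by less than tol.

    Returns the estimate and the number of stages actually executed.
    """
    x = x0
    for k, stage in enumerate(stages, start=1):
        x_new = stage(x)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if change < tol:
            return x, k
    return x, len(stages)


# Usage: with A = I each stage soft-thresholds y, so the second stage changes nothing.
A = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.05]
stages = [make_stage(A, y, step=1.0, thresh=0.1) for _ in range(10)]
x_hat, used = run_unfolded(stages, [0.0, 0.0])
print(x_hat, used)
```

Easy inputs stop after few stages while harder ones use more, which is the computation saving that input-dependent stage skipping aims for.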

  50. arXiv:2306.14119  [pdf, other]

    eess.IV cs.CV

    SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology Image

    Authors: Luyuan Xie, Cong Li, Zirui Wang, Xin Zhang, Boyan Chen, Qingni Shen, Zhonghai Wu

    Abstract: The rapid identification and accurate diagnosis of breast cancer, known as the killer of women, have become greatly significant for those patients. Numerous breast cancer histopathological image classification methods have been proposed. But they still suffer from two problems. (1) These methods can only handle high-resolution (HR) images. However, the low-resolution (LR) images are often collected…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted by MICCAI 2023