
Showing 1–50 of 122 results for author: Jiang, X

Searching in archive eess.
  1. arXiv:2407.00947  [pdf, other]

    eess.SY

    Fleet Size and Spill for UAM Operation under Uncertain Demand

    Authors: Shangqing Cao, Xuan Jiang, Emin Burak Onat, Bo Zou, Mark Hansen, Raja Sengupta, Anjan Chakrabarty

    Abstract: Variation and imbalance in demand poses significant challenges to Urban Air Mobility (UAM) operations, affecting strategic decisions such as fleet sizing. To study the implications of demand variation on UAM fleet operations, we propose a stochastic passenger arrival time generation model that uses real-world data to infer demand distributions, and two integer programs that compute the zero-spill…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance on numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  3. arXiv:2405.11831  [pdf, other]

    eess.AS cs.LG

    SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

    Authors: Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani

    Abstract: Transformers have revolutionized deep learning across various tasks, including audio representation learning, due to their powerful modeling capabilities. However, they often suffer from quadratic complexity in both GPU memory usage and computational inference time, affecting their efficiency. Recently, state space models (SSMs) like Mamba have emerged as a promising alternative, offering a more e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Code at https://github.com/SiavashShams/ssamba

  4. arXiv:2405.11118  [pdf, other]

    eess.SY

    A Simulation-Optimization Framework for Developing Wind-Resilient AAM Networks

    Authors: Emin Burak Onat, Shangqing Cao, Raiyan Rizwan, Xuan Jiang, Mark Hansen, Raja Sengupta, Anjan Chakrabarty

    Abstract: Environmental factors pose a significant challenge to the operational efficiency and safety of advanced air mobility (AAM) networks. This paper presents a simulation-optimization framework that dynamically integrates wind variability into AAM operations. We employ a nonlinear charging model within a multi-vertiport environment to optimize fleet size and scheduling. Our framework assesses the impac…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRAT 2024

  5. arXiv:2405.06342  [pdf, other]

    cs.CV eess.IV

    Compression-Realized Deep Structural Network for Video Quality Enhancement

    Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

    Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a mo…

    Submitted 10 May, 2024; originally announced May 2024.

  6. arXiv:2405.01242  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

    Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

    Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art m…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2403.18257  [pdf, other]

    eess.AS cs.SD

    Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

    Authors: Xilin Jiang, Cong Han, Nima Mesgarani

    Abstract: Transformers have been the most successful architecture for various speech modeling tasks, including speech separation. However, the self-attention mechanism in transformers with quadratic complexity is inefficient in computation and memory. Recent models incorporate new layers and modules along with transformers for better performance but also introduce extra model complexity. In this work, we re…

    Submitted 30 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: work in progress

  8. arXiv:2403.15433  [pdf, other]

    eess.SP cs.AI cs.LG eess.IV

    HyPer-EP: Meta-Learning Hybrid Personalized Models for Cardiac Electrophysiology

    Authors: Xiajun Jiang, Sumeet Vadhavkar, Yubo Ye, Maryam Toloubidokhti, Ryan Missel, Linwei Wang

    Abstract: Personalized virtual heart models have demonstrated increasing potential for clinical use, although the estimation of their parameters given patient-specific data remain a challenge. Traditional physics-based modeling approaches are computationally costly and often neglect the inherent structural errors in these models due to model simplifications and assumptions. Modern deep learning approaches,…

    Submitted 14 March, 2024; originally announced March 2024.

  9. arXiv:2403.11651  [pdf, other]

    eess.IV

    Overfitted image coding at reduced complexity

    Authors: Théophile Blard, Théo Ladune, Pierrick Philippe, Gordon Clare, Xiaoran Jiang, Olivier Déforges

    Abstract: Overfitted image codecs offer compelling compression performance and low decoder complexity, through the overfitting of a lightweight decoder for each image. Such codecs include Cool-chic, which presents image coding performance on par with VVC while requiring around 2000 multiplications per decoded pixel. This paper proposes to decrease Cool-chic encoding and decoding complexity. The encoding com…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 5 pages, submitted to European Signal Processing Conference (EUSIPCO) 2024

  10. arXiv:2403.06901  [pdf, other]

    eess.IV cs.AI cs.LG

    LIBR+: Improving Intraoperative Liver Registration by Learning the Residual of Biomechanics-Based Deformable Registration

    Authors: Dingrong Wang, Soheil Azadvar, Jon Heiselman, Xiajun Jiang, Michael Miga, Linwei Wang

    Abstract: The surgical environment imposes unique challenges to the intraoperative registration of organ shapes to their preoperatively-imaged geometry. Biomechanical model-based registration remains popular, while deep learning solutions remain limited due to the sparsity and variability of intraoperative measurements and the limited ground-truth deformation of an organ that can be obtained during the surg…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 12 pages, Medical Image Computing and Computer Assisted Intervention 2024

  11. arXiv:2403.01692  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV eess.IV

    PI-AstroDeconv: A Physics-Informed Unsupervised Learning Method for Astronomical Image Deconvolution

    Authors: Shulei Ni, Yisheng Qiu, Yunchun Chen, Zihao Song, Hao Chen, Xuejian Jiang, Huaxi Chen

    Abstract: In the imaging process of an astronomical telescope, the deconvolution of its beam or Point Spread Function (PSF) is a crucial task. However, deconvolution presents a classical and challenging inverse computation problem. In scenarios where the beam or PSF is complex or inaccurately measured, such as in interferometric arrays and certain radio telescopes, the resultant blurry images are often chal…

    Submitted 3 March, 2024; originally announced March 2024.

  12. arXiv:2402.10533  [pdf, other]

    cs.SD eess.AS

    APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding

    Authors: Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling

    Abstract: This paper introduces a novel neural audio codec targeting high waveform sampling rates and low bitrates named APCodec, which seamlessly integrates the strengths of parametric codecs and waveform codecs. The APCodec revolutionizes the process of audio encoding and decoding by concurrently handling the amplitude and phase spectra as audio parametric characteristics like parametric codecs. It is com…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  13. arXiv:2402.03710  [pdf, other]

    eess.AS cs.CL cs.SD

    Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

    Authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

    Abstract: In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Edit" (LCE), a novel multimodal sound mixture editor that modifies each sound source in a mixture based on user-provided text instructions. LCE distinguishes itself with a user-friendly chat interface and its unique ability to…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: preprint

  14. arXiv:2402.03476  [pdf, other]

    eess.IV cs.LG physics.med-ph

    CT Material Decomposition using Spectral Diffusion Posterior Sampling

    Authors: Xiao Jiang, Grace J. Gang, J. Webster Stayman

    Abstract: In this work, we introduce a new deep learning approach based on diffusion posterior sampling (DPS) to perform material decomposition from spectral CT measurements. This approach combines sophisticated prior knowledge from unsupervised training with a rigorous physical model of the measurements. A faster and more stable variant is proposed that uses a jumpstarted process to reduce the number of ti…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 5 pages, 4 figures

  15. arXiv:2402.03468  [pdf, other]

    cs.LG eess.SP

    Exact Tensor Completion Powered by Arbitrary Linear Transforms

    Authors: Li Ge, Xue Jiang, Lin Chen

    Abstract: In this work, a tensor completion problem is studied, which aims to perfectly recover the tensor from partial observations. Existing theoretical guarantee requires the involved transform to be orthogonal, which hinders its applications. In this paper, jumping out of the constraints of isotropy or self-adjointness, the theoretical guarantee of exact tensor completion with arbitrary linear transform…

    Submitted 2 February, 2024; originally announced February 2024.

  16. arXiv:2401.12004  [pdf]

    eess.IV cs.LG eess.SP

    NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction

    Authors: Xinrui Jiang, Yohan Jun, Jaejin Cho, Mengze Gao, Xingwang Yong, Berkin Bilgic

    Abstract: Typical quantitative MRI (qMRI) methods estimate parameter maps after image reconstructing, which is prone to biases and error propagation. We propose a Nonlinear Conjugate Gradient (NLCG) optimizer for model-based T2/T1 estimation, which incorporates U-Net regularization trained in a scan-specific manner. This end-to-end method directly estimates qMRI maps from undersampled k-space data using mon…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 8 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 2024

  17. arXiv:2312.15942  [pdf, other]

    cs.CV eess.IV

    Pano-NeRF: Synthesizing High Dynamic Range Novel Views with Geometry from Sparse Low Dynamic Range Panoramic Images

    Authors: Zhan Lu, Qian Zheng, Boxin Shi, Xudong Jiang

    Abstract: Panoramic imaging research on geometry recovery and High Dynamic Range (HDR) reconstruction becomes a trend with the development of Extended Reality (XR). Neural Radiance Fields (NeRF) provide a promising scene representation for both tasks without requiring extensive prior data. However, in the case of inputting sparse Low Dynamic Range (LDR) panoramic images, NeRF often degrades with under-const…

    Submitted 23 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  18. arXiv:2312.06197  [pdf, other]

    cs.SD cs.MM eess.AS

    MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

    Authors: Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang

    Abstract: Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up s…

    Submitted 19 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1

  19. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  20. arXiv:2312.01464  [pdf, other]

    physics.med-ph cs.CV eess.IV physics.comp-ph

    CT Reconstruction using Diffusion Posterior Sampling conditioned on a Nonlinear Measurement Model

    Authors: Shudong Li, Xiao Jiang, Matthew Tivnan, Grace J. Gang, Yuan Shen, J. Webster Stayman

    Abstract: Diffusion models have been demonstrated as powerful deep learning tools for image generation in CT reconstruction and restoration. Recently, diffusion posterior sampling, where a score-based diffusion prior is combined with a likelihood model, has been used to produce high quality CT images given low-quality measurements. This technique is attractive since it permits a one-time, unsupervised train…

    Submitted 11 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: 24 pages, 12 figures, 1 table, submitted to SPIE Journal of Medical Imaging. Updated with more realistic phantom data, Poisson likelihood, and additional evaluations including hallucination evaluation, performance under multiple noise levels, inference time evaluation, and etc. Changes in authorship is based on unanimous agreement to acknowledge the adding authors' contributions in this work

    ACM Class: J.3; I.4.4; I.4.5

  21. arXiv:2311.13616  [pdf, other]

    eess.IV cs.CV

    Online Video Quality Enhancement with Spatial-Temporal Look-up Tables

    Authors: Zefan Qu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Cairong Zhao

    Abstract: Low latency rates are crucial for online video-based applications, such as video conferencing and cloud gaming, which make improving video quality in online scenarios increasingly important. However, existing quality enhancement methods are limited by slow inference speed and the requirement for temporal information contained in future frames, making it challenging to deploy them directly in onlin…

    Submitted 22 November, 2023; originally announced November 2023.

  22. arXiv:2311.09655  [pdf, other]

    cs.SD cs.CV eess.AS

    Multi-View Spectrogram Transformer for Respiratory Sound Classification

    Authors: Wentao He, Yuchen Yan, Jianfeng Ren, Ruibin Bai, Xudong Jiang

    Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MV…

    Submitted 30 May, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: The paper was published at ICASSP 2024

  23. arXiv:2311.00483  [pdf, other]

    eess.IV cs.CV

    DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation

    Authors: Xiaohua Jiang, Yihao Guo, Jian Huang, Yuting Wu, Meiyi Luo, Zhaoyang Xu, Qianni Zhang, Xingru Huang, Hong He, Shaowei Jiang, Jing Ye, Mang Xiao

    Abstract: The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in…

    Submitted 19 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 36 pages, 16 figures, 7 tables

    MSC Class: 68; 92

    ACM Class: I.4; J.3

  24. arXiv:2310.00912  [pdf]

    cs.AR eess.SP

    A Resource-efficient FIR Filter Design Based on an RAG Improved Algorithm

    Authors: Mengwei Hu, Zhengxiong Li, Xianyang Jiang

    Abstract: In modern digital filter chip design, efficient resource utilization is a hot topic. Due to the linear phase characteristics of FIR filters, a pulsed fully parallel structure can be applied to address the problem. To further reduce hardware resource consumption, especially related to multiplication functions, an improved RAG algorithm has been proposed. Filters with different orders and for differ…

    Submitted 23 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 4 pages, 3 figures, Conference paper for ICCS (International Conference on Circuits and Systems) 2023

  25. arXiv:2309.15938  [pdf, other]

    eess.AS cs.LG cs.SD

    Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

    Authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

    Abstract: In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augment…

    Submitted 27 September, 2023; originally announced September 2023.

  26. arXiv:2309.09493  [pdf, other]

    eess.AS cs.AI cs.SD

    HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

    Authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

    Abstract: Recent advancements in speech synthesis have leveraged GAN-based networks like HiFi-GAN and BigVGAN to produce high-fidelity waveforms from mel-spectrograms. However, these networks are computationally expensive and parameter-heavy. iSTFTNet addresses these limitations by integrating inverse short-time Fourier transform (iSTFT) into the network, achieving both speed and parameter efficiency. In th…

    Submitted 18 September, 2023; originally announced September 2023.

  27. arXiv:2308.07221   

    cs.SD cs.LG eess.AS

    AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

    Authors: Zhaohui Li, Haitao Wang, Xinghua Jiang

    Abstract: We propose a method named AudioFormer, which learns audio feature representations through the acquisition of discrete acoustic codes and subsequently fine-tunes them for audio classification tasks. Initially, we introduce a novel perspective by considering the audio classification task as a form of natural language understanding (NLU). Leveraging an existing neural audio codec model, we generate disc…

    Submitted 25 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Need to supplement more detailed experiments

  28. arXiv:2306.00714  [pdf, other]

    cs.CV cs.LG eess.IV

    Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

    Authors: Ruibin Li, Qihua Zhou, Song Guo, Jie Zhang, Jingcai Guo, Xinyang Jiang, Yifei Shen, Zhenhua Han

    Abstract: Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up the opportunity to improve image super-resolution (SR) tasks. Recent solutions for these tasks often train architecture-specific DGMs from scratch, or require iterative fine-tuning and distillation on pre-trained DGMs, both of which take considerable time and hard…

    Submitted 1 June, 2023; originally announced June 2023.

  29. DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

    Authors: Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

    Abstract: Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks, which undermines the model's ability to perform well over the long term. This paper introduces a new approach to continual audio repre…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, pp.2818--2822

  30. arXiv:2304.13583  [pdf, other]

    eess.IV cs.CV

    Multi-Modality Deep Network for Extreme Learned Image Compression

    Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

    Abstract: Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior i…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 13 pages, 14 figures, accepted by AAAI 2023

  31. arXiv:2303.00334  [pdf, other]

    eess.IV cs.CV

    Online Streaming Video Super-Resolution with Convolutional Look-Up Table

    Authors: Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, Xiaohong Liu, Huan Yang, Yuqing Yang, Dongsheng Li, Lili Qiu

    Abstract: Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video quality both spatially and temporally, which leads to…

    Submitted 25 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  32. arXiv:2302.00301  [pdf, other]

    cs.IT eess.SP

    Covert Communication in Hybrid Microwave/mmWave A2G Systems with Transmission Mode Selection

    Authors: Wenhao Zhang, Ji He, Yulong Shen, Xiaohong Jiang

    Abstract: This paper investigates the covert communication in an air-to-ground (A2G) system, where a UAV (Alice) can adopt the omnidirectional microwave (OM) or directional mmWave (DM) transmission mode to transmit covert data to a ground user (Bob) while suffering from the detection of an adversary (Willie). For both the OM and DM modes, we first conduct theoretical analysis to reveal the inherent relation…

    Submitted 1 February, 2023; originally announced February 2023.

  33. arXiv:2301.12688  [pdf, other]

    cs.GR cs.CV cs.HC cs.MM eess.IV

    Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

    Authors: Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai

    Abstract: Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots. We present Virtual Dynamic Storyboard (VDS) to allow users storyboarding shots in virtual environments, where the filming staff can easily test the settings of shots before the actual…

    Submitted 21 July, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Project page: https://virtualfilmstudio.github.io/

  34. arXiv:2301.08810  [pdf, other]

    cs.CL cs.SD eess.AS

    Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

    Authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

    Abstract: Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we pro…

    Submitted 20 January, 2023; originally announced January 2023.

  35. arXiv:2301.05911  [pdf, other]

    cs.LG eess.SY

    Day-Ahead PV Power Forecasting Based on MSTL-TFT

    Authors: Xuetao Jiang, Meiyu Jiang, Qingguo Zhou

    Abstract: In recent years, renewable energy resources have accounted for an increasing share of electricity energy. Among them, photovoltaic (PV) power generation has received broad attention due to its economic and environmental benefits. Accurate PV generation forecasts can reduce power dispatch from the grid, thus increasing the supplier's profit in the day-ahead electricity market. The power system of a PV…

    Submitted 31 January, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

  36. arXiv:2212.08653  [pdf, other]

    cs.CV eess.IV

    Attentive Mask CLIP

    Authors: Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

    Abstract: Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incor…

    Submitted 9 October, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2771-2781

  37. arXiv:2211.11960  [pdf, other]

    cs.SD cs.LG eess.AS

    Disentangled Feature Learning for Real-Time Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs. This is mostly achieved by following the VQ-VAE paradigm where blind features are learned, vector-quantized and coded. In this paper, instead of blind end-to-end learning, we propose to learn disentangled features for real-time neural speech coding. Specificall…

    Submitted 24 February, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023 (Accepted)

  38. arXiv:2211.11557  [pdf]

    eess.IV cs.CV cs.LG

    Decomposing 3D Neuroimaging into 2+1D Processing for Schizophrenia Recognition

    Authors: Mengjiao Hu, Xudong Jiang, Kang Sim, Juan Helen Zhou, Cuntai Guan

    Abstract: Deep learning has been successfully applied to recognizing both natural images and medical images. However, there remains a gap in recognizing 3D neuroimaging data, especially for psychiatric diseases such as schizophrenia and depression that have no visible alteration in specific slices. In this study, we propose to process the 3D data by a 2+1D framework so that we can exploit the powerful deep…

    Submitted 21 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

  39. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  40. arXiv:2209.02196  [pdf, other]

    eess.SP

    Wideband Channel Estimation for mmWave MIMO Systems with Beam Squint

    Authors: Li Ge, Xue Jiang, Lin Chen, Qibo Qin, Xingzhao Liu

    Abstract: With the scale of antenna arrays and the bandwidth increasing, many existing narrowband channel estimation methods ignoring the effect of beam squint may face severe performance degradation in wideband millimeter-wave (mmWave) communication systems. In this letter, a wideband Newtonized orthogonal matching pursuit (wNOMP) algorithm has been proposed to perform channel estimation. The proposed meth…

    Submitted 5 September, 2022; originally announced September 2022.

  41. Latent Variable Models in the Era of Industrial Big Data: Extension and Beyond

    Authors: Xiangyin Kong, Xiaoyu Jiang, Bingxin Zhang, Jinsong Yuan, Zhiqiang Ge

    Abstract: A rich supply of data and innovative algorithms have made data-driven modeling a popular technique in modern industry. Among various data-driven methods, latent variable models (LVMs) and their counterparts account for a major share and play a vital role in many industrial modeling areas. LVM can be generally divided into statistical learning-based classic LVM and neural networks-based deep LVM (D…

    Submitted 5 October, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

  42. arXiv:2208.08868  [pdf]

    eess.SP physics.optics

    Physics-Informed Neural Operator for Fast and Scalable Optical Fiber Channel Modelling in Multi-Span Transmission

    Authors: Yuchen Song, Danshi Wang, Qirui Fan, Xiaotian Jiang, Xiao Luo, Min Zhang

    Abstract: We propose efficient modelling of optical fiber channel via NLSE-constrained physics-informed neural operator without reference solutions. This method can be easily scalable for distance, sequence length, launch power, and signal formats, and is implemented for ultra-fast simulations of 16-QAM signal transmission with ASE noise.

    Submitted 11 July, 2022; originally announced August 2022.

    Comments: accepted by ECOC2022

  43. arXiv:2207.08363  [pdf, other]

    cs.SD cs.LG eess.AS

    Latent-Domain Predictive Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, by which there are still temporal redundancies within encoded features. This paper introduces latent-domai…

    Submitted 25 May, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING (TASLP)

  44. arXiv:2207.06617  [pdf, other]

    eess.IV cs.CV

    Perception-Oriented Stereo Image Super-Resolution

    Authors: Chenxi Ma, Bo Yan, Weimin Tan, Xuhao Jiang

    Abstract: Recent studies of deep learning based stereo image super-resolution (StereoSR) have promoted the development of StereoSR. However, existing StereoSR models mainly concentrate on improving quantitative evaluation metrics and neglect the visual quality of super-resolved stereo images. To improve the perceptual performance, this paper proposes the first perception-oriented stereo image super-resoluti…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 9 pages, 10 figures, ACM MM 2021

  45. arXiv:2207.03067  [pdf, other]

    cs.SD cs.LG eess.AS

    Cross-Scale Vector Quantization for Scalable Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so different models need to be trained for each target bitrate, which increases the memory footprint at the sender and the receiver side and transcoding is often needed to support multiple receivers. In this paper, we introduce a…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022 (Accepted)

  46. arXiv:2207.00993  [pdf, other]

    cs.SD cs.MM eess.AS

    Towards Error-Resilient Neural Speech Coding

    Authors: Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu

    Abstract: Neural audio coding has shown very promising results recently in the literature to largely outperform traditional codecs but limited attention has been paid on its error resilience. Neural codecs trained considering only source coding tend to be extremely sensitive to channel noises, especially in wireless channels with high error rate. In this paper, we investigate how to elevate the error resili…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 5 pages, Interspeech 2022 (Accepted)

  47. Learning the policy for mixed electric platoon control of automated and human-driven vehicles at signalized intersection: a random search approach

    Authors: Xia Jiang, Jian Zhang, Xiaoyu Shi, Jian Cheng

    Abstract: The upgrading and updating of vehicles have accelerated in the past decades. Out of the need for environmental friendliness and intelligence, electric vehicles (EVs) and connected and automated vehicles (CAVs) have become new components of transportation systems. This paper develops a reinforcement learning framework to implement adaptive control for an electric platoon composed of CAVs and human-…

    Submitted 23 June, 2022; originally announced June 2022.

    Journal ref: IEEE Transactions on Intelligent Transportation Systems (2023)

  48. arXiv:2206.00947  [pdf, other]

    cs.CV eess.IV

    A Bhattacharyya Coefficient-Based Framework for Noise Model-Aware Random Walker Image Segmentation

    Authors: Dominik Drees, Florian Eilers, Ang Bian, Xiaoyi Jiang

    Abstract: One well established method of interactive image segmentation is the random walker algorithm. Considerable research on this family of segmentation methods has been continuously conducted in recent years with numerous applications. These methods are common in using a simple Gaussian weight function which depends on a parameter that strongly influences the segmentation performance. In this work we p…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: Dominik Drees and Florian Eilers contributed equally to this work

  49. arXiv:2205.14833  [pdf, other]

    cs.LG cs.DC eess.SY

    Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

    Authors: Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, Bin Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bin Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen

    Abstract: To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a c…

    Submitted 29 May, 2022; originally announced May 2022.

    Comments: Accepted by OSDI 2022

  50. arXiv:2205.07390  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Learning Representations for New Sound Classes With Continual Self-Supervised Learning

    Authors: Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis

    Abstract: In this paper, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically rel…

    Submitted 13 December, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Signal Processing Letters