
Showing 1–50 of 113 results for author: Tian, Y

Searching in archive eess.
  1. arXiv:2406.04930  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

    Authors: Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu

    Abstract: Recent advances in pre-trained vision transformers have shown promise in parameter-efficient audio-visual learning without audio pre-training. However, few studies have investigated effective methods for aligning multimodal features in parameter-efficient audio-visual transformers. In this paper, we propose MA-AVT, a new parameter-efficient audio-visual transformer employing deep modality alignmen…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted in Efficient Deep Learning for Computer Vision CVPR Workshop 2024

  2. arXiv:2406.02554  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel…

    Submitted 22 March, 2024; originally announced June 2024.

  3. arXiv:2406.01331  [pdf, other]

    cs.IT eess.SP

    Performance Trade-off of Integrated Sensing and Communications for Multi-User Backscatter Systems

    Authors: Yuanming Tian, Dan Wang, Chuan Huang, Wei Zhang

    Abstract: This paper studies the performance trade-off in a multi-user backscatter communication (BackCom) system for integrated sensing and communications (ISAC), where the multi-antenna ISAC transmitter sends excitation signals to power multiple single-antenna passive backscatter devices (BD), and the multi-antenna ISAC receiver performs joint sensing (localization) and communication tasks based on the ba…

    Submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2405.09291  [pdf, other]

    cs.CV cs.AI eess.IV

    Sensitivity Decouple Learning for Image Compression Artifacts Reduction

    Authors: Li Ma, Yifan Zhao, Peixi Peng, Yonghong Tian

    Abstract: With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing ta…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by Transactions on Image Processing

  5. arXiv:2405.03567  [pdf, other]

    cs.SD cs.AI eess.AS

    Deep Space Separable Distillation for Lightweight Acoustic Scene Classification

    Authors: ShuQi Ye, Yuan Tian

    Abstract: Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently not lightweight enough, and their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2404.01751  [pdf, other]

    cs.CV cs.SD eess.AS

    T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

    Authors: Tanvir Mahmud, Yapeng Tian, Diana Marculescu

    Abstract: Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video. Existing self-supervised and weakly supervised source localization methods struggle to accurately distinguish the semantic regions of each sounding object, particularly in multi-source mixtures. These methods often rely on audio-visual correspondence as guidance…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Tech report. Accepted in CVPR-2024

  7. arXiv:2403.19002  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Robust Active Speaker Detection in Noisy Environments

    Authors: Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

    Abstract: This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation a…

    Submitted 30 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 15 pages, 5 figures

  8. arXiv:2403.16699  [pdf, other]

    cs.IT eess.SP

    Resonant Beam Communications: A New Design Paradigm and Challenges

    Authors: Yuanming Tian, Dongxu Li, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interference generated by the modulated beam reflected from the receiver affects the transmission of the desired information. To tackle this challenge, a synchro…

    Submitted 25 March, 2024; originally announced March 2024.

  9. arXiv:2403.16694  [pdf, other]

    cs.IT eess.SP

    Design and Performance of Resonant Beam Communications -- Part II: Mobile Scenario

    Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based information transmission scheme and derives the capacity upper and lower bounds for the quasi-static channel case. In Part II, we address the mobile scenario,…

    Submitted 25 March, 2024; originally announced March 2024.

  10. arXiv:2403.16676  [pdf, other]

    cs.IT eess.SP

    Design and Performance of Resonant Beam Communications -- Part I: Quasi-Static Scenario

    Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission rate of the considered system under both the quasi-static and mobile scenarios. Part I of this paper focuses on the quasi-static scenario where the loc…

    Submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.07938  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG cs.MM eess.AS

    Text-to-Audio Generation Synchronized with Videos

    Authors: Shentong Mo, Jing Shi, Yapeng Tian

    Abstract: In recent times, the focus on text-to-audio (TTA) generation has intensified, as researchers strive to synthesize audio from textual descriptions. However, most existing methods, though leveraging latent diffusion models to learn the correlation between audio and text embeddings, fall short when it comes to maintaining a seamless synchronization between the produced audio and its video. This often…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.12903

  12. arXiv:2402.17187  [pdf, other]

    eess.IV cs.CV

    PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction

    Authors: Zhaoxin Guo, Zhipeng Wang, Ruiquan Ge, Jianxun Yu, Feiwei Qin, Yuan Tian, Yuqing Peng, Yonghong Li, Changmiao Wang

    Abstract: The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrat…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  13. arXiv:2402.16631  [pdf, other]

    cs.AI cs.NI eess.SP

    GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning

    Authors: Hang Zou, Qiyang Zhao, Lina Bariah, Yu Tian, Mehdi Bennis, Samson Lasaulce, Merouane Debbah, Faouzi Bader

    Abstract: Generative artificial intelligence (GenAI) and communication networks are expected to have groundbreaking synergies in 6G. Connecting GenAI agents over a wireless network can potentially unleash the power of collective intelligence and pave the way for artificial general intelligence (AGI). However, current wireless networks are designed as a "data pipe" and are not suited to accommodate and lever…

    Submitted 28 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  14. arXiv:2402.09461  [pdf, other]

    eess.SP cs.LG

    A Novel Approach to WaveNet Architecture for RF Signal Separation with Learnable Dilation and Data Augmentation

    Authors: Yu Tian, Ahmed Alhammadi, Abdullah Quran, Abubakar Sani Ali

    Abstract: In this paper, we address the intricate issue of RF signal separation by presenting a novel adaptation of the WaveNet architecture that introduces learnable dilation parameters, significantly enhancing signal separation in dense RF spectrums. Our focused architectural refinements and innovative data augmentation strategies have markedly improved the model's ability to discern complex signal source…

    Submitted 8 February, 2024; originally announced February 2024.

  15. arXiv:2401.06149  [pdf, other]

    cs.CV cs.LG eess.IV

    Image Classifier Based Generative Method for Planar Antenna Design

    Authors: Yang Zhong, Weiping Dou, Andrew Cohen, Dia'a Bisharat, Yuandong Tian, Jiang Zhu, Qing Huo Liu

    Abstract: To extend the antenna design on printed circuit boards (PCBs) for more engineers of interest, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics related to the quality of dimensions are used in se…

    Submitted 16 December, 2023; originally announced January 2024.

    Comments: 13 pages, 18 figures

  16. arXiv:2401.03816  [pdf, other]

    eess.AS cs.SD

    Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss

    Authors: Yusheng Tian, Jingyu Li, Tan Lee

    Abstract: This research is about the creation of personalized synthetic voices for head and neck cancer survivors. It is focused particularly on tongue cancer patients whose speech might exhibit severe articulation impairment. Our goal is to restore normal articulation in the synthesized speech, while maximally preserving the target speaker's individuality in terms of both the voice timbre and speaking styl…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  17. arXiv:2401.02101  [pdf, ps, other]

    eess.SP

    ICI-Free Channel Estimation and Wireless Gesture Recognition Based on Cellular Signals

    Authors: Rui Peng, Yafei Tian, Shengqian Han

    Abstract: Device-free wireless sensing attracts enormous attention since it senses the environment without additional devices. While cellular signals are good opportunistic radio sources, the influence of inter-cell interference (ICI) on wireless sensing has not been adequately addressed. In this letter, we first investigate the cause of ICI and its impact on wireless sensing. Then we propose an ICI-free c…

    Submitted 4 January, 2024; originally announced January 2024.

  18. arXiv:2312.09420  [pdf, other]

    eess.SP cs.AI cs.IT

    Fairness-Driven Optimization of RIS-Augmented 5G Networks for Seamless 3D UAV Connectivity Using DRL Algorithms

    Authors: Yu Tian, Ahmed Alhammadi, Jiguang He, Aymen Fakhreddine, Faouzi Bader

    Abstract: In this paper, we study the problem of joint active and passive beamforming for reconfigurable intelligent surface (RIS)-assisted massive multiple-input multiple-output systems towards the extension of the wireless cellular coverage in 3D, where multiple RISs, each equipped with an array of passive elements, are deployed to assist a base station (BS) to simultaneously serve multiple unmanned aeria…

    Submitted 14 November, 2023; originally announced December 2023.

  19. arXiv:2312.01573  [pdf]

    eess.IV cs.CV

    Survey on deep learning in multimodal medical imaging for cancer detection

    Authors: Yan Tian, Zhaocheng Xu, Yujun Ma, Weiping Ding, Ruili Wang, Zhihong Gao, Guohua Cheng, Linyang He, Xuran Zhao

    Abstract: The task of multimodal cancer detection is to determine the locations and categories of lesions by using different imaging techniques, which is one of the key research methods for cancer diagnosis. Recently, deep learning-based object detection has made significant developments due to its strength in semantic feature extraction and nonlinear function fitting. However, multimodal cancer detection r…

    Submitted 3 December, 2023; originally announced December 2023.

    Journal ref: Neural Computing and Applications. 2023 Nov 29:1-6

  20. arXiv:2311.16856  [pdf, other]

    cs.LG eess.SP stat.ML

    Attentional Graph Neural Networks for Robust Massive Network Localization

    Authors: Wenzhong Yan, Juntao Wang, Feng Yin, Yang Tian, Abdelhak M. Zoubir

    Abstract: In recent years, graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning. However, their application in regression tasks remains underexplored. To tap the potential of GNNs in regression, this paper integrates GNNs with the attention mechanism, a technique that revolutionized sequential learning tasks with its adaptability and robustness, to tackle a…

    Submitted 14 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  21. arXiv:2310.20446  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    LAVSS: Location-Guided Audio-Visual Spatial Audio Separation

    Authors: Yuxin Ye, Wenming Yang, Yapeng Tian

    Abstract: Existing machine learning research has achieved promising results in monaural audio-visual separation (MAVS). However, most MAVS methods purely consider what the sound source is, not where it is located. This can be a problem in VR/AR scenarios, where listeners need to be able to distinguish between similar audio sources located in different directions. To address this limitation, we have generali…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted by WACV2024

  22. arXiv:2310.11713  [pdf, other]

    cs.CV cs.SD eess.AS

    Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

    Authors: Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

    Abstract: The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view. Current methods struggle with such sounds lacking visible cues. This paper introduces a novel "Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a semantic parser for visible and invisible sounds and a separator for scene-informed separation.…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023 - AV4D, 4 figures, 3 tables

  23. arXiv:2310.09221  [pdf, other]

    eess.IV cs.CV

    Ultrasound Image Segmentation of Thyroid Nodule via Latent Semantic Feature Co-Registration

    Authors: Xuewei Li, Yaqiao Zhu, Jie Gao, Xi Wei, Ruixuan Zhang, Yuan Tian, ZhiQiang Liu

    Abstract: Segmentation of nodules in thyroid ultrasound imaging plays a crucial role in the detection and treatment of thyroid cancer. However, owing to the diversity of scanner vendors and imaging protocols in different hospitals, the automatic segmentation model, which has already demonstrated expert-level accuracy in the field of medical image segmentation, finds its accuracy reduced as the result of its…

    Submitted 21 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  24. arXiv:2309.15977  [pdf, other]

    cs.SD cs.CV eess.AS

    Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment. Some prior work has proposed representing RIR as a neural field function of the sound emitter and receiver positions. However, these methods do not sufficiently consider the acoustic properties of an audio scene, leading to unsatisfactor…

    Submitted 27 September, 2023; originally announced September 2023.

  25. arXiv:2309.15512  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

    Authors: Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

    Abstract: Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations (semantic & acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from inform…

    Submitted 18 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2307.15484; text overlap with arXiv:2309.00424

  26. arXiv:2309.11811  [pdf, other]

    eess.SP cs.AI

    Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction

    Authors: Yu Tian, Qiyang Zhao, Zine el abidine Kherroubi, Fouzi Boukhalfa, Kebin Wu, Faouzi Bader

    Abstract: Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodality sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract the features from…

    Submitted 21 September, 2023; originally announced September 2023.

  27. arXiv:2309.00424  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Learning Speech Representation From Contrastive Token-Acoustic Pretraining

    Authors: Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

    Abstract: For fine-grained generation and recognition tasks such as minimally-supervised text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), the intermediate representations extracted from speech should serve as a "bridge" between text and acoustic information, containing information from both modalities. The semantic content is emphasized, while the paralinguistic informati…

    Submitted 18 December, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  28. arXiv:2308.00122  [pdf, other]

    cs.CV cs.SD eess.AS

    DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

    Authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse…

    Submitted 31 July, 2023; originally announced August 2023.

  29. arXiv:2307.02334  [pdf, ps, other]

    eess.IV cs.CV

    Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI

    Authors: Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian

    Abstract: Limited by imaging systems, the reconstruction of Magnetic Resonance Imaging (MRI) images from partial measurement is essential to medical imaging research. Benefiting from the diverse and complementary information of multi-contrast MR images in different imaging modalities, multi-contrast Super-Resolution (SR) reconstruction is promising to yield SR images with higher quality. In the medical scen…

    Submitted 10 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted by MICCAI2023

  30. arXiv:2306.17008  [pdf]

    eess.IV cs.CV

    MLA-BIN: Model-level Attention and Batch-instance Style Normalization for Domain Generalization of Federated Learning on Medical Image Segmentation

    Authors: Fubao Zhu, Yanhui Tian, Chuang Han, Yanting Li, Jiaofen Nan, Ni Yao, Weihua Zhou

    Abstract: The privacy protection mechanism of federated learning (FL) offers an effective solution for cross-center medical collaboration and data sharing. In multi-site medical image segmentation, each medical site serves as a client of FL, and its data naturally forms a domain. FL makes it possible to improve model performance on seen domains. However, there is a problem of domain generalizatio…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 8 figures, 2 tables

  31. arXiv:2306.05627  [pdf]

    math.OC eess.SP

    A Macro-Micro Approach to Reconstructing Vehicle Trajectories on Multi-Lane Freeways with Lane Changing

    Authors: Xuejian Chen, Guoyang Qin, Toru Seo, Ye Tian, Jian Sun

    Abstract: Vehicle trajectories can offer the most precise and detailed depiction of traffic flow and serve as a critical component in traffic management and control applications. Various technologies have been applied to reconstruct vehicle trajectories from sparse fixed and mobile detection data. However, existing methods predominantly concentrate on single-lane scenarios and neglect lane-changing (LC) beh…

    Submitted 8 June, 2023; originally announced June 2023.

  32. arXiv:2305.19228  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Unsupervised Melody-to-Lyric Generation

    Authors: Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng

    Abstract: Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationshi…

    Submitted 22 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023. arXiv admin note: substantial text overlap with arXiv:2305.07760

  33. Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

    Authors: Yusheng Tian, Guangyan Zhang, Tan Lee

    Abstract: This paper is about developing personalized speech synthesis systems with recordings of mildly impaired speech. In particular, we consider consonant and vowel alterations resulting from partial glossectomy, the surgical removal of part of the tongue. The aim is to restore articulation in the synthesized speech and maximally preserve the target speaker's individuality. We propose to tackle the probl…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: submitted to INTERSPEECH 2023

    Journal ref: INTERSPEECH 2023

  34. arXiv:2305.17183  [pdf]

    q-bio.QM cs.AI eess.IV

    ProGroTrack: Deep Learning-Assisted Tracking of Intracellular Protein Growth Dynamics

    Authors: Kai San Chan, Huimiao Chen, Chenyu Jin, Yuxuan Tian, Dingchang Lin

    Abstract: Accurate tracking of cellular and subcellular structures, along with their dynamics, plays a pivotal role in understanding the underlying mechanisms of biological systems. This paper presents a novel approach, ProGroTrack, that combines the You Only Look Once (YOLO) and ByteTrack algorithms within the detection-based tracking (DBT) framework to track intracellular protein nanostructures. Focusing…

    Submitted 26 May, 2023; originally announced May 2023.

  35. arXiv:2305.11440  [pdf]

    eess.SY

    Coordinated Frequency-Constrained Stochastic Economic Dispatch for Integrated Transmission and Distribution System via Distributed Optimization

    Authors: Ye Tian, Zhengshuo Li

    Abstract: When large-scale uncertain centralized and distributed renewable energy sources are connected to a power system, separate dispatching of the transmission power system (TPS) and the active distribution network (ADN) will lower the network security and frequency security of the system. To address these problems, this paper proposes a coordinated frequency-constrained stochastic economic dispatch (CF…

    Submitted 19 May, 2023; originally announced May 2023.

  36. arXiv:2305.10891  [pdf, other]

    eess.AS

    Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data

    Authors: Yusheng Tian, Wei Liu, Tan Lee

    Abstract: Creating synthetic voices with found data is challenging, as real-world recordings often contain various types of audio degradation. One way to address this problem is to pre-enhance the speech with an enhancement model and then use the enhanced data for text-to-speech (TTS) model training. This paper investigates the use of conditional diffusion models for generalized speech enhancement, which ai…

    Submitted 29 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to ASRU 2023

  37. arXiv:2305.04661  [pdf, ps, other]

    eess.SP

    Unleashing 3D Connectivity in Beyond 5G Networks with Reconfigurable Intelligent Surfaces

    Authors: Jiguang He, Aymen Fakhreddine, Arthur S. de Sena, Yu Tian, Merouane Debbah

    Abstract: Reconfigurable intelligent surfaces (RISs) bring various benefits to the current and upcoming wireless networks, including enhanced spectrum and energy efficiency, soft handover, transmission reliability, and even localization accuracy. These remarkable improvements result from the reconfigurability, programmability, and adaptation capabilities of RISs for fine-tuning radio propagation environment…

    Submitted 2 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: 5 pages, 4 figures, invited paper to Asilomar Conference on Signals, Systems, and Computers 2023 (accepted)

  38. arXiv:2305.01836  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

    Authors: Shentong Mo, Yapeng Tian

    Abstract: Segment Anything Model (SAM) has recently shown its powerful effectiveness in visual segmentation tasks. However, there is less exploration concerning how SAM works on audio-visual tasks, such as visual sound localization and segmentation. In this work, we propose a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model, namely AV-SAM, that ca…

    Submitted 2 May, 2023; originally announced May 2023.

  39. Picking Up Quantization Steps for Compressed Image Classification

    Authors: Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian

    Abstract: The sensitivity of deep neural networks to compressed images hinders their usage in many real applications, which means classification networks may fail just after taking a screenshot and saving it as a compressed file. In this paper, we argue that neglected disposable coding parameters stored in compressed files could be picked up to reduce the sensitivity of deep neural networks to compressed im…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: in IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 4, pp. 1884-1898, April 2023

  40. arXiv:2303.13471  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Egocentric Audio-Visual Object Localization

    Authors: Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person view. Likewise, machines are advanced to approach human intelligence by learning with multisensory inputs from an egocentric perspective. In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even w…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  41. arXiv:2303.03939  [pdf]

    eess.SY

    Joint Chance-Constrained Economic Dispatch Involving Joint Optimization of Frequency-related Inverter Control and Regulation Reserve Allocation

    Authors: Ye Tian, Zhengshuo Li, Wenchuan Wu, Miao Fan

    Abstract: The issues of uncertainty and frequency security could become significantly serious in power systems with the high penetration of volatile inverter-based renewables (IBRs). These issues make it necessary to consider the uncertainty and frequency-related constraints in the economic dispatch (ED) programs. However, existing ED studies rarely proactively optimize the control parameters of inverter-ba…

    Submitted 7 March, 2023; originally announced March 2023.

  42. arXiv:2302.02088  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions? We answer it by studying a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning. Concretely, given a video recording of an audio-visual scene, the task is to synthesize new videos with s…

    Submitted 16 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  43. arXiv:2301.12340  [pdf]

    eess.IV cs.CV

    Incremental Value and Interpretability of Radiomics Features of Both Lung and Epicardial Adipose Tissue for Detecting the Severity of COVID-19 Infection

    Authors: Ni Yao, Yanhui Tian, Daniel Gama das Neves, Chen Zhao, Claudio Tinoco Mesquita, Wolney de Andrade Martins, Alair Augusto Sarmet Moreira Damas dos Santos, Yanting Li, Chuang Han, Fubao Zhu, Neng Dai, Weihua Zhou

    Abstract: Epicardial adipose tissue (EAT) is known for its pro-inflammatory properties and association with Coronavirus Disease 2019 (COVID-19) severity. However, current EAT segmentation methods do not consider positional information. Additionally, the detection of COVID-19 severity lacks consideration for EAT radiomics features, which limits interpretability. This study investigates the use of radiomics f…

    Submitted 6 December, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: 20 pages, 7 figures

  44. arXiv:2212.14511  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

    Authors: Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

    Abstract: We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particul…

    Submitted 13 March, 2024; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 37 pages; Updated structure and proofs

  45. arXiv:2211.16928  [pdf, other]

    eess.IV cs.CV

    Knowledge Distillation based Degradation Estimation for Blind Super-Resolution

    Authors: Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

    Abstract: Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most of the existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels of multiple degradation combinations (e.g., blur, noise, jpeg compression) to…

    Submitted 16 February, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: ICLR 2023; code is available at https://github.com/Zj-BinXia/KDSR

  46. arXiv:2210.17310  [pdf, other]

    eess.AS cs.SD

    Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

    Authors: **gyu Li, Yusheng Tian, Tan Lee

    Abstract: Deep convolutional neural networks (CNNs) have been applied to extracting speaker embeddings with significant success in speaker verification. Incorporating the attention mechanism has shown to be effective in improving the model performance. This paper presents an efficient two-dimensional convolution-based attention module, namely C2D-Att. The interaction between the convolution channel and freq…

    Submitted 31 October, 2022; originally announced October 2022.

  47. arXiv:2210.06836  [pdf, other]

    eess.SP

    SNN-SC: A Spiking Semantic Communication Framework for Feature Transmission

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan, Yonghong Tian

    Abstract: In Collaborative Intelligence (CI), Artificial Intelligence (AI) models are split between edge devices and cloud. Features extracted from input on edge devices are transmitted to the cloud for subsequent tasks. Extracting task-related and compact information is critical when transmission bandwidth is limited. In this paper, we propose a task-oriented Semantic Communication (SC) framework (SNN-SC)…

    Submitted 17 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

  48. arXiv:2210.00405  [pdf, other]

    cs.CV eess.IV

    Basic Binary Convolution Unit for Binarized Image Restoration Network

    Authors: Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

    Abstract: Lighter and faster image restoration (IR) models are crucial for the deployment on resource-limited devices. Binary neural network (BNN), one of the most promising model compression methods, can dramatically reduce the computations and parameters of full-precision convolutional neural networks (CNN). However, there are different properties between BNN and full-precision CNN, and we can hardly use…

    Submitted 16 February, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: ICLR 2023; code is available at https://github.com/Zj-BinXia/BBCU

  49. arXiv:2209.13645  [pdf, other]

    eess.SP cs.LG

    PearNet: A Pearson Correlation-based Graph Attention Network for Sleep Stage Recognition

    Authors: Jianchao Lu, Yuzhe Tian, Shuang Wang, Michael Sheng, Xi Zheng

    Abstract: Sleep stage recognition is crucial for assessing sleep and diagnosing chronic diseases. Deep learning models, such as Convolutional Neural Networks and Recurrent Neural Networks, are trained using grid data as input, making them not capable of learning relationships in non-Euclidean spaces. Graph-based deep models have been developed to address this issue when investigating the external relationsh…

    Submitted 16 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

  50. arXiv:2208.03648  [pdf, other]

    cs.CV cs.AI eess.IV

    Weakly Supervised Online Action Detection for Infant General Movements

    Authors: Tongyi Luo, Jia Xiao, Chuncao Zhang, Siheng Chen, Yuan Tian, Guangjun Yu, Kang Dang, Xiaowei Ding

    Abstract: To make the earlier medical intervention of infants' cerebral palsy (CP), early diagnosis of brain damage is critical. Although general movements assessment (GMA) has shown promising results in early CP detection, it is laborious. Most existing works take videos as input to make fidgety movements (FMs) classification for the GMA automation. Those methods require a complete observation of videos and…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: MICCAI 2022

    MSC Class: 68T06; ACM Class: I.2; I.4; J.3