Showing 1–50 of 325 results for author: Shi, Y

Searching in archive eess.
  1. arXiv:2406.15675

    eess.SY cs.AI cs.SC

    Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery

    Authors: Jie Feng, Haohan Zou, Yuanyuan Shi

    Abstract: We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regressi…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Workshop paper, accepted by Workshop on Foundations of Reinforcement Learning and Control at the 41st International Conference on Machine Learning, Vienna, Austria
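The two conditions an analytical Lyapunov candidate has to meet, V(x) > 0 and dV/dt = ∇V(x)·f(x) < 0 away from the equilibrium, can be sanity-checked numerically. The sketch below is not CoNSAL: the dynamics `f` (a damped linear oscillator) and the quadratic candidate `V` are hypothetical toy choices, and a grid check is only a heuristic, not the formal verification such a framework would need.

```python
def f(x1, x2):
    """Hypothetical toy dynamics: damped oscillator dx1 = x2, dx2 = -x1 - x2."""
    return x2, -x1 - x2


def V(x1, x2):
    """Hypothetical quadratic candidate V = 1.5*x1^2 + x1*x2 + x2^2."""
    return 1.5 * x1**2 + x1 * x2 + x2**2


def grad_V(x1, x2):
    """Analytical gradient (dV/dx1, dV/dx2) of the candidate above."""
    return 3 * x1 + x2, x1 + 2 * x2


def check_lyapunov(n=21, radius=2.0):
    """Count grid points in [-radius, radius]^2, origin excluded, where
    V > 0 or dV/dt = grad V . f < 0 fails."""
    violations = 0
    step = 2 * radius / (n - 1)
    for i in range(n):
        for j in range(n):
            x1, x2 = -radius + i * step, -radius + j * step
            if abs(x1) < 1e-12 and abs(x2) < 1e-12:
                continue  # both conditions are only required away from the origin
            g1, g2 = grad_V(x1, x2)
            f1, f2 = f(x1, x2)
            v_dot = g1 * f1 + g2 * f2
            if V(x1, x2) <= 0 or v_dot >= 0:
                violations += 1
    return violations


print(check_lyapunov())  # 0
```

For this pair dV/dt works out to −(x1² + x2²), so no grid point violates the conditions; a sampled check like this can still miss counterexamples between grid points, which is one reason formal verification tools are typically used.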

  2. arXiv:2406.13209

    eess.IV cs.CV physics.med-ph

    Diffusion Model-based FOD Restoration from High Distortion in dMRI

    Authors: Shuo Huang, Lujia Zhong, Yonggang Shi

    Abstract: Fiber orientation distributions (FODs) are a popular model to represent the diffusion MRI (dMRI) data. However, imaging artifacts such as susceptibility-induced distortion in dMRI can cause signal loss and lead to the corrupted reconstruction of FODs, which prohibits successful fiber tracking and connectivity analysis in affected brain regions such as the brain stem. Generative models, such as the…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures

  3. arXiv:2406.10434

    eess.SY

    Risk-Aware Value-Oriented Net Demand Forecasting for Virtual Power Plants

    Authors: Yufan Zhang, Jiajun Han, Yuanyuan Shi

    Abstract: This paper develops a risk-aware net demand forecasting product for virtual power plants, which helps reduce the risk of high operation costs. At the training phase, a bilevel program for parameter estimation is formulated, where the upper level optimizes over the forecast model parameter to minimize the conditional value-at-risk (a risk metric) of operation costs. The lower level solves the opera…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to The 56th North American Power Symposium (NAPS 2024)
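The conditional value-at-risk (CVaR) named in the abstract averages the worst (1 − α) fraction of outcomes. As a minimal sketch, assuming a finite set of sampled operation costs, it can be estimated with the Rockafellar-Uryasev form below; this is a generic estimator, not the paper's bilevel training program, and the cost values are hypothetical.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Empirical CVaR_alpha of sampled costs (Rockafellar-Uryasev form).

    VaR_alpha is taken as the empirical alpha-quantile; CVaR adds the average
    excess above it, scaled by 1 / (1 - alpha)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(costs)
    n = len(ordered)
    # Index of the empirical VaR; the small epsilon guards against float noise in alpha*n.
    k = max(math.ceil(alpha * n - 1e-9) - 1, 0)
    var = ordered[k]
    excess = sum(max(c - var, 0.0) for c in ordered)
    return var + excess / ((1 - alpha) * n)


costs = [float(c) for c in range(1, 11)]  # hypothetical operation costs
print(empirical_cvar(costs, alpha=0.8))   # about 9.5, the mean of the two largest costs
```

Minimising this quantity, rather than the mean cost, penalises forecasts that occasionally lead to very expensive operation.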

  4. arXiv:2406.09569

    cs.CL cs.AI cs.SD eess.AS

    Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

    Authors: Frank Seide, Morrie Doulaty, Yangyang Shi, Yashesh Gaur, Junteng Jia, Chunyang Wu

    Abstract: We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio without explicit end-pointing. Speech ReaLLM is a special case of the more general ReaLLM ("real-time LLM") approach, also introduced here for the…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.20489

    eess.SY

    Stability-Constrained Learning for Frequency Regulation in Power Grids with Variable Inertia

    Authors: Jie Feng, Manasa Muralidharan, Rodrigo Henriquez-Auba, Patricia Hidalgo-Gonzalez, Yuanyuan Shi

    Abstract: The increasing penetration of converter-based renewable generation has resulted in faster frequency dynamics, and low and variable inertia. As a result, there is a need for frequency control methods that are able to stabilize a disturbance in the power system at timescales comparable to the fast converter dynamics. This paper proposes a combined linear and neural network controller for inverter-ba…

    Submitted 11 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper is to appear in IEEE Control Systems Letters (L-CSS)

  6. arXiv:2405.16258

    cs.LG cs.AI eess.SY

    USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

    Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Zelin Zang

    Abstract: Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both nor…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures, under review

  7. arXiv:2405.13199

    eess.IV cs.CV

    TauAD: MRI-free Tau Anomaly Detection in PET Imaging via Conditioned Diffusion Models

    Authors: Lujia Zhong, Shuo Huang, Jiaxin Yue, Jianwei Zhang, Zhiwei Deng, Wenhao Chi, Yonggang Shi

    Abstract: The emergence of tau PET imaging over the last decade has enabled Alzheimer's disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajectories of AD. Current tau PET analysis methods, however, typically perform inferences on large cortical ROIs and are limited in the detection of localized tau pathology that varies across subjects. Furthermore, a…

    Submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2405.11401

    eess.SY cs.AI cs.CE cs.LG math.OC

    PDE Control Gym: A Benchmark for Data-Driven Boundary Control of Partial Differential Equations

    Authors: Luke Bhan, Yuexin Bian, Miroslav Krstic, Yuanyuan Shi

    Abstract: Over the last decade, data-driven methods have surged in popularity, emerging as valuable tools for control theory. As such, neural network approximations of control feedback laws, system dynamics, and even Lyapunov functions have attracted growing attention. With the ascent of learning-based control, the need for accurate, fast, and easy-to-use benchmarks has increased. In this work, we present t…

    Submitted 23 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 26 pages, 10 figures. Accepted at L4DC 2024

  9. arXiv:2405.09004

    eess.SY cs.LG

    Improving Sequential Market Clearing via Value-oriented Renewable Energy Forecasting

    Authors: Yufan Zhang, Honglin Wen, Yuexin Bian, Yuanyuan Shi

    Abstract: Large penetration of renewable energy sources (RESs) brings huge uncertainty into the electricity markets. While existing deterministic market clearing fails to accommodate the uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RESs generation…

    Submitted 14 May, 2024; originally announced May 2024.

  10. arXiv:2405.04285

    cs.AI eess.SP

    On the Foundations of Earth and Climate Foundation Models

    Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

    Abstract: Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an en…

    Submitted 7 May, 2024; originally announced May 2024.

  11. arXiv:2404.15585

    cs.LG eess.IV

    Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification

    Authors: Liang Qu, Cunze Wang, Yuhui Shi

    Abstract: The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine le…

    Submitted 23 April, 2024; originally announced April 2024.

  12. arXiv:2404.06007

    cs.IT cs.AI cs.LG eess.SP

    Collaborative Edge AI Inference over Cloud-RAN

    Authors: Pengfei Zhang, Dingzhu Wen, Guangxu Zhu, Qimei Chen, Kaifeng Han, Yuanming Shi

    Abstract: In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregatio…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by IEEE Transactions on Communications on 08-Apr-2024

  13. arXiv:2404.01875

    eess.SP cs.DC cs.IT cs.LG

    Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

    Authors: Yuanming Shi, Li Zeng, Jingyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief

    Abstract: The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model paramet…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 15 figures
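Sharing only model parameters, as the abstract describes, typically means a server averages locally trained weights. The sketch below shows generic FedAvg-style aggregation weighted by local dataset size; it is an assumption-level illustration, not the paper's satellite architecture, and the parameter vectors and sample counts are hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  number of local training samples held by each client.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total  # clients with more data count for more
        for i in range(dim):
            global_params[i] += weight * params[i]
    return global_params


# Hypothetical: three satellites holding different amounts of local data.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
print(fedavg(params, sizes))  # [3.5, 4.5]
```

Only the parameter lists cross the link; the raw remote-sensing samples never leave the clients.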

  14. arXiv:2403.16402

    eess.SY

    A Distributionally Robust Model Predictive Control for Static and Dynamic Uncertainties in Smart Grids

    Authors: Qi Li, Ye Shi, Yuning Jiang, Yuanming Shi, Haoyu Wang, H. Vincent Poor

    Abstract: The integration of various power sources, including renewables and electric vehicles, into smart grids is expanding, introducing uncertainties that can result in issues like voltage imbalances, load fluctuations, and power losses. These challenges negatively impact the reliability and stability of online scheduling in smart grids. Existing research often addresses uncertainties affecting current s…

    Submitted 24 March, 2024; originally announced March 2024.

  15. IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

    Authors: Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

    Abstract: Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TGRS. We first present an iterative diffusion process for cloud removal; the code is available at: https://github.com/SongYxing/IDF-CR

  16. arXiv:2403.08996

    eess.SY

    Ventilation and Temperature Control for Energy-efficient and Healthy Buildings: A Differentiable PDE Approach

    Authors: Yuexin Bian, Xiaohan Fu, Rajesh K. Gupta, Yuanyuan Shi

    Abstract: In this paper, we introduce a novel framework for building learning and control, focusing on ventilation and thermal management to enhance energy efficiency. We validate the performance of the proposed framework in system model learning via two case studies: a synthetic study focusing on the joint learning of temperature and CO2 fields, and an application to a real-world dataset for CO2 field lear…

    Submitted 13 March, 2024; originally announced March 2024.

  17. arXiv:2403.04531

    eess.IV

    Anatomy-Guided Surface Diffusion Model for Alzheimer's Disease Normative Modeling

    Authors: Jianwei Zhang, Yonggang Shi

    Abstract: Normative modeling has emerged as a pivotal approach for characterizing heterogeneity and individual variance in neurodegenerative diseases, notably Alzheimer's disease (AD). One of the challenges of cortical normative modeling is the anatomical structure mismatch due to folding pattern variability. Traditionally, registration is applied to address this issue and recently many studies have utilized…

    Submitted 7 March, 2024; originally announced March 2024.

  18. arXiv:2402.18070

    cs.AR eess.SP

    A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

    Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

    Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphics processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures, conference

  19. arXiv:2402.13076

    cs.SD cs.LG eess.AS

    Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition

    Authors: Yang Li, Yuan Shangguan, Yuhao Wang, Liangzhen Lai, Ernie Chang, Changsheng Zhao, Yangyang Shi, Vikas Chandra

    Abstract: Power consumption plays an important role in on-device streaming speech recognition, as it has a direct impact on the user experience. This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are inv…

    Submitted 20 February, 2024; originally announced February 2024.

  20. arXiv:2401.15824

    eess.SY

    Innovation-triggered Learning for Data-driven Predictive Control: Deterministic and Stochastic Formulations

    Authors: Kaikai Zheng, Dawei Shi, Sandra Hirche, Yang Shi

    Abstract: Data-driven control has attracted lots of attention in recent years, especially for plants that are difficult to model based on first principles. In particular, a key issue in data-driven approaches is how to make efficient use of data as the abundance of data becomes overwhelming. To address this issue, this work proposes an innovation-triggered learning framework and a corresponding data-driven…

    Submitted 28 January, 2024; originally announced January 2024.

  21. arXiv:2401.07862

    eess.SY cs.AI cs.LG math.DS math.OC

    Adaptive Neural-Operator Backstepping Control of a Benchmark Hyperbolic PDE

    Authors: Maxence Lamarque, Luke Bhan, Yuanyuan Shi, Miroslav Krstic

    Abstract: To stabilize PDEs, feedback controllers require gain kernel functions, which are themselves governed by PDEs. Furthermore, these gain-kernel PDEs depend on the PDE plants' functional coefficients. The functional coefficients in PDE plants are often unknown. This requires an adaptive approach to PDE control, i.e., an estimation of the plant coefficients conducted concurrently with control, where a…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 16.5 pages, 3 figures

  22. arXiv:2401.04283

    eess.AS cs.SD

    FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

    Authors: Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

    Abstract: Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, pioneering a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, fast score-based diffusion AEC framework to save computational demands, making it favorable for edge devices. It stan…

    Submitted 8 January, 2024; originally announced January 2024.

  23. arXiv:2401.02516

    eess.SY cs.AI math.AP math.DS math.OC

    Moving-Horizon Estimators for Hyperbolic and Parabolic PDEs in 1-D

    Authors: Luke Bhan, Yuanyuan Shi, Iasson Karafyllis, Miroslav Krstic, James B. Rawlings

    Abstract: Observers for PDEs are themselves PDEs. Therefore, producing real time estimates with such observers is computationally burdensome. For both finite-dimensional and ODE systems, moving-horizon estimators (MHE) are operators whose output is the state estimate, while their inputs are the initial state estimate at the beginning of the horizon as well as the measured output and input signals over the m…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 7 pages, 1 figure, submitted to ACC 2024
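The finite-dimensional case the abstract refers to can be illustrated with a scalar discrete-time system. The sketch below assumes hypothetical dynamics x[k+1] = a·x[k] + b·u[k] with measurements y[k] = x[k] + noise; it estimates the state at the start of the window from a prior estimate plus the measured inputs and outputs (a least-squares fit with a quadratic arrival-cost weight), then propagates that estimate forward. It is not the paper's PDE estimator.

```python
def mhe_scalar(a, b, x0_prior, weight, ys, us):
    """Moving-horizon estimate for x[k+1] = a*x[k] + b*u[k], y[k] = x[k] + noise.

    Minimises weight*(x0 - x0_prior)**2 + sum_k (ys[k] - a**k * x0 - s_k)**2
    over the window start x0, where s_k is the effect of the inputs u[0..k-1].
    Returns the estimate of x[N], the state one step after the last measurement.
    """
    if len(ys) != len(us):
        raise ValueError("ys and us must cover the same window")
    num, den = weight * x0_prior, weight
    s, gain = 0.0, 1.0  # s_k and a**k, starting at k = 0
    for y, u in zip(ys, us):
        num += gain * (y - s)
        den += gain * gain
        s = a * s + b * u
        gain *= a
    x0_hat = num / den  # closed-form minimiser of the quadratic cost
    return gain * x0_hat + s  # propagate: x[N] = a**N * x0 + s_N


# Hypothetical noise-free data generated with a = 0.9, b = 1.0, true x[0] = 2.0.
a, b, x = 0.9, 1.0, 2.0
us = [0.1] * 5
ys = []
for u in us:
    ys.append(x)
    x = a * x + b * u  # after the loop, x holds the true x[N]

print(mhe_scalar(a, b, x0_prior=0.0, weight=0.0, ys=ys, us=us), x)
```

With noise-free data and no prior weight the two printed values agree; in a moving-horizon loop the window then slides forward by one sample and the previous estimate becomes the new prior.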

  24. arXiv:2312.17516

    cs.NI eess.SP

    Robust TOA-based Localization with Inaccurate Anchors for MANET

    Authors: Xinkai Yu, Yang Zheng, Min Sheng, Yan Shi, Jiandong Li

    Abstract: Accurate node localization is vital for mobile ad hoc networks (MANETs). Current methods like Time of Arrival (TOA) can estimate node positions using imprecise baseplates and achieve the Cramér-Rao lower bound (CRLB) accuracy. In multi-hop MANETs, some nodes lack direct links to base anchors, depending on neighbor nodes as dynamic anchors for chain localization. However, the dynamic nature of MANE…

    Submitted 29 December, 2023; originally announced December 2023.
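For reference, the classic TOA baseline with perfectly known anchors converts each arrival time to a range (range = propagation speed × TOA) and solves a linearised least-squares problem. The sketch below assumes 2-D positions, exact anchor coordinates, and hypothetical values; handling inaccurate anchors, the paper's actual topic, is not attempted here.

```python
import math


def toa_localize(anchors, ranges):
    """Linear least-squares 2-D position from TOA ranges to known anchors.

    Subtracting the first range equation from the others removes the quadratic
    terms x^2 + y^2, leaving A @ [x, y] = b, solved via the 2x2 normal equations."""
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least three anchors with one range each")
    (x1, y1), r1 = anchors[0], ranges[0]
    k1 = x1 * x1 + y1 * y1 - r1 * r1
    s_xx = s_xy = s_yy = t_x = t_y = 0.0  # entries of A^T A and A^T b
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        ax, ay = 2 * (xi - x1), 2 * (yi - y1)
        bi = (xi * xi + yi * yi - ri * ri) - k1
        s_xx += ax * ax
        s_xy += ax * ay
        s_yy += ay * ay
        t_x += ax * bi
        t_y += ay * bi
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-12:
        raise ValueError("anchors are (nearly) collinear")
    return (s_yy * t_x - s_xy * t_y) / det, (s_xx * t_y - s_xy * t_x) / det


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical anchor positions (m)
true_pos = (3.0, 4.0)
ranges = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in anchors]
print(toa_localize(anchors, ranges))  # approximately (3.0, 4.0)
```

Errors in the anchor coordinates enter both A and b directly, which is why position uncertainty in dynamic anchors degrades this baseline.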

  25. arXiv:2312.09621

    eess.SY

    Inter-domain Resource Collaboration in Satellite Networks: An Intelligent Scheduling Approach Towards Hybrid Missions

    Authors: Chenxi Bao, Di Zhou, Min Sheng, Yan Shi, Jiandong Li

    Abstract: Since the next-generation satellite network consisting of various service function domains, such as communication, observation, navigation, etc., is moving towards large-scale, using single-domain resources is difficult to provide satisfied and timely service guarantees for the rapidly increasing mission demands of each domain. Breaking the barriers of independence of resources in each domain, and…

    Submitted 15 December, 2023; originally announced December 2023.

  26. arXiv:2312.05930

    eess.IV cs.CV cs.LG

    A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

    Authors: Linxi Zhao, Jiankai Tang, Dongyu Chen, Xiaohong Liu, Yong Zhou, Yuanchun Shi, Guangyu Wang, Yuntao Wang

    Abstract: Nailfold capillaroscopy is widely used in assessing health conditions, highlighting the pressing need for an automated nailfold capillary analysis system. In this study, we present a pioneering effort in constructing a comprehensive nailfold capillary dataset (321 images, 219 videos from 68 subjects, with clinic reports and expert annotations) that serves as a crucial resource for training deep-lear…

    Submitted 14 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Dataset, code, pretrained models: https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary

  27. Implementing Digital Twin in Field-Deployed Optical Networks: Uncertain Factors, Operational Guidance, and Field-Trial Demonstration

    Authors: Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Bingli Guo, Shanguo Huang, Danshi Wang

    Abstract: Digital twin has revolutionized optical communication networks by enabling their full life-cycle management, including design, troubleshooting, optimization, upgrade, and prediction. While extensive literature exists on frameworks, standards, and applications of digital twin, there is a pressing need for implementing digital twin in field-deployed optical networks operating in real-world environmen…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures. Accepted by IEEE Network Magazine, early access

  28. arXiv:2311.09590

    eess.IV cs.CV

    MARformer: An Efficient Metal Artifact Reduction Transformer for Dental CBCT Images

    Authors: Yuxuan Shi, Jun Xu, Dinggang Shen

    Abstract: Cone Beam Computed Tomography (CBCT) plays a key role in dental diagnosis and surgery. However, the metal teeth implants could bring annoying metal artifacts during the CBCT imaging process, interfering with diagnosis and downstream processing such as tooth segmentation. In this paper, we develop an efficient Transformer to perform metal artifacts reduction (MAR) from dental CBCT images. The proposed M…

    Submitted 18 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: under consideration at the Computer Vision and Image Understanding journal

  29. arXiv:2311.05935

    eess.SY

    Resilient and constrained consensus against adversarial attacks: A distributed MPC framework

    Authors: Henglai Wei, Kunwu Zhang, Hui Zhang, Yang Shi

    Abstract: There has been a growing interest in realizing the resilient consensus of the multi-agent system (MAS) under cyber-attacks, which aims to achieve the consensus of normal agents (i.e., agents without attacks) in a network, depending on the neighboring information. The literature has developed mean-subsequence-reduced (MSR) algorithms for the MAS with F adversarial attacks and has shown that the con…

    Submitted 10 November, 2023; originally announced November 2023.
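The MSR family mentioned in the abstract is commonly presented as the W-MSR update: each normal agent discards up to F neighbour values above its own and up to F below, then averages what remains. The sketch below uses uniform weights and hypothetical values; it shows the baseline the abstract cites, not the paper's distributed MPC framework.

```python
def msr_update(own, neighbor_values, F):
    """One W-MSR-style update with uniform weights.

    Discards up to F neighbour values strictly larger than `own` (the largest
    ones) and up to F strictly smaller than `own` (the smallest ones), then
    returns the average of `own` and the remaining neighbour values."""
    larger = sorted(v for v in neighbor_values if v > own)
    smaller = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    kept = larger[:max(len(larger) - F, 0)]  # drop the F largest values above own
    kept += smaller[F:]                      # drop the F smallest values below own
    kept += equal
    values = [own] + kept
    return sum(values) / len(values)


# Hypothetical: one adversarial neighbour reports 100.0; with F = 1 it is discarded.
print(msr_update(1.0, [0.9, 1.1, 1.2, 100.0], F=1))  # about 1.1
```

Note that 0.9 is also dropped: the rule trims F values on each side regardless of whether they are actually malicious, which is the price paid for robustness.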

  30. 3DGAUnet: 3D generative adversarial networks with a 3D U-Net based generator to achieve the accurate and effective synthesis of clinical tumor image data for pancreatic cancer

    Authors: Yu Shi, Hannah Tang, Michael Baine, Michael A. Hollingsworth, Huijing Du, Dandan Zheng, Chi Zhang, Hongfeng Yu

    Abstract: Pancreatic ductal adenocarcinoma (PDAC) presents a critical global health challenge, and early detection is crucial for improving the 5-year survival rate. Recent medical imaging and computational algorithm advances offer potential solutions for early diagnosis. Deep learning, particularly in the form of convolutional neural networks (CNNs), has demonstrated success in medical image analysis tasks…

    Submitted 27 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Published in Cancers: Shi, Yu, Hannah Tang, Michael J. Baine, Michael A. Hollingsworth, Huijing Du, Dandan Zheng, Chi Zhang, and Hongfeng Yu. 2023. "3DGAUnet: 3D Generative Adversarial Networks with a 3D U-Net Based Generator to Achieve the Accurate and Effective Synthesis of Clinical Tumor Image Data for Pancreatic Cancer" Cancers 15, no. 23: 5496

  31. arXiv:2311.03712

    eess.SY

    Contributions of Individual Generators to Nodal Carbon Emissions

    Authors: Yize Chen, Deepjyoti Deka, Yuanyuan Shi

    Abstract: Recent shifts toward sustainable energy systems have witnessed the fast deployment of carbon-free and carbon-efficient generations across the power networks. However, the benefits of carbon reduction are not experienced evenly throughout the grid. Each generator can have distinct carbon emission rates. Due to the existence of physical power flows, nodal power consumption is met by a combination of…

    Submitted 7 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ACM e-Energy 2024. Code available at https://github.com/chennnnnyize/Carbon_Emission_Power_Grids
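One common way to attribute carbon to nodes is proportional sharing, which assumes power mixes uniformly at each bus so that a node's carbon intensity is the flow-weighted average of what enters it. The sketch below applies that assumption to a hypothetical two-bus system with a known, acyclic, lossless flow pattern; whether the paper uses exactly this rule is not stated in the abstract.

```python
def nodal_carbon_intensity(order, generation, flows):
    """Nodal carbon intensity (tCO2/MWh) under proportional sharing.

    order:      node names listed so that every flow goes from an earlier
                node to a later one (lossless, acyclic flow pattern assumed).
    generation: {node: (MW generated, tCO2/MWh of that generator)}.
    flows:      {(from_node, to_node): MW}, all non-negative.
    """
    intensity = {}
    for node in order:
        mw, rate = generation.get(node, (0.0, 0.0))
        total_mw, total_co2 = mw, mw * rate
        for (src, dst), p in flows.items():
            if dst == node:  # inflow carries the upstream node's intensity
                total_mw += p
                total_co2 += p * intensity[src]
        intensity[node] = total_co2 / total_mw if total_mw > 0 else 0.0
    return intensity


# Hypothetical 2-bus example: a 100 MW coal unit (0.9 tCO2/MWh) at bus 1 serves
# 40 MW locally and exports 60 MW to bus 2, where a 40 MW wind farm (0) also runs.
result = nodal_carbon_intensity(
    order=["bus1", "bus2"],
    generation={"bus1": (100.0, 0.9), "bus2": (40.0, 0.0)},
    flows={("bus1", "bus2"): 60.0},
)
print(result)  # bus1: 0.9, bus2: (60*0.9 + 40*0) / 100 = 0.54, up to rounding
```

Even though both buses draw on the same coal unit, the load at bus 2 ends up with a lower intensity because local wind generation dilutes the imported power.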

  32. arXiv:2311.03062

    physics.optics cs.LG eess.SP

    Imaging through multimode fibres with physical prior

    Authors: Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen

    Abstract: Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mappi…

    Submitted 13 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  33. arXiv:2311.00897

    cs.SD cs.CL eess.AS

    On The Open Prompt Challenge In Conditional Audio Generation

    Authors: Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra

    Abstract: Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a "black box" and address the user prompt challenge with two ke…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 4 tables

  34. arXiv:2311.00895

    cs.SD cs.CL eess.AS

    In-Context Prompt Editing For Conditional Audio Generation

    Authors: Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional au…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 2 tables

  35. arXiv:2310.17864

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  36. arXiv:2310.11998

    eess.SP

    One-Bit Byzantine-Tolerant Distributed Learning via Over-the-Air Computation

    Authors: Yuhan Yang, Youlong Wu, Yuning Jiang, Yuanming Shi

    Abstract: Distributed learning has become a promising computational parallelism paradigm that enables a wide scope of intelligent applications from the Internet of Things (IoT) to autonomous driving and the healthcare industry. This paper studies distributed learning in wireless data center networks, which contain a central edge server and multiple edge workers to collaboratively train a shared global model…

    Submitted 18 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
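A well-known one-bit aggregation rule with some Byzantine robustness is signSGD with majority vote: workers send only gradient signs and the server broadcasts the sign of the vote sum. The sketch below shows that rule in its plain digital form as background only; the abstract does not say whether or how the paper's over-the-air scheme modifies it, and all values are hypothetical.

```python
def sign(v):
    """Sign of v as -1, 0 or +1 (a zero would need a convention in a true 1-bit link)."""
    return 1 if v > 0 else (-1 if v < 0 else 0)


def majority_vote_step(params, worker_grads, lr):
    """One signSGD-with-majority-vote step.

    Each worker contributes only sign(gradient) per coordinate; the server
    takes the sign of the summed votes and every worker moves by lr against it.
    """
    dim = len(params)
    votes = [0] * dim
    for g in worker_grads:
        for i in range(dim):
            votes[i] += sign(g[i])
    return [p - lr * sign(v) for p, v in zip(params, votes)]


# Hypothetical: 4 honest workers agree on the gradient direction; 1 Byzantine
# worker sends huge flipped gradients, but only its sign counts, so it is outvoted.
honest = [[0.5, -0.2]] * 4
byzantine = [[-1e6, 1e6]]
print(majority_vote_step([1.0, 1.0], honest + byzantine, lr=0.1))  # [0.9, 1.1]
```

Since each worker's influence is capped at one vote per coordinate, the single worker with enormous gradients cannot flip the result while honest workers hold the majority.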

  37. arXiv:2310.10992

    cs.SD eess.AS

    A High Fidelity and Low Complexity Neural Audio Coding

    Authors: Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

    Abstract: Audio coding is an essential module in the real-time communication system. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor high-frequency expression and high computational cost and storage consumption, we proposed an integrated framework that utilizes a neural network to model wide…

    Submitted 17 October, 2023; originally announced October 2023.

  38. arXiv:2310.10089

    cs.LG cs.IT eess.SP

    Over-the-Air Federated Learning and Optimization

    Authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief

    Abstract: Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to mode…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 31 pages, 11 figures
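AirComp exploits the fact that simultaneous analog transmissions add up over the wireless channel, so the server receives a sum of updates rather than K separate messages. The sketch below assumes scalar updates, real-valued channel gains, and perfect channel-inversion pre-scaling; the Gaussian receiver noise stands in for the aggregation error the abstract mentions. All numbers are hypothetical.

```python
import random


def aircomp_average(local_updates, channel_gains, noise_std, rng):
    """Simulate over-the-air averaging of scalar model updates.

    Each device pre-scales its update by 1/h (channel-inversion power control,
    an assumption here); the channel multiplies each transmission by h and the
    signals superimpose, so the server sees sum(updates) + noise. Dividing by
    the number of devices gives a noisy average."""
    k = len(local_updates)
    received = sum(h * (x / h) for x, h in zip(local_updates, channel_gains))
    received += rng.gauss(0.0, noise_std)  # receiver noise -> aggregation error
    return received / k


rng = random.Random(0)
updates = [0.2, 0.4, 0.6]  # hypothetical scalar updates
gains = [0.5, 1.0, 2.0]    # hypothetical real-valued channel gains
print(aircomp_average(updates, gains, noise_std=0.0, rng=rng))  # noiseless: about 0.4
print(aircomp_average(updates, gains, noise_std=0.1, rng=rng))  # perturbed by noise
```

The noise is divided by K after aggregation, so in this simplified model the error in the average shrinks as more devices transmit at once; real systems also face power limits that make perfect channel inversion impossible for weak channels.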

  39. arXiv:2310.07255

    cs.CV eess.IV

    ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion

    Authors: Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi

    Abstract: Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propos…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Geoscience and Remote Sensing Letters. Code is released at https://github.com/fangfang11-plog/ADASR

  40. arXiv:2310.06949

    eess.IV cs.LG physics.med-ph

    Diffusion Prior Regularized Iterative Reconstruction for Low-dose CT

    Authors: Wenjun Xia, Yongyi Shi, Chuang Niu, Wenxiang Cong, Ge Wang

    Abstract: Computed tomography (CT) involves a patient's exposure to ionizing radiation. To reduce the radiation dose, we can either lower the X-ray photon count or down-sample projection views. However, either of the ways often compromises image quality. To address this challenge, here we introduce an iterative reconstruction algorithm regularized by a diffusion prior. Drawing on the exceptional imaging pro…

    Submitted 10 October, 2023; originally announced October 2023.

  41. arXiv:2310.05352

    cs.CL cs.SD eess.AS

    A Glance is Enough: Extract Target Sentence By Looking at A keyword

    Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han

    Abstract: This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input. For example, in social security applications, the keyword might be "help", and the goal is to identify what the person who called for help is articulating while ignoring other speakers. To address this problem, we propose using the Transformer architecture to embed both t…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: submitted to ICASSP 2024

  42. arXiv:2310.04679  [pdf, other]

    eess.IV cs.CV

    High Visual-Fidelity Learned Video Compression

    Authors: Meng Li, Yibo Shi, **g Wang, Yunqi Huang

    Abstract: With the growing demand for video applications, many advanced learned video compression methods have been developed, outperforming traditional methods in terms of objective quality metrics such as PSNR. Existing methods primarily focus on objective quality but tend to overlook perceptual quality. Directly incorporating perceptual loss into a learned video compression framework is nontrivial and ra…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: ACMMM 2023

  43. arXiv:2310.03118  [pdf]

    eess.IV cs.CV

    Blind CT Image Quality Assessment Using DDPM-derived Content and Transformer-based Evaluator

    Authors: Yongyi Shi, Wenjun Xia, Ge Wang, Xuanqin Mou

    Abstract: Lowering radiation dose per view and utilizing sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intrigu…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 10 pages, 6 figures

  44. arXiv:2310.00571  [pdf, other]

    eess.SY

    Deriving Loss Function for Value-oriented Renewable Energy Forecasting

    Authors: Yufan Zhang, Honglin Wen, Yuexin Bian, Yuanyuan Shi

    Abstract: Renewable energy forecasting is the workhorse for efficient energy dispatch. However, forecasts with small mean squared errors (MSE) may not necessarily lead to low operation costs. Here, we propose a forecasting approach specifically tailored for operational purposes, by incorporating operational problems into the estimation of forecast models via designing a loss function. We formulate a bilevel…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: submitted to PSCC 2024

  45. arXiv:2309.16468  [pdf, other]

    eess.SP

    HyperLISTA-ABT: An Ultra-light Unfolded Network for Accurate Multi-component Differential Tomographic SAR Inversion

    Authors: Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

    Abstract: Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications, such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to three-dimensional (3D) reconstruction. The extension of deep learning-based algorithms to four-dimensional (4…

    Submitted 28 September, 2023; originally announced September 2023.

  46. arXiv:2309.14094  [pdf, other]

    cs.SD eess.AS

    VoiceLens: Controllable Speaker Generation and Editing with Flow

    Authors: Yao Shi, Ming Li

    Abstract: Currently, many multi-speaker speech synthesis and voice conversion systems address speaker variations with an embedding vector. Modeling it directly allows new voices outside of training data to be synthesized. GMM based approaches such as Tacospawn are favored in literature for this generation task, but there are still some limitations when difficult conditionings are involved. In this paper, we…

    Submitted 25 September, 2023; originally announced September 2023.

  47. arXiv:2309.14089  [pdf, other]

    eess.AS cs.LG cs.SD

    BiSinger: Bilingual Singing Voice Synthesis

    Authors: Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

    Abstract: Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To a…

    Submitted 9 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  48. arXiv:2309.11109  [pdf, other]

    cs.CV eess.IV

    Self-supervised Domain-agnostic Domain Adaptation for Satellite Images

    Authors: Fahong Zhang, Yilei Shi, Xiao Xiang Zhu

    Abstract: Domain shift caused by, e.g., different geographical regions or acquisition conditions is a common issue in machine learning for global scale satellite image processing. A promising method to address this problem is domain adaptation, where the training and the testing datasets are split into two or multiple domains according to their distributions, and an adaptation method is applied to improve t…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  49. arXiv:2309.10795  [pdf, other]

    eess.AS

    Exploring Speech Enhancement for Low-resource Speech Synthesis

    Authors: Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

    Abstract: High-quality and intelligible speech is essential to text-to-speech (TTS) model training, however, obtaining high-quality data for low-resource languages is challenging and expensive. Applying speech enhancement on Automatic Speech Recognition (ASR) corpus mitigates the issue by augmenting the training data, while how the nonlinear speech distortion brought by speech enhancement models affects TTS…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  50. arXiv:2309.10537  [pdf, other]

    eess.AS cs.MM cs.SD

    FoleyGen: Visually-Guided Audio Generation

    Authors: Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

    Abstract: Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intricate relationship between the high-dimensional visual and auditory data, and the challenges associated with temporal synchronization. In this study, we…

    Submitted 19 September, 2023; originally announced September 2023.