Showing 1–50 of 240 results for author: Zhang, B

Searching in archive eess.
  1. arXiv:2406.18547  [pdf]

    eess.IV cs.CV

    Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

    Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

    Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator networ…

    Submitted 22 May, 2024; originally announced June 2024.

  2. arXiv:2406.18327  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

    Authors: Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

    Abstract: Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by network…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  5. arXiv:2406.00974  [pdf, other]

    eess.SY

    Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach

    Authors: Borui Zhang, Chaojie Li, Guo Chen, Zhaoyang Dong

    Abstract: To incentivize flexible resources such as Battery Energy Storage Systems (BESSs) to offer Frequency Control Ancillary Services (FCAS), Australia's National Electricity Market (NEM) has implemented changes in recent years towards shorter-term bidding rules and faster service requirements. However, firstly, existing bidding optimization methods often overlook or oversimplify the key aspects of FCAS…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.18782  [pdf, other]

    eess.IV cs.CV stat.ML

    Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

    Authors: Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman

    Abstract: Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior de…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.10977  [pdf, other]

    eess.SY physics.app-ph

    Frequency stabilization of self-sustained oscillations in a sideband-driven electromechanical resonator

    Authors: B. Zhang, Yingming Yan, X. Dong, M. I. Dykman, H. B. Chan

    Abstract: We present a method to stabilize the frequency of self-sustained vibrations in micro- and nanomechanical resonators. The method refers to a two-mode system with the vibrations at significantly different frequencies. The signal from one mode is used to control the other mode. In the experiment, self-sustained oscillations of micromechanical modes are excited by pumping at the blue-detuned sideband…

    Submitted 13 May, 2024; originally announced May 2024.

  8. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  9. arXiv:2404.16407  [pdf, other]

    cs.CL eess.AS

    U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

    Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the…

    Submitted 25 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  10. arXiv:2404.08805  [pdf, other]

    eess.IV cs.CV cs.LG

    Real-time guidewire tracking and segmentation in intraoperative x-ray

    Authors: Baochang Zhang, Mai Bui, Cheng Wang, Felix Bourier, Heribert Schunkert, Nassir Navab

    Abstract: During endovascular interventions, physicians have to perform accurate and immediate operations based on the available real-time information, such as the shape and position of guidewires observed on the fluoroscopic images, haptic information and the patients' physiological signals. For this purpose, real-time and accurate guidewire segmentation and tracking can enhance the visualization of guidew…

    Submitted 12 April, 2024; originally announced April 2024.

  11. arXiv:2404.07551  [pdf, other]

    eess.IV cs.CV

    Event-Enhanced Snapshot Compressive Videography at 10K FPS

    Authors: Bo Zhang, Jinli Suo, Qionghai Dai

    Abstract: Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and onl…

    Submitted 11 April, 2024; originally announced April 2024.

  12. arXiv:2404.07188  [pdf, other]

    cs.DC cs.CV eess.IV

    GCV-Turbo: End-to-end Acceleration of GNN-based Computer Vision Tasks on FPGA

    Authors: Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna

    Abstract: Graph neural networks (GNNs) have recently empowered various novel computer vision (CV) tasks. In GNN-based CV tasks, a combination of CNN layers and GNN layers or only GNN layers are employed. This paper introduces GCV-Turbo, a domain-specific accelerator on FPGA for end-to-end acceleration of GNN-based CV tasks. GCV-Turbo consists of two key components: (1) a novel hardware architecture o…

    Submitted 10 April, 2024; originally announced April 2024.

  13. arXiv:2403.17598  [pdf]

    eess.SY

    Ultrafast Adaptive Primary Frequency Tuning and Secondary Frequency Identification for S/S WPT system

    Authors: Chang Liu, Wei Han, Guangyu Yan, Bowang Zhang, Chunlin Li

    Abstract: Magnetic resonance wireless power transfer (WPT) technology is increasingly being adopted across diverse applications. However, its effectiveness can be significantly compromised by parameter shifts within the resonance network, owing to its high system quality factor. Such shifts are inherent and challenging to mitigate during the manufacturing process. In response, this article introduces a rapi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 11 pages, 16 figures, to be published in IEEE Transactions on Industrial Electronics

  14. arXiv:2403.15616  [pdf, other]

    cs.GT cs.MA eess.SY

    Balancing Fairness and Efficiency in Energy Resource Allocations

    Authors: Jiayi Li, Matthew Motoki, Baosen Zhang

    Abstract: Bringing fairness to energy resource allocation remains a challenge, due to the complexity of system structures and economic interdependencies among users and system operators' decision-making. The rise of distributed energy resources has introduced more diverse heterogeneous user groups, surpassing the capabilities of traditional efficiency-oriented allocation schemes. Without explicitly bringing…

    Submitted 22 March, 2024; originally announced March 2024.

  15. arXiv:2403.14508  [pdf, other]

    cs.LG cs.AI eess.SY

    Constrained Reinforcement Learning with Smoothed Log Barrier Function

    Authors: Baohe Zhang, Yuan Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker

    Abstract: Reinforcement Learning (RL) has been widely applied to many control tasks and substantially improved the performances compared to conventional control methods in many domains where the reward function is well defined. However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously. Optimizing such constrained…

    Submitted 21 March, 2024; originally announced March 2024.

  16. arXiv:2403.05989  [pdf, other]

    cs.SD eess.AS

    HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling

    Authors: Chunhui Wang, Chang Zeng, Bowen Zhang, Ziyang Ma, Yefan Zhu, Zifeng Cai, Jian Zhao, Zhonglin Jiang, Yong Chen

    Abstract: Token-based text-to-speech (TTS) models have emerged as a promising avenue for generating natural and realistic speech, yet they grapple with low pronunciation accuracy, speaking style and timbre inconsistency, and a substantial need for diverse training data. In response, we introduce a novel hierarchical acoustic modeling approach complemented by a tailored data augmentation strategy and train i…

    Submitted 9 March, 2024; originally announced March 2024.

  17. arXiv:2402.16765  [pdf, other]

    eess.SY

    Oscillations-Aware Frequency Security Assessment via Efficient Worst-Case Frequency Nadir Computation

    Authors: Yan Jiang, Hancheng Min, Baosen Zhang

    Abstract: Frequency security assessment following major disturbances has long been one of the central tasks in power system operations. The standard approach is to study the center of inertia frequency, an aggregate signal for an entire system, to avoid analyzing the frequency signal at individual buses. However, as the amount of low-inertia renewable resources in a grid increases, the center of inertia fre…

    Submitted 26 February, 2024; originally announced February 2024.

  18. arXiv:2402.15335  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection

    Authors: Chenyu Li, Bing Zhang, Danfeng Hong, Jing Yao, Jocelyn Chanussot

    Abstract: Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data. These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness on the separation of background and target features and the reliance on manual parameter selection. To this end, we build a new set…

    Submitted 23 February, 2024; originally announced February 2024.

  19. arXiv:2402.01194  [pdf, other]

    eess.SP

    A Robust Super-resolution Gridless Imaging Framework for UAV-borne SAR Tomography

    Authors: Silin Gao, Wenlong Wang, Muhan Wang, Zhe Zhang, Zai Yang, Xiaolan Qiu, Bingchen Zhang, Yirong Wu

    Abstract: Synthetic aperture radar (SAR) tomography (TomoSAR) retrieves three-dimensional (3-D) information from multiple SAR images, effectively addresses the layover problem, and has become pivotal in urban mapping. Unmanned aerial vehicle (UAV) has gained popularity as a TomoSAR platform, offering distinct advantages such as the ability to achieve 3-D imaging in a single flight, cost-effectiveness, rapid…

    Submitted 2 February, 2024; originally announced February 2024.

  20. arXiv:2401.10242  [pdf, other]

    cs.OH cs.GR cs.HC cs.SD eess.AS

    DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis

    Authors: Xin Gao, Li Hu, Peng Zhang, Bang Zhang, Liefeng Bo

    Abstract: In the realm of 3D digital human applications, music-to-dance presents a challenging task. Given the one-to-many relationship between music and dance, previous methods have been limited in their approach, relying solely on matching and generating corresponding dance movements based on music rhythm. In the professional field of choreography, a dance phrase consists of several dance poses and dance…

    Submitted 30 November, 2023; originally announced January 2024.

    Comments: 10 pages, 8 figures

  21. arXiv:2401.08049  [pdf, other]

    cs.CV cs.SD eess.AS

    EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

    Authors: Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang

    Abstract: In recent years, the field of talking faces generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions. However, existing methods face challenges related to limited generalization, particularly when dealing with challenging identities. Furthermore, methods for editing expressions are often confined to a singul…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  22. arXiv:2401.03473  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

    Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours…

    Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  23. arXiv:2401.02687  [pdf, other]

    cs.CV cs.LG eess.IV

    PAHD: Perception-Action based Human Decision Making using Explainable Graph Neural Networks on SAR Images

    Authors: Sasindu Wijeratne, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart

    Abstract: Synthetic Aperture Radar (SAR) images are commonly utilized in military applications for automatic target recognition (ATR). Machine learning (ML) methods, such as Convolutional Neural Networks (CNN) and Graph Neural Networks (GNN), are frequently used to identify ground-based objects, including battle tanks, personnel carriers, and missile launchers. Determining the vehicle class, such as the BRD…

    Submitted 5 January, 2024; originally announced January 2024.

  24. arXiv:2312.15863  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

    Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiang** Yin

    Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at t…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

  25. arXiv:2312.13752  [pdf]

    eess.IV cs.AI cs.CV

    Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

    Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Wei** Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, **yu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis , et al. (16 additional authors not shown)

    Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intric…

    Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 19 pages

  26. arXiv:2312.11930  [pdf, other]

    cs.RO eess.SY

    Collision-Free Navigation of Wheeled Mobile Robots: An Integrated Path Planning and Tube-Following Control Approach

    Authors: Xiaodong Shao, Bin Zhang, Jose Guadalupe Romero, Bowen Fan, Qinglei Hu, David Navarro-Alarcon

    Abstract: In this paper, an integrated path planning and tube-following control scheme is proposed for collision-free navigation of a wheeled mobile robot (WMR) in a compact convex workspace cluttered with sufficiently separated spherical obstacles. An analytical path planning algorithm is developed based on Bouligand's tangent cones and Nagumo's invariance theorem, which enables the WMR to navigate towards…

    Submitted 19 December, 2023; originally announced December 2023.

  27. arXiv:2311.13824  [pdf, other]

    cs.RO eess.SY

    Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems

    Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

    Abstract: As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant rel…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: The first three authors contributed equally to the work. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  28. arXiv:2311.11510  [pdf, other]

    eess.SY

    Controlling Grid-Connected Inverters under Time-Varying Voltage Constraints

    Authors: Zixiao Ma, Baosen Zhang

    Abstract: Inverter-based resources (IBRs) are becoming increasingly prevalent in power systems. Due to the inherently low inertia of inverters, there is a heightened risk of disruptive voltage oscillations. A particular challenge in the operation of grid connected IBRs is the variations in the grid side voltage. The changes in the grid side voltage introduce nonlinear and time-varying constraints on the in…

    Submitted 19 November, 2023; originally announced November 2023.

  29. arXiv:2311.08336  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI

    Authors: Nick Bryan-Kinns, Bingyuan Zhang, Songyan Zhao, Berker Banar

    Abstract: Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of eXplainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models. This pape…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Preprint. Springer MIR journal submission under review

  30. arXiv:2311.00263  [pdf, other]

    eess.SY

    The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks

    Authors: Shuai Feng, Maopeng Ran, Baoyong Zhang, Lihua Xie, Shengyuan Xu

    Abstract: In this paper, we investigate tracking control of heterogeneous multi-agent systems under Denial-of-Service (DoS) attacks and state quantization. Dynamic quantized mechanisms are designed for inter-follower communication and leader-follower communication. Zooming-in and out factors, and data rates of both mechanisms for preventing quantizer saturation are provided. Our results show that by tuning…

    Submitted 31 October, 2023; originally announced November 2023.

  31. arXiv:2310.13254  [pdf, other]

    cs.GT eess.SY

    Socially Optimal Energy Usage via Adaptive Pricing

    Authors: Jiayi Li, Matthew Motoki, Baosen Zhang

    Abstract: A central challenge in using price signals to coordinate the electricity consumption of a group of users is the operator's lack of knowledge of the users due to privacy concerns. In this paper, we develop a two-time-scale incentive mechanism that alternately updates between the users and a system operator. As long as the users can optimize their own consumption subject to a given price, the operat…

    Submitted 29 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted to Power Systems Computation Conference (PSCC-2024)

  32. arXiv:2310.07608  [pdf, other]

    cs.MA cs.RO eess.SY

    Leader-Follower Formation Control of Perturbed Nonholonomic Agents along Parametric Curves with Directed Communication

    Authors: Bin Zhang, Hui Zhi, Jose Guadalupe Romero, David Navarro-Alarcon

    Abstract: In this paper, we propose a novel formation controller for nonholonomic agents to form general parametric curves. First, we derive a unified parametric representation for both open and closed curves. Then, a leader-follower formation controller is designed to form the parametric curves. We consider directed communications and constant input disturbances rejection in the controller design. Rigorous…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 6 pages, 6 figures

  33. arXiv:2310.06246  [pdf, other]

    eess.IV

    Compression Ratio Learning and Semantic Communications for Video Imaging

    Authors: Bowen Zhang, Zhijin Qin, Geoffrey Ye Li

    Abstract: Camera sensors have been widely used in intelligent robotic systems. Developing camera sensors with high sensing efficiency has always been important to reduce the power, memory, and other related resources. Inspired by recent success on programmable sensors and deep optic methods, we design a novel video compressed sensing system with spatially-variant compression ratios, which achieves higher im…

    Submitted 9 October, 2023; originally announced October 2023.

  34. arXiv:2310.04657  [pdf, other]

    eess.AS cs.SD

    Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

    Authors: Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

    Abstract: The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control t…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  35. arXiv:2310.02802  [pdf, other]

    eess.AS

    VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

    Authors: Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

    Abstract: This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior encoder, a posterior encoder, a decoder, and a parallel bank of transposed convolutions (PBTC) module. We particularly leverage Whisper, a powerful pre-trained ASR…

    Submitted 4 October, 2023; originally announced October 2023.

  36. arXiv:2310.00549  [pdf, other]

    math.OC eess.SY

    Convex Restriction of Feasible Sets for AC Radial Networks

    Authors: Ling Zhang, Daniel Tabas, Baosen Zhang

    Abstract: Many problems in power systems involve optimizing a certain objective function subject to power flow equations and engineering constraints. A long-standing challenge in solving them is the nonconvexity of their feasible sets. In this paper, we propose an analytical method to construct the convex restriction of the feasible set for AC power flows in radial networks. The construction relies on simpl…

    Submitted 30 September, 2023; originally announced October 2023.

  37. arXiv:2310.00473  [pdf, ps, other]

    eess.SY

    Optimal Control of Grid-Interfacing Inverters With Current Magnitude Limits

    Authors: Trager Joswig-Jones, Baosen Zhang

    Abstract: Grid-interfacing inverters act as the interface between renewable resources and the electric grid, and have the potential to offer fast and programmable controls compared to synchronous generators. With this flexibility there have been significant research efforts into determining the best way to control these inverters. Inverters are limited in their maximum current output in order to protect semi…

    Submitted 26 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: 6 pages, 6 figures, 1 table. Submitted to CDC'2024

  38. arXiv:2309.16499  [pdf, other]

    cs.CV eess.IV

    Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

    Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

    Abstract: Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-ed…

    Submitted 3 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  39. arXiv:2309.09969  [pdf, other]

    cs.RO cs.LG eess.SY

    Prompt a Robot to Walk with Large Language Models

    Authors: Yen-Jen Wang, Bike Zhang, Jianyu Chen, Koushil Sreenath

    Abstract: Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. Recently, there has been escalating interest in deploying LLMs for robotics, aiming to harness the power of foundation models in real-world settings. However, this approach faces significant challenges, particularly in grounding these models in the physical world and…

    Submitted 16 November, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

  40. arXiv:2309.03111  [pdf, other]

    cs.RO eess.SY math.OC

    Serving Time: Real-Time, Safe Motion Planning and Control for Manipulation of Unsecured Objects

    Authors: Zachary Brei, Jonathan Michaux, Bohao Zhang, Patrick Holmes, Ram Vasudevan

    Abstract: A key challenge to ensuring the rapid transition of robotic systems from the industrial sector to more ubiquitous applications is the development of algorithms that can guarantee safe operation while in close proximity to humans. Motion planning and control methods, for instance, must be able to certify safety while operating in real-time in arbitrary environments and in the presence of model unce…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 8 pages, 3 figures. For project page with code, videos, and supplementary appendices, see https://roahmlab.github.io/waitr-dev/. arXiv admin note: text overlap with arXiv:2301.13308

  41. LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

    Authors: Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

    Abstract: Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in cloud to provide services for customs. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient compared with other generative models. As transmitting data between customs and the cloud introduces h…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by ICASSP 2023

  42. arXiv:2308.14213  [pdf]

    eess.IV cs.CV

    Post-Hoc Explainability of BI-RADS Descriptors in a Multi-task Framework for Breast Cancer Detection and Segmentation

    Authors: Mohammad Karimzadeh, Aleksandar Vakanski, Min Xian, Boyu Zhang

    Abstract: Despite recent medical advancements, breast cancer remains one of the most prevalent and deadly diseases among women. Although machine learning-based Computer-Aided Diagnosis (CAD) systems have shown potential to assist radiologists in analyzing medical images, the opaque nature of the best-performing CAD systems has raised concerns about their trustworthiness and interpretability. This paper prop…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures. Published at 2023 IEEE Workshop on MLSP

  43. arXiv:2308.10312  [pdf, other]

    eess.SY cs.DC cs.LG cs.NI

    Demystifying the Performance of Data Transfers in High-Performance Research Networks

    Authors: Ehsan Saeedizade, Bing Zhang, Engin Arslan

    Abstract: High-speed research networks are built to meet the ever-increasing needs of data-intensive distributed workflows. However, data transfers in these networks often fail to attain the promised transfer rates for several reasons, including I/O and network interference, server misconfigurations, and network anomalies. Although understanding the root causes of performance issues is critical to mitigatin…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 11 pages, 7 figures, 6 tables

  44. arXiv:2308.07231  [pdf, other]

    eess.SY

    Large-scale environment mapping and immersive human-robot interaction for agricultural mobile robot teleoperation

    Authors: Tao Liu, Baohua Zhang, Qianqiu Tan

    Abstract: Remote operation is a crucial solution to problems encountered in agricultural machinery operations. However, traditional video streaming control methods fall short in overcoming the challenges of single perspective views and the inability to obtain 3D information. In light of these issues, our research proposes a large-scale digital map reconstruction and immersive human-machine remote control fr…

    Submitted 1 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  45. arXiv:2307.12262  [pdf, other]

    cs.SD cs.CL cs.HC eess.AS

    A meta learning scheme for fast accent domain expansion in Mandarin speech recognition

    Authors: Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu

    Abstract: Spoken languages show significant variation across Mandarin and accents. Despite the high performance of Mandarin automatic speech recognition (ASR), accent ASR is still a challenging task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expands the field of accents without deteriorating the performance of Mandarin ASR. Meta-…

    Submitted 23 July, 2023; originally announced July 2023.

  46. arXiv:2307.03246  [pdf, other]

    eess.IV

    Semantic-Aware Image Compressed Sensing

    Authors: Bowen Zhang, Zhijin Qin, Geoffrey Ye Li

    Abstract: Deep learning based image compressed sensing (CS) has achieved great success. However, existing CS systems mainly apply a fixed measurement matrix to images, ignoring the fact that the optimal measurement numbers and bases are different for different images. To further improve the sensing efficiency, we propose a novel semantic-aware image CS system. In our system, the encoder first uses a fixed number…

    Submitted 10 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Modified version

  47. arXiv:2305.17777  [pdf, other]

    eess.SY

    Structured Neural-PI Control with End-to-End Stability and Output Tracking Guarantees

    Authors: Wenqi Cui, Yan Jiang, Baosen Zhang, Yuanyuan Shi

    Abstract: We study the optimal control of multiple-input and multiple-output dynamical systems via the design of neural network-based controllers with stability and output tracking guarantees. While neural network-based nonlinear controllers have shown superior performance in various applications, their lack of provable guarantees has restricted their adoption in high-stakes real-world applications. This pap…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.00261

  48. arXiv:2305.12044  [pdf, other]

    eess.SY

    Leveraging Predictions in Power System Frequency Control: an Adaptive Approach

    Authors: Wenqi Cui, Guanya Shi, Yuanyuan Shi, Baosen Zhang

    Abstract: Ensuring the frequency stability of electric grids with increasing renewable resources is a key problem in power system operations. In recent years, a number of advanced controllers have been designed to optimize frequency control. These controllers, however, almost always assume that the net load in the system remains constant over a sufficiently long time. Given the intermittent and uncertain na…

    Submitted 19 May, 2023; originally announced May 2023.

  49. ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

    Abstract: In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss. The core idea of ZeroPrompt is to append zeroed content to each chunk during inference, which acts like a prompt to encourage the…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

    ACM Class: I.2.7

    Journal ref: Proc. INTERSPEECH 2023, pp. 1648-1652

  50. arXiv:2305.09002  [pdf, other]

    eess.SY

    Equilibria of Fully Decentralized Learning in Networked Systems

    Authors: Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés

    Abstract: Existing settings of decentralized learning either require players to have full information or the system to have certain special structure that may be hard to check and hinder their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We fir…

    Submitted 15 May, 2023; originally announced May 2023.