Showing 1–50 of 486 results for author: Zhang, C

Searching in archive eess.
  1. arXiv:2406.18018  [pdf, other]

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistance of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.17066  [pdf, other]

    eess.SY cs.AI cs.LO cs.RO

    Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems

    Authors: Changjian Zhang, Parv Kapoor, Eunsuk Kang, Romulo Meira-Goes, David Garlan, Akila Ganlath, Shatadal Mishra, Nejib Ammar

    Abstract: Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly being deployed in complex physical environments such as autonomous vehicles, the Internet-of-Things (IoT), and smart cities. An important property of a CPS is tolerance; i.e., its ability to function safely under possible disturbances and uncertainties in the actual operation. In this paper, we introduc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.07462

  3. arXiv:2406.16326  [pdf, other]

    eess.AS

    RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

    Authors: Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

    Abstract: This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker identity, which does not account for the changing timbre of speech that occurs with different pronunciations. To address this, our method uses both global and loc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Manuscript under review by TASLP

  4. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  5. arXiv:2406.15787  [pdf, other]

    eess.SY

    On Physics-Informed Neural Network Control for Power Electronics

    Authors: Peifeng Hui, Chenggang Cui, Pengfeng Lin, Amer M. Y. M. Ghias, Xitong Niu, Chuanlin Zhang

    Abstract: Considering the growing necessity for precise modeling of power electronics amidst operational and environmental uncertainties, this paper introduces an innovative methodology that ingeniously combines model-driven and data-driven approaches to enhance the stability of power electronics interacting with grid-forming microgrids. By employing the physics-informed neural network (PINN) as a foundatio…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2406.14795  [pdf, other]

    cs.RO eess.SY

    Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device

    Authors: Fulan Li, Yunfei Guo, Wenda Xu, Weide Zhang, Fangyun Zhao, Baiyu Wang, Huaguang Du, Chengkun Zhang

    Abstract: This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 15 figures

  7. arXiv:2406.12699  [pdf, other]

    cs.SD eess.AS eess.SP

    Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition

    Authors: Kuan-Chen Wang, You-** Li, Wei-Lun Chen, Yu-Wen Chen, Yi-Ching Wang, **-Cheng Yeh, Chao Zhang, Yu Tsao

    Abstract: Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the use of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study intro…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12628  [pdf, other]

    eess.SY

    Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics

    Authors: Chenggang Cui, Jiaming Liu, Junkang Feng, Peifeng Hui, Amer M. Y. M. Ghias, Chuanlin Zhang

    Abstract: Power electronics, a critical component in modern power systems, face several challenges in control design, including model uncertainties and lengthy, costly design cycles. This paper proposes a Large Language Models (LLMs) based multi-agent framework for objective-oriented control design in power electronics. The framework leverages the reasoning capabilities of LLMs and a multi-a…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 6 pages, 6 figures

  9. arXiv:2406.10869  [pdf, other]

    eess.IV cs.CV

    Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

    Authors: Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guo** Qiu

    Abstract: As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike 2D plain images that are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs, therefore, requires performing equirectangular projection (ERP) to map the ODIs onto a plane. ODI sup…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, 12 figures, journal

  10. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, ** Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  11. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Kai Yu, Aidi Lin, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Wei Chen, Yilong Luo, Yifan Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Yanda Meng, Yukun Zhou, et al. (11 additional authors not shown)

    Abstract: The current retinal artificial intelligence models were trained using data with a limited category of diseases and limited knowledge. In this paper, we present a retinal vision-language foundation model (RetiZero) with knowledge of over 400 fundus diseases. Specifically, we collected 341,896 fundus images paired with text descriptions from 29 publicly available datasets, 180 ophthalmic books, and…

    Submitted 13 June, 2024; originally announced June 2024.

  12. arXiv:2406.08858  [pdf, other]

    cs.RO cs.CV cs.LG eess.SY

    OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

    Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

    Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autono…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://omni.human2humanoid.com/

  13. arXiv:2406.08416  [pdf, other]

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin **

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.07914  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  15. arXiv:2406.06005  [pdf, other]

    cs.RO cs.GR eess.SY

    WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

    Authors: Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi

    Abstract: Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still r…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Website and Videos: https://lecar-lab.github.io/wococo/

  16. arXiv:2406.05954  [pdf, other]

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  17. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  18. arXiv:2406.04633  [pdf, ps, other]

    eess.AS

    Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study

    Authors: Chong Zhang, Yanqing Liu, Yang Zheng, Sheng Zhao

    Abstract: Scaling text-to-speech (TTS) with autoregressive language models (LMs) to large-scale datasets by quantizing waveforms into discrete speech tokens is making great progress in capturing the diversity and expressiveness of human speech, but the quality of speech reconstructed from discrete speech tokens is far from satisfactory and depends on the speech token compression ratio. Generative diffusion…

    Submitted 7 June, 2024; originally announced June 2024.

  19. arXiv:2406.03882  [pdf, other]

    cs.CL cs.SD eess.AS

    Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

    Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

    Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  20. arXiv:2406.02009  [pdf, other]

    eess.AS cs.CL cs.SD

    Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

    Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-su…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  21. arXiv:2406.00654  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    Authors: Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

    Abstract: In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even st…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 19 pages, Preprint

  22. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  23. arXiv:2405.18969  [pdf, ps, other]

    eess.SY

    Global and local observability of hypergraphs

    Authors: Chencheng Zhang, Hao Yang, Shaoxuan Cui, Bin Jiang, Ming Cao

    Abstract: This paper studies observability for non-uniform hypergraphs with inputs and outputs. To capture higher-order interactions, we define a canonical non-homogeneous dynamical system with nonlinear outputs on hypergraphs. We then construct algebraic necessary and sufficient conditions based on polynomial ideals and varieties for global observability at an initial state of hypergraphs. An example is gi…

    Submitted 29 May, 2024; originally announced May 2024.

  24. arXiv:2405.18167  [pdf, other]

    eess.IV cs.CV

    Confidence-aware multi-modality learning for eye disease screening

    Authors: Ke Zou, Tian Lin, Zongbo Han, Meng Wang, Xuedong Yuan, Haoyu Chen, Changqing Zhang, Xiaojing Shen, Huazhu Fu

    Abstract: Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evi…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures, 9 tables

  25. arXiv:2405.16980  [pdf, other]

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi…

    Submitted 27 May, 2024; originally announced May 2024.

  26. arXiv:2405.14161  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

    Abstract: We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifica…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, Preprint

  27. arXiv:2405.10825  [pdf, other]

    eess.SY cs.LG

    Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

    Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili **, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

    Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks bas…

    Submitted 17 May, 2024; originally announced May 2024.

  28. arXiv:2405.09079  [pdf, other]

    eess.SP cs.IT

    Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems

    Authors: Murat Bayraktar, Nuria González-Prelcic, Mikko Valkama, Hao Chen, Charlie Jianzhong Zhang

    Abstract: In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures

  29. arXiv:2405.07808  [pdf, ps, other]

    eess.SP

    Goal-oriented compression for $L_p$-norm-type goal functions: Application to power consumption scheduling

    Authors: Yifei Sun, Hang Zou, Chao Zhang, Samson Lasaulce, Michel Kieffer

    Abstract: Conventional data compression schemes aim at implementing a trade-off between the rate required to represent the compressed data and the resulting distortion between the original and reconstructed data. However, in more and more applications, what is desired is not reconstruction accuracy but the quality of the realization of a certain task by the receiver. In this paper, the receiver task is mode…

    Submitted 13 May, 2024; originally announced May 2024.

  30. Teacher-Student Network for Real-World Face Super-Resolution with Progressive Embedding of Edge Information

    Authors: Zhilei Liu, Chenggong Zhang

    Abstract: Traditional face super-resolution (FSR) methods trained on synthetic datasets usually have poor generalization ability for real-world face images. Recent work has utilized complex degradation models or training networks to simulate the real degradation process, but this limits the performance of these methods due to the domain differences that still exist between the generated low-resolution image…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICIP 2023

  31. arXiv:2405.02784  [pdf, other]

    eess.IV cs.CV

    MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging

    Authors: Chaojie Zhang, Shengjia Chen, Ozkan Cigdem, Haresh Rengaraj Rajamohan, Kyunghyun Cho, Richard Kijowski, Cem M. Deniz

    Abstract: A transformer-based deep learning model, MR-Transformer, was developed for total knee replacement (TKR) prediction using magnetic resonance imaging (MRI). The model incorporates the ImageNet pre-training and captures three-dimensional (3D) spatial correlation from the MR images. The performance of the proposed model was compared to existing state-of-the-art deep learning models for knee injury dia…

    Submitted 4 May, 2024; originally announced May 2024.

  32. arXiv:2405.00069  [pdf, other]

    eess.IV

    Estimation of Time-to-Total Knee Replacement Surgery

    Authors: Ozkan Cigdem, Shengjia Chen, Chaojie Zhang, Kyunghyun Cho, Richard Kijowski, Cem M. Deniz

    Abstract: A survival analysis model for predicting time-to-total knee replacement (TKR) was developed using features from medical images and clinical measurements. Supervised and self-supervised deep learning approaches were utilized to extract features from radiographs and magnetic resonance images. Extracted features were combined with clinical and image assessments for survival analysis using random surv…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: 11 pages, 3 figures, 4 tables, submitted to a conference

  33. arXiv:2404.17022  [pdf]

    cs.SD eess.AS

    Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

    Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

    Abstract: Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0…

    Submitted 25 April, 2024; originally announced April 2024.

  34. arXiv:2404.16137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM

    Authors: Fabrizio Carpi, Soheil Rostami, Joonyoung Cho, Siddharth Garg, Elza Erkip, Charlie Jianzhong Zhang

    Abstract: High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this wor…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 5 pages, under review

  35. arXiv:2404.14716  [pdf, other]

    cs.CL cs.AI cs.CV cs.SD eess.AS

    Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

    Authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

    Abstract: Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update. Despite such convenience, the performance of ICL heavily depends on the quality of the in-context examples presented, which makes the in-context example selection approach a critical choice. This paper proposes a novel Bayes…

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures

  36. arXiv:2404.13918  [pdf]

    eess.SP

    Emerging Advancements in 6G NTN Radio Access Technologies: An Overview

    Authors: Husnain Shahid, Carla Amatetti, Riccardo Campana, Sorya Tong, Dorin Panaitopol, Alessandro Vanelli Coralli, Abdelhamed Mohamed, Chao Zhang, Ebraam Khalifa, Eduardo Medeiros, Estefania Recayte, Fatemeh Ghasemifard, Ji Lianghai, Juan Bucheli, Karthik Anantha Swamy, Marius Caus, Mehmet Gurelli, Miguel A. Vazquez, Musbah Shaat, Nathan Borios, Per-Erik Eriksson, Sebastian Euler, Zheng Li, Xiaotian Fu

    Abstract: The efforts on the development, standardization and improvements to communication systems towards 5G Advanced and 6G are on track to provide benefits such as an unprecedented level of connectivity and performance, enabling a diverse range of vertical services. The full integration of non-terrestrial components into 6G plays a pivotal role in realizing this paradigm shift towards ubiquitous communi…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: accepted in 2024 EuCNC and 6G Summit, Antwerp, Belgium, 3-6 June 2024

  37. arXiv:2404.13388  [pdf]

    eess.IV cs.CV cs.LG

    Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

    Authors: Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

    Abstract: Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres…

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  38. arXiv:2404.13386  [pdf]

    eess.IV cs.CV cs.LG

    SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

    Authors: Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

    Abstract: Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISBI 2024

  39. arXiv:2404.11304  [pdf]

    eess.SY

    Dynamic Phasor Modeling of Single-Phase Grid-Forming Converters

    Authors: Wenjia Si, Chenming Liu, Steven Liu, Hongchang Li, Chenghui Zhang, Jingyang Fang

    Abstract: In modern power systems, grid-forming power converters (GFMCs) have emerged as an enabling technology. However, the modeling of single-phase GFMCs faces new challenges. In particular, the nonlinear orthogonal signal generation unit, crucial for power measurement, still lacks an accurate model. To overcome the challenges, this letter proposes a dynamic phasor model of single-phase GFMCs. Moreover,…

    Submitted 17 April, 2024; originally announced April 2024.

  40. arXiv:2404.11278  [pdf, other]

    physics.ins-det eess.IV

    Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

    Authors: Dikai Li, Jian Yu, Qian Chen, Chunhui Zhang, Xiangyu Wan, Leifeng Cao

    Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it can conduct non-destructive elemental analysis inside samples. Research has shown that MIXE can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging through encoding holes. The related technology significantly improves the counting rate while ma…

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  41. arXiv:2404.10021  [pdf]

    eess.SY

    Monitoring Based Fatigue Damage Prognosis of Wind Turbine Composite Blades under Uncertain Wind Loads

    Authors: Chizhi Zhang, Hua-Peng Chen

    Abstract: Lifecycle assessment of wind turbines is essential to improve their design and to optimize maintenance plans for preventing failures during the design life. A critical element of wind turbines is the composite blade due to uncertain cyclic wind loads with relatively high frequency and amplitude in offshore environments. It is critical to detect the wind fatigue damage evolution in composite blades…

    Submitted 14 April, 2024; originally announced April 2024.

  42. arXiv:2404.08408  [pdf, other]

    cs.LG cs.AI eess.SP physics.geo-ph

    Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

    Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo

    Abstract: Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a no…

    Submitted 12 April, 2024; originally announced April 2024.

  43. Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

    Authors: Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

    Abstract: This paper proposes to use cepstrum for artifact detection, recognition and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye movement contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with support vector machine (SVM) based classifier is designed to identify the a…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 5 pages, 4 figures, published by TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023

  44. arXiv:2404.04283  [pdf, other]

    cs.CV cs.LG eess.IV

    Translation-based Video-to-Video Synthesis

    Authors: Pratim Saha, Chengcui Zhang

    Abstract: Translation-based Video Synthesis (TVS) has emerged as a vital research area in computer vision, aiming to facilitate the transformation of videos between distinct domains while preserving both temporal continuity and underlying content features. This technique has found wide-ranging applications, encompassing video super-resolution, colorization, segmentation, and more, by extending the capabilit…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 25 pages, 9 figures

  45. arXiv:2404.03329  [pdf]

    cs.LG eess.SP stat.ML

    DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data

    Authors: Yukun Xie, Juan Du, Chen Zhang

    Abstract: In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant infor…

    Submitted 24 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Revised version for submission to IISE Transactions

  46. arXiv:2404.00622  [pdf, other]

    cs.MA eess.SY

    OpenMines: A Light and Comprehensive Mining Simulation Environment for Truck Dispatching

    Authors: Shi Meng, Bin Tian, Xiaotong Zhang, Shuangying Qi, Caiji Zhang, Qiang Zhang

    Abstract: Mine fleet management algorithms can significantly reduce operational costs and enhance productivity in mining systems. Most current fleet management algorithms are evaluated based on self-implemented or proprietary simulation environments, posing challenges for replication and comparison. This paper models the simulation environment for mine fleet management from a complex systems perspective. Bu…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted in: 2024 35th IEEE Intelligent Vehicles Symposium (IV); 4 figures, 1 table

  47. arXiv:2403.19002  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Robust Active Speaker Detection in Noisy Environments

    Authors: Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

    Abstract: This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation a…

    Submitted 30 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 15 pages, 5 figures

  48. arXiv:2403.11699  [pdf, other]

    eess.IV cs.CV

    A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

    Authors: Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

    Abstract: Ultrasound video-based breast lesion segmentation provides a valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images which usually can not be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploi…

    Submitted 18 March, 2024; originally announced March 2024.

  49. arXiv:2403.10585  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Solving General Noisy Inverse Problem via Posterior Sampling: A Policy Gradient Viewpoint

    Authors: Haoyue Tang, Tian Xie, Aosong Feng, Hanyu Wang, Chenyang Zhang, Yang Bai

    Abstract: Solving image inverse problems (e.g., super-resolution and inpainting) requires generating a high fidelity image that matches the given input (the low-resolution image or the masked image). By using the input image as guidance, we can leverage a pretrained diffusion generative model to solve a wide range of image inverse tasks without task specific model fine-tuning. To precisely estimate the guid…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted and to appear, AISTATS 2024

  50. arXiv:2403.10002  [pdf, ps, other]

    cs.IT eess.SP

    Fast Group Scheduling for Downlink Large-Scale Multi-Group Multicast Beamforming

    Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed

    Abstract: Next-generation wireless networks need to handle massive user access effectively. This paper addresses the problem of joint group scheduling and multicast beamforming for downlink transmission with many active user groups. Aiming to maximize the minimum user throughput, we propose a three-phase approach to tackle this difficult joint optimization problem efficiently. In Phase 1, we utilize the opt…

    Submitted 24 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 13 pages, 8 figures