
Showing 1–50 of 101 results for author: Guo, H

Searching in archive eess.
  1. arXiv:2406.10591  [pdf, other]

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on…

    Submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2406.10056  [pdf, other]

    cs.SD eess.AS

    UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

    Authors: Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng

    Abstract: The Large Language models (LLMs) have demonstrated supreme capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach, empowering the frozen LLMs to achieve multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel and LLMs-dr…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.07422  [pdf, other]

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.02940  [pdf, other]

    cs.SD eess.AS

    Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

    Authors: Haohan Guo, Fenglong Xie, Dongchao Yang, Hui Lu, Xixin Wu, Helen Meng

    Abstract: VQ-VAE, as a mainstream approach of speech tokenizer, has been troubled by ``index collapse'', where only a small number of codewords are activated in large codebooks. This work proposes product-quantized (PQ) VAE with more codebooks but fewer codewords to address this problem and build large-codebook speech tokenizers. It encodes speech features into multiple VQ subspaces and composes them into c…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2406.02328  [pdf, other]

    cs.SD eess.AS

    SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng

    Abstract: In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignment information; (2) It directly takes plain text as input and generates speech through an NAR way; (3) It tries to model speech in a finite and compac…

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  6. arXiv:2405.08949  [pdf, other]

    eess.SP

    Task-Oriented Mulsemedia Communication using Unified Perceiver and Conformal Prediction in 6G Metaverse Systems

    Authors: Hongzhi Guo, Ian F. Akyildiz

    Abstract: The growing prominence of extended reality (XR), holographic-type communications, and metaverse demands truly immersive user experiences by using many sensory modalities, including sight, hearing, touch, smell, taste, etc. Additionally, the widespread deployment of sensors in areas such as agriculture, manufacturing, and smart homes is generating a diverse array of sensory data. A new media format…

    Submitted 14 May, 2024; originally announced May 2024.

  7. arXiv:2404.17069  [pdf, other]

    cs.IT cs.LG eess.SP

    Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

    Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

    Abstract: The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operat…

    Submitted 25 April, 2024; originally announced April 2024.

  8. Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien…

    Submitted 14 April, 2024; originally announced April 2024.

  9. arXiv:2404.07985  [pdf, other]

    cs.CV eess.IV

    WaveMo: Learning Wavefront Modulations to See Through Scattering

    Authors: Mingyang Xie, Haiyun Guo, Brandon Y. Feng, Lingbo Jin, Ashok Veeraraghavan, Christopher A. Metzler

    Abstract: Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introdu…

    Submitted 11 April, 2024; originally announced April 2024.

  10. arXiv:2404.04878  [pdf, other]

    eess.IV cs.CV

    CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

    Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

    Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR accepted paper

  11. arXiv:2403.19785  [pdf, other]

    cs.IT eess.SP

    Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

    Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

    Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple out…

    Submitted 28 March, 2024; originally announced March 2024.

  12. arXiv:2403.19238  [pdf, other]

    cs.CV cs.AI eess.IV

    Taming Lookup Tables for Efficient Image Retouching

    Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

    Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To th…

    Submitted 28 March, 2024; originally announced March 2024.

  13. arXiv:2403.14268  [pdf]

    eess.AS cs.SD

    Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

    Authors: PeiYing Lee, HauYun Guo, Berlin Chen

    Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

    Report number: TAAI2023-Domestic-131

  14. arXiv:2403.13820  [pdf, other]

    cs.LG cs.CR eess.SP

    Identity information based on human magnetocardiography signals

    Authors: Pengju Zhang, Chenxi Sun, Jianwei Zhang, Hong Guo

    Abstract: We have developed an individual identification system based on magnetocardiography (MCG) signals captured using optically pumped magnetometers (OPMs). Our system utilizes pattern recognition to analyze the signals obtained at different positions on the body, by scanning the matrices composed of MCG signals with a 2*2 window. In order to make use of the spatial information of MCG signals, we transf…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures. Author manuscript accepted for AAAI 2024 Spring Symposium on Clinical Foundation Models

  15. arXiv:2402.11769  [pdf, other]

    eess.SY cs.GT math.OC

    Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection

    Authors: Cheng Feng, Kedi Zheng, Lanqing Shan, Hani Alers, Lampros Stergioulas, Hongye Guo, Qixin Chen

    Abstract: Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can re…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE PES Transactions

  16. arXiv:2402.08093  [pdf, other]

    cs.LG cs.CL eess.AS

    BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

    Authors: Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

    Abstract: We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig $\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities. BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts ra…

    Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: v1.1 (fixed typos)

  17. arXiv:2402.03886  [pdf, other]

    cs.IT eess.SP

    Full-Duplex Millimeter Wave MIMO Channel Estimation: A Neural Network Approach

    Authors: Mehdi Sattari, Hao Guo, Deniz Gündüz, Ashkan Panahi, Tommy Svensson

    Abstract: Millimeter wave (mmWave) multiple-input-multi-output (MIMO) is now a reality with great potential for further improvement. We study full-duplex transmissions as an effective way to improve mmWave MIMO systems. Compared to half-duplex systems, full-duplex transmissions may offer higher data rates and lower latency. However, full-duplex transmission is hindered by self-interference (SI) at the recei…

    Submitted 18 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  18. arXiv:2401.11479  [pdf, other]

    eess.SP

    Battery-Free Sensor Array for Wireless Multi-Depth In-Situ Sensing

    Authors: Hongzhi Guo, Adam Kamrath

    Abstract: Underground in-situ sensing plays a vital role in precision agriculture and infrastructure monitoring. While existing sensing systems utilize wires to connect an array of sensors at various depths for spatial-temporal data collection, wireless underground sensor networks offer a cable-free alternative. However, these wireless sensors are typically battery-powered, necessitating periodic recharging…

    Submitted 21 January, 2024; originally announced January 2024.

  19. arXiv:2401.09013  [pdf, other]

    cs.NI eess.SP

    An Improved Virtual Force Approach for UAV Deployment and Resource Allocation in Emergency Communications

    Authors: Hongying Guo, Li Wang, Ruoguang Li, Luyang Hou, Lianming Xu, Aiguo Fei

    Abstract: In this paper, we consider an unmanned aerial vehicle (UAV)-enabled emergency communication system, which establishes temporary communication link with users equipment (UEs) in a typical disaster environment with mountainous forest and obstacles. Towards this end, a joint deployment, power allocation, and user association optimization problem is formulated to maximize the total transmission rate,…

    Submitted 17 January, 2024; originally announced January 2024.

  20. arXiv:2401.07791  [pdf, other]

    eess.SP

    Near-Far Field Channel Modeling for Holographic MIMO Using Expectation-Maximization Methods

    Authors: Houfeng Chen, Shuhao Zeng, Hao Guo, Tommy Svensson, Hongliang Zhang

    Abstract: Holographic Multiple-Input Multiple-Output (HMIMO), which densely integrates numerous antennas into a limited space, is anticipated to provide higher rates for future 6G wireless communications. The increase in antenna aperture size makes the near-field region enlarge, causing some users to be located in the near-field region. Thus, we are facing a hybrid near-field and far-field communication pro…

    Submitted 15 January, 2024; originally announced January 2024.

  21. arXiv:2401.04152  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

    Authors: Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: End-to-end multi-talker speech recognition has garnered great interest as an effective approach to directly transcribe overlapped speech from multiple speakers. Current methods typically adopt either 1) single-input multiple-output (SIMO) models with a branched encoder, or 2) single-input single-output (SISO) models based on attention-based encoder-decoder architecture with serialized output train…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP2024

  22. arXiv:2311.11260  [pdf, other]

    cs.RO cs.CV eess.SP

    Radarize: Enhancing Radar SLAM with Generalizable Doppler-Based Odometry

    Authors: Emerson Sie, Xinyu Wu, Heyu Guo, Deepak Vasisht

    Abstract: Millimeter-wave (mmWave) radar is increasingly being considered as an alternative to optical sensors for robotic primitives like simultaneous localization and mapping (SLAM). While mmWave radar overcomes some limitations of optical sensors, such as occlusions, poor lighting conditions, and privacy concerns, it also faces unique challenges, such as missed obstacles due to specular reflections or fa…

    Submitted 29 April, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  23. arXiv:2311.11086  [pdf]

    eess.IV cs.CV

    LightBTSeg: A lightweight breast tumor segmentation model using ultrasound images via dual-path joint knowledge distillation

    Authors: Hongjiang Guo, Shengwen Wang, Hao Dang, Kangle Xiao, Yaru Yang, Wenpei Liu, Tongtong Liu, Yiying Wan

    Abstract: The accurate segmentation of breast tumors is an important prerequisite for lesion detection, which has significant clinical value for breast tumor research. The mainstream deep learning-based methods have achieved a breakthrough. However, these high-performance segmentation methods are formidable to implement in clinical scenarios since they always embrace high computation complexity, massive par…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 7 pages, 7 figures, conference

  24. arXiv:2311.02551  [pdf]

    eess.SY cs.GT cs.LG

    High-dimensional Bid Learning for Energy Storage Bidding in Energy Markets

    Authors: Jinyu Liu, Hongye Guo, Qinghu Tang, En Lu, Qiuna Cai, Qixin Chen

    Abstract: With the growing penetration of renewable energy resource, electricity market prices have exhibited greater volatility. Therefore, it is important for Energy Storage Systems(ESSs) to leverage the multidimensional nature of energy market bids to maximize profitability. However, current learning methods cannot fully utilize the high-dimensional price-quantity bids in the energy markets. To address t…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, Accepted by the 15th International Conference on Applied Energy (ICAE2023)

  25. arXiv:2310.18529  [pdf, other]

    physics.optics eess.IV

    FPM-INR: Fourier ptychographic microscopy image stack reconstruction using implicit neural representations

    Authors: Haowen Zhou, Brandon Y. Feng, Haiyun Guo, Siyu Lin, Mingshu Liang, Christopher A. Metzler, Changhuei Yang

    Abstract: Image stacks provide invaluable 3D information in various biological and pathological imaging applications. Fourier ptychographic microscopy (FPM) enables reconstructing high-resolution, wide field-of-view image stacks without z-stack scanning, thus significantly accelerating image acquisition. However, existing FPM methods take tens of minutes to reconstruct and gigabytes of memory to store a hig…

    Submitted 31 October, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: Project Page: https://hwzhou2020.github.io/FPM-INR-Web/

  26. arXiv:2309.13602  [pdf, other]

    eess.SP cs.IT

    6G Positioning and Sensing Through the Lens of Sustainability, Inclusiveness, and Trustworthiness

    Authors: Henk Wymeersch, Hui Chen, Hao Guo, Musa Furkan Keskin, Bahare M. Khorsandi, Mohammad H. Moghaddam, Alejandro Ramirez, Kim Schindhelm, Athanasios Stavridis, Tommy Svensson, Vijaya Yajnanarayana

    Abstract: 6G promises a paradigm shift in which positioning and sensing are inherently integrated, enhancing not only the communication performance but also enabling location- and context-aware services. Historically, positioning and sensing have been viewed through the lens of cost and performance trade-offs, implying an escalated demand for resources, such as radio, physical, and computational resources,…

    Submitted 14 February, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Submitted and under review for IEEE Wireless Communications

  27. arXiv:2309.09088  [pdf, other]

    cs.SD eess.AS

    Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition

    Authors: Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland

    Abstract: Vocoder models have recently achieved substantial progress in generating authentic audio comparable to human quality while significantly reducing memory requirement and inference time. However, these data-hungry generative models require large-scale audio data for learning good representations. In this paper, we apply contrastive learning methods in training the vocoder to improve the perceptual q…

    Submitted 18 December, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

  28. arXiv:2309.06981  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

    Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

    Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by Mobicom 2023

  29. arXiv:2309.05276  [pdf, other]

    cs.IT cs.LG cs.NI eess.SP

    Beamforming in Wireless Coded-Caching Systems

    Authors: Sneha Madhusudan, Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson

    Abstract: Increased capacity in the access network poses capacity challenges on the transport network due to the aggregated traffic. However, there are spatial and time correlation in the user data demands that could potentially be utilized. To that end, we investigate a wireless transport network architecture that integrates beamforming and coded-caching strategies. Especially, our proposed design entails…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE Future Networks World Forum, 2023

  30. arXiv:2309.00126  [pdf, other]

    cs.SD cs.CL eess.AS

    QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

    Authors: Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

    Abstract: This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio. This framework comprises two VQ-S3R learners: first, the principal learner aims to provide a generative Multi-Stage Multi-Codebook (MSMC) VQ-S3R via the…

    Submitted 31 August, 2023; originally announced September 2023.

  31. Mobility-Aware Computation Offloading for Swarm Robotics using Deep Reinforcement Learning

    Authors: Xiucheng Wang, Hongzhi Guo

    Abstract: Swarm robotics is envisioned to automate a large number of dirty, dangerous, and dull tasks. Robots have limited energy, computation capability, and communication resources. Therefore, current swarm robotics have a small number of robots, which can only provide limited spatio-temporal information. In this paper, we propose to leverage the mobile edge computing to alleviate the computation burden.…

    Submitted 21 August, 2023; originally announced August 2023.

    Journal ref: 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC)

  32. arXiv:2308.04666  [pdf, other]

    cs.SD eess.AS

    Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation

    Authors: Zirui Ge, Xinzhou Xu, Haiyan Guo, Tingting Wang, Zhen Yang

    Abstract: The emergence of self-supervised representation (i.e., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion on the representation requires further investigating, due to the inclusion of fixed or sub-optimal temporal pooling strategies. Despite of improved strategies considering graph learning and…

    Submitted 23 February, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: 9 pages, 4 figures

  33. arXiv:2308.02494  [pdf, other]

    eess.IV cs.CV cs.GR

    Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization

    Authors: Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka

    Abstract: Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a do…

    Submitted 6 April, 2024; v1 submitted 16 July, 2023; originally announced August 2023.

    Comments: Accepted to IEEE VIS 2023. https://www.computer.org/csdl/journal/tg/2024/01/10297599/1RyYguiNBLO

    Journal ref: In IEEE Transactions on Visualization & Computer Graphics, vol. 30, no. 01, pp. 965-974, 2024

  34. arXiv:2307.00393  [pdf, other]

    eess.AS

    Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion

    Authors: Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro

    Abstract: Voice conversion systems have made significant advancements in terms of naturalness and similarity in common voice conversion tasks. However, their performance in more complex tasks such as cross-lingual voice conversion and expressive voice conversion remains imperfect. In this study, we propose a novel approach that combines a jointly trained speaker encoder and content features extracted from t…

    Submitted 1 July, 2023; originally announced July 2023.

  35. arXiv:2305.05736  [pdf, other]

    cs.SD cs.CR eess.AS

    VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

    Authors: Yuanda Wang, Hanqing Guo, Guangjing Wang, Bocheng Chen, Qiben Yan

    Abstract: Deep learning based voice synthesis technology generates artificial human-like speeches, which has been used in deepfakes or identity theft attacks. Existing defense mechanisms inject subtle adversarial perturbations into the raw speech audios to mislead the voice synthesis models. However, optimizing the adversarial perturbation not only consumes substantial computation time, but it also requires…

    Submitted 9 May, 2023; originally announced May 2023.

  36. A Low-cost Through-metal Communication System for Sensors in Metallic Pipes

    Authors: Hongzhi Guo, Marlin Prince, Javionn Ramsey, Jarvis Turner, Marcus Allen, Chevel Samuels, Jordan Atta Nuako

    Abstract: Metallic pipes and other containers are widely used to store and transport toxic gases and liquids. Various sensors have been designed to monitor the environment inside metallic pipes and containers, such as pressure, liquid-level, and chemical sensors. Moreover, sensors are also used to inspect and detect pipe leakages. However, sensors are usually placed outside of metallic pipes and containers…

    Submitted 13 April, 2023; originally announced April 2023.

    Journal ref: IEEE Sensors Journal, 2023

  37. arXiv:2303.10556  [pdf, ps, other]

    eess.AS cs.SD

    The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework

    Authors: Zirui Ge, Haiyan Guo, Zhen Yang

    Abstract: Pre-trained wav2vec2.0 model has been proved its effectiveness for speaker recognition. However, current feature processing methods are focusing on classical pooling on the output features of the pre-trained wav2vec2.0 model, such as mean pooling, max pooling etc. That methods take the features as the independent and irrelevant units, ignoring the inter-relationship among all the features, and do…

    Submitted 18 March, 2023; originally announced March 2023.

  38. arXiv:2302.08296  [pdf, other]

    cs.SD eess.AS

    QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion

    Authors: Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro

    Abstract: With the development of automatic speech recognition (ASR) and text-to-speech (TTS) technology, high-quality voice conversion (VC) can be achieved by extracting source content information and target speaker information to reconstruct waveforms. However, current methods still require improvement in terms of inference speed. In this study, we propose a lightweight VITS-based VC model that uses the H…

    Submitted 23 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  39. Real-time Path Planning of Driver-less Mining Trains with Time-dependent Physical Constraints

    Authors: Xiaojiang Ren, Hui Guo, Sheng Kai, Guoqiang Mao

    Abstract: While the increased automation levels of production and operation equipment have led to improved productivity of mining activity in open pit mines, the capacity of mine transport system become a bottleneck. The optimization of mine transport system is of great practical significance to reduce the production and operation cost and improve the production and organizational efficiency of mines. In th…

    Submitted 6 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  40. arXiv:2211.06974  [pdf]

    cs.NI cs.IT eess.SP

    A Comparison between Network-Controlled Repeaters and Reconfigurable Intelligent Surfaces

    Authors: Hao Guo, Charitha Madapatha, Behrooz Makki, Boris Dortschy, Lei Bao, Magnus Åström, Tommy Svensson

    Abstract: Network-controlled repeater (NCR) has been recently considered as a study-item in 3GPP Release 18, and the discussions are continuing in a work-item. In this paper, we introduce the concept of NCRs, as a possible low-complexity device to support for network densification and compare the performance of the NCRs with those achieved by reconfigurable intelligent surfaces (RISs). The results are prese…

    Submitted 13 November, 2022; originally announced November 2022.

    Comments: 7 pages, 7 figures, submitted to potential IEEE publication

  41. arXiv:2211.00272  [pdf, other]

    eess.SP

    RF-CHORD: Towards Deployable RFID Localization System for Logistics Network

    Authors: Bo Liang, Purui Wang, Renjie Zhao, Heyu Guo, Pengyu Zhang, Junchen Guo, Shunmin Zhu, Hongqiang Harry Liu, Xinyu Zhang, Chenren Xu

    Abstract: RFID localization is considered the key enabler of automating the process of inventory tracking and management for high-performance logistic network. A practical and deployable RFID localization system needs to meet reliability, throughput, and range requirements. This paper presents RF-Chord, the first RFID localization system that simultaneously meets all three requirements. RF-Chord features a…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: To be published in NSDI 2023

  42. arXiv:2210.15131  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

    Authors: Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng

    Abstract: This paper aims to enhance low-resource TTS by reducing training data requirements using compact speech representations. A Multi-Stage Multi-Codebook (MSMC) VQ-GAN is trained to learn the representation, MSMCR, and decode it to waveforms. Subsequently, we train the multi-stage predictor to predict MSMCRs from the text for TTS synthesis. Moreover, we optimize the training strategy by leveraging mor…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  43. arXiv:2209.10887  [pdf, other]

    cs.SD cs.CL eess.AS

    A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

    Authors: Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

    Abstract: We propose a Multi-Stage, Multi-Codebook (MSMC) approach to high-performance neural TTS synthesis. A vector-quantized, variational autoencoder (VQ-VAE) based feature analyzer is used to encode Mel spectrograms of speech training data by down-sampling progressively in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, and quantizing them with multiple VQ codebooks,…

    Submitted 22 September, 2022; originally announced September 2022.

  44. arXiv:2209.10367  [pdf, other]

    cs.IT eess.SP

    Electromagnetic Field Exposure Avoidance thanks to Non-Intended User Equipment and RIS

    Authors: Hao Guo, Dinh-Thuy Phan-Huy, Tommy Svensson

    Abstract: On the one hand, there is a growing demand for high throughput which can be satisfied thanks to the deployment of new networks using massive multiple-input multiple-output (MIMO) and beamforming. On the other hand, in some countries or cities, there is a demand for arbitrarily low electromagnetic field exposure (EMFE) of people not concerned by the ongoing communication, which slows down the deplo…

    Submitted 7 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures. Accepted in Globecom 2022 Workshop

  45. arXiv:2208.01382  [pdf]

    eess.IV cs.CV

    A New Probabilistic V-Net Model with Hierarchical Spatial Feature Transform for Efficient Abdominal Multi-Organ Segmentation

    Authors: Minfeng Xu, Heng Guo, Jianfeng Zhang, Ke Yan, Le Lu

    Abstract: Accurate and robust abdominal multi-organ segmentation from CT imaging of different modalities is a challenging task due to complex inter- and intra-organ shape and appearance variations among abdominal organs. In this paper, we propose a probabilistic multi-organ segmentation network with hierarchical spatial-wise feature modulation to capture flexible organ semantic variants and inject the learn…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 12 pages, 6 figures

  46. arXiv:2207.05848  [pdf, other]

    cs.SD eess.AS

    NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

    Authors: Hanqing Guo, Chenning Li, Lingkun Li, Zhichao Cao, Qiben Yan, Li Xiao

    Abstract: In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a D…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: 12 pages

  47. arXiv:2207.01219  [pdf, other]

    eess.SP cs.AI cs.LG

    Masked Self-Supervision for Remaining Useful Lifetime Prediction in Machine Tools

    Authors: Haoren Guo, Haiyue Zhu, Jiahui Wang, Vadakkepat Prahlad, Weng Khuen Ho, Tong Heng Lee

    Abstract: Prediction of Remaining Useful Lifetime (RUL) in the modern manufacturing and automation workplace for machines and tools is essential in Industry 4.0. This is clearly evident as continuous tool wear, or worse, sudden machine breakdown will lead to various manufacturing failures which would clearly cause economic loss. With the availability of deep learning approaches, the great potential and prosp…

    Submitted 4 July, 2022; originally announced July 2022.

  48. arXiv:2207.01218  [pdf, other]

    eess.IV cs.AI cs.CV

    CAM/CAD Point Cloud Part Segmentation via Few-Shot Learning

    Authors: Jiahui Wang, Haiyue Zhu, Haoren Guo, Abdullah Al Mamun, Vadakkepat Prahlad, Tong Heng Lee

    Abstract: 3D part segmentation is an essential step in advanced CAM/CAD workflow. Precise 3D segmentation contributes to lower defective rate of work-pieces produced by the manufacturing equipment (such as computer controlled CNCs), thereby improving work efficiency and attaining the attendant economic benefits. A large class of existing works on 3D model segmentation are mostly based on fully-supervised le…

    Submitted 16 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: 7 pages, 5 figures

  49. arXiv:2205.14496  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

    Authors: Hanqing Guo, Qiben Yan, Nikolay Ivanov, Ying Zhu, Li Xiao, Eric J. Hunter

    Abstract: Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrogr…

    Submitted 28 May, 2022; originally announced May 2022.

  50. arXiv:2203.14748  [pdf, ps, other]

    eess.SP

    Universal Graph Filter Design based on Butterworth, Chebyshev and Elliptic Functions

    Authors: Zirui Ge, Haiyan Guo, Tingting Wang, Zhen Yang

    Abstract: Graph filters are crucial tools in processing the spectrum of graph signals. In this paper, we propose to design universal IIR graph filters with low computational complexity by using three kinds of functions, which are Butterworth, Chebyshev, and Elliptic functions, respectively. Specifically, inspired by the classical analog filter design method, we first derive the zeros and poles of graph freq…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 17 pages, 7 figures

    MSC Class: 05C31; ACM Class: G.1.10