Showing 1–33 of 33 results for author: Fu, Q

Searching in archive eess.
  1. arXiv:2403.01700 [pdf, other]

    cs.SD cs.MM eess.AS

    Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

    Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

    Abstract: In recent years, neural network-based Wake Word Spotting achieves good performance on clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting (AVWWS) receives lots of attention because visual lip movement information is not affected by complex acoustic scenes. Previous works usually use simple addition or concatenation for multi-modal fusion. The inter-modal correl…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  2. arXiv:2401.03835 [pdf, other]

    cs.CV eess.IV

    Limitations of Data-Driven Spectral Reconstruction -- Optics-Aware Analysis and Mitigation

    Authors: Qiang Fu, Matheus Souza, Eunsue Choi, Suhyun Shin, Seung-Hwan Baek, Wolfgang Heidrich

    Abstract: Hyperspectral imaging empowers machine vision systems with the distinct capability of identifying materials through recording their spectral signatures. Recent efforts in data-driven spectral reconstruction aim at extracting spectral information from RGB images captured by cost-effective RGB cameras, instead of dedicated hardware. In this paper we systematically analyze the performance of such m…

    Submitted 2 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 12 pages, 7 figures, 8 tables

  3. arXiv:2311.10261 [pdf, other]

    cs.CV eess.SP

    Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

    Authors: Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

    Abstract: Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an eff…

    Submitted 16 November, 2023; originally announced November 2023.

  4. arXiv:2310.08080 [pdf]

    eess.IV cs.CV

    RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray Projection

    Authors: Miao Zhu, Qiming Fu, Bo Liu, Mengxi Zhang, Bojian Li, Xiaoyan Luo, Fugen Zhou

    Abstract: Radiotherapy is one of the primary treatment methods for tumors, but the organ movement caused by respiration limits its accuracy. Recently, 3D imaging from a single X-ray projection has received extensive attention as a promising approach to address this issue. However, current methods can only reconstruct 3D images without directly locating the tumor and are only validated for fixed-angle imagin…

    Submitted 28 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  5. arXiv:2303.04654 [pdf, other]

    cs.CV eess.IV physics.optics

    Aberration-Aware Depth-from-Focus

    Authors: Xinge Yang, Qiang Fu, Mohammed Elhoseiny, Wolfgang Heidrich

    Abstract: Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the decision of the best-…

    Submitted 17 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: [ICCP & TPAMI 2023] Considering optical aberrations during network training can improve the generalizability

  6. arXiv:2303.02348 [pdf, other]

    cs.SD eess.AS

    The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

    Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

    Abstract: This paper further explores our previous wake word spotting system ranked 2nd in Track 1 of the MISP Challenge 2021. First, we investigate a robust unimodal approach based on 3D and 2D convolution and adopt the simple attention module (SimAM) for our system to improve performance. Second, we explore different combinations of data augmentation methods for better performance. Finally, we study the…

    Submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  7. arXiv:2302.11502 [pdf]

    physics.bio-ph cs.RO eess.SY

    Snake and Snake Robot Locomotion in Complex, 3-D Terrain

    Authors: Qiyuan Fu

    Abstract: Snakes can traverse almost all types of environments by bending their elongate bodies in 3-D to interact with the terrain. Similarly, a snake robot is a promising platform to perform critical tasks in various environments. Understanding how 3-D body bending effectively interacts with the terrain for propulsion and stability can not only inform how snakes traverse natural environments, but also all…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: This is a dissertation submitted to and accepted by Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

  8. arXiv:2302.04469 [pdf, other]

    cs.SD eess.AS

    Joint Acoustic Echo Cancellation and Speech Dereverberation Using Kalman filters

    Authors: Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

    Abstract: This paper proposes a joint acoustic echo cancellation (AEC) and speech dereverberation (DR) algorithm in the short-time Fourier transform domain. The reverberant microphone signals are described using an auto-regressive (AR) model. The AR coefficients and the loudspeaker-to-microphone acoustic transfer functions (ATFs) are considered time-varying and are modeled simultaneously using a first-order…

    Submitted 9 February, 2023; originally announced February 2023.

  9. arXiv:2302.01089 [pdf, other]

    cs.CV cs.LG eess.IV physics.optics

    Curriculum Learning for ab initio Deep Learned Refractive Optics

    Authors: Xinge Yang, Qiang Fu, Wolfgang Heidrich

    Abstract: Deep optical optimization has recently emerged as a new paradigm for designing computational imaging systems using only the output image as the objective. However, it has been limited to either simple optical systems consisting of a single element such as a diffractive optical element (DOE) or metalens, or the fine-tuning of compound lenses from good initial designs. Here we present a DeepLens des…

    Submitted 17 March, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Automatically design computational lenses from scratch with differentiable ray tracing

  10. Vector Quantized Semantic Communication System

    Authors: Qifan Fu, Huiqiang Xie, Zhijin Qin, Gregory Slabaugh, Xiaoming Tao

    Abstract: Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extr…

    Submitted 12 April, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: This five-page article has been accepted for publication in IEEE Wireless Communications Letters. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/LWC.2023.3255221

  11. arXiv:2208.10701 [pdf, other]

    eess.IV cs.CV

    CM-MLP: Cascade Multi-scale MLP with Axial Context Relation Encoder for Edge Segmentation of Medical Image

    Authors: Jinkai Lv, Yuyong Hu, Quanshui Fu, Zhiwang Zhang, Yuqiang Hu, Lin Lv, Guoqing Yang, Jinpeng Li, Yi Zhao

    Abstract: The convolutional-based methods provide good segmentation performance in the medical image segmentation task. However, those methods have the following challenges when dealing with the edges of the medical images: (1) Previous convolutional-based methods do not focus on the boundary relationship between foreground and background around the segmentation edge, which leads to the degradation of segme…

    Submitted 22 August, 2022; originally announced August 2022.

  12. arXiv:2205.15195 [pdf, other]

    cs.SD eess.AS

    Personalized Acoustic Echo Cancellation for Full-duplex Communications

    Authors: Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie

    Abstract: Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background n…

    Submitted 29 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: submitted to INTERSPEECH 22

  13. arXiv:2204.05445 [pdf, other]

    cs.SD eess.AS

    Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness

    Authors: Dianwen Ng, Jin Hui Pang, Yang Xiao, Biao Tian, Qiang Fu, Eng Siong Chng

    Abstract: It is critical for a keyword spotting model to have a small footprint as it typically runs on-device with low computational resources. However, maintaining the previous SOTA performance with reduced model size is challenging. In addition, a far-field and noisy environment with multiple signals interference aggravates the problem causing the accuracy to degrade significantly. In this paper, we pres…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: submitted to INTERSPEECH 2022

  14. arXiv:2203.06517 [pdf, other]

    cs.SD eess.AS

    SA-SASV: An End-to-End Spoof-Aggregated Spoofing-Aware Speaker Verification System

    Authors: Zhongwei Teng, Quchen Fu, Jules White, Maria E. Powell, Douglas C. Schmidt

    Abstract: Research in the past several years has boosted the performance of automatic speaker verification systems and countermeasure systems to deliver low Equal Error Rates (EERs) on each system. However, research on joint optimization of both systems is still limited. The Spoofing-Aware Speaker Verification (SASV) 2022 challenge was proposed to encourage the development of integrated SASV systems with ne…

    Submitted 24 March, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

    Comments: Update Experiment Results in ASV2019 protocol

  15. Multi-Task Deep Residual Echo Suppression with Echo-aware Loss

    Authors: Shimin Zhang, Ziteng Wang, Jiayao Sun, Yihui Fu, Biao Tian, Qiang Fu, Lei Xie

    Abstract: This paper introduces the NWPU Team's entry to the ICASSP 2022 AEC Challenge. We take a hybrid approach that cascades a linear AEC with a neural post-filter. The former is used to deal with the linear echo components while the latter suppresses the residual non-linear echo components. We use gated convolutional F-T-LSTM neural network (GFTNN) as the backbone and shape the post-filter by a multi-ta…

    Submitted 20 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICASSP 2022

  16. ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting

    Authors: Dianwen Ng, Yunqi Chen, Biao Tian, Qiang Fu, Eng Siong Chng

    Abstract: Building efficient architecture in neural speech processing is paramount to success in keyword spotting deployment. However, it is very challenging for lightweight models to achieve noise robustness with concise neural operations. In a real-world application, the user environment is typically noisy and may also contain reverberations. We proposed a novel feature interactive convolutional model wit…

    Submitted 15 January, 2022; originally announced January 2022.

    Comments: submitted to ICASSP 2022

  17. arXiv:2112.07815 [pdf]

    physics.bio-ph eess.SY

    Contact feedback helps snake robots propel against uneven terrain using vertical bending

    Authors: Qiyuan Fu, Chen Li

    Abstract: Snakes can bend their elongate bodies in various forms to traverse various environments. We understand how snakes use lateral bending to push against asperities on flat ground for propulsion, and snake robots can do so effectively. However, snakes can also use vertical bending to push against terrain of large height variation for propulsion, and they can adjust it to adapt to novel terrain presuma…

    Submitted 1 August, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: 62 pages, 20 figures

    Journal ref: Bioinspir. Biomim., vol. 18, no. 5, p. 56002, Aug. 2023

  18. arXiv:2112.02522 [pdf, other]

    eess.IV cs.CV

    Snapshot HDR Video Construction Using Coded Mask

    Authors: Masheal Alghamdi, Qiang Fu, Ali Thabet, Wolfgang Heidrich

    Abstract: This paper studies the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video. Constructing an HDR video requires restoring the HDR values for each frame and maintaining the consistency between successive frames. HDR image acquisition from single image capture, also known as snapshot HDR imaging, can be achieved in several ways. For example, the reconfigurable snapshot HDR ca…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 13 pages, 7 figures

  19. arXiv:2110.08439 [pdf, other]

    cs.SD eess.AS

    Controllable Multichannel Speech Dereverberation based on Deep Neural Networks

    Authors: Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

    Abstract: Neural network based speech dereverberation has achieved promising results in recent studies. Nevertheless, many are focused on recovery of only the direct path sound and early reflections, which could be beneficial to speech perception, are discarded. The performance of a model trained to recover clean speech degrades when evaluated on early reverberation targets, and vice versa. This paper propo…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP2022

  20. arXiv:2110.08437 [pdf, other]

    cs.SD eess.AS

    NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications

    Authors: Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

    Abstract: Acoustic echo cancellation (AEC), noise suppression (NS) and automatic gain control (AGC) are three often required modules for real-time communications (RTC). This paper proposes a neural network supported algorithm for RTC, namely NN3A, which incorporates an adaptive filter and a multi-task model for residual echo suppression, noise reduction and near-end speech activity detection. The proposed a…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP2022

  21. arXiv:2109.08123 [pdf, other]

    eess.IV cs.CV physics.optics

    Neural Étendue Expander for Ultra-Wide-Angle High-Fidelity Holographic Display

    Authors: Ethan Tseng, Grace Kuo, Seung-Hwan Baek, Nathan Matsuda, Andrew Maimone, Florian Schiffers, Praneeth Chakravarthula, Qiang Fu, Wolfgang Heidrich, Douglas Lanman, Felix Heide

    Abstract: Holographic displays can generate light fields by dynamically modulating the wavefront of a coherent beam of light using a spatial light modulator, promising rich virtual and augmented reality applications. However, the limited spatial resolution of existing dynamic spatial light modulators imposes a tight bound on the diffraction angle. As a result, modern holographic displays possess low étendue…

    Submitted 26 April, 2024; v1 submitted 16 September, 2021; originally announced September 2021.

  22. arXiv:2109.02774 [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    FastAudio: A Learnable Audio Front-End for Spoof Speech Detection

    Authors: Quchen Fu, Zhongwei Teng, Jules White, Maria Powell, Douglas C. Schmidt

    Abstract: Voice assistants, such as smart speakers, have exploded in popularity. It is currently estimated that the smart speaker adoption rate has exceeded 35% in the US adult population. Manufacturers have integrated speaker identification technology, which attempts to determine the identity of the person speaking, to provide personalized services to different members of the same family. Speaker identific…

    Submitted 6 September, 2021; originally announced September 2021.

  23. arXiv:2109.02773 [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model

    Authors: Zhongwei Teng, Quchen Fu, Jules White, Maria Powell, Douglas C. Schmidt

    Abstract: An emerging trend in audio processing is capturing low-level speech representations from raw waveforms. These representations have shown promising results on a variety of tasks, such as speech recognition and speech separation. Compared to handcrafted features, learning speech features via backpropagation provides the model greater flexibility in how it represents data for different tasks theoreti…

    Submitted 6 September, 2021; originally announced September 2021.

  24. arXiv:2107.08406 [pdf, other]

    cs.CV eess.SY

    A Miniature Biological Eagle-Eye Vision System for Small Target Detection

    Authors: Shutai Wang, Qiang Fu, Yinhao Hu, Chunhua Zhang, Wei He

    Abstract: Small target detection is known to be a challenging problem. Inspired by the structural characteristics and physiological mechanism of eagle-eye, a miniature vision system is designed for small target detection in this paper. First, a hardware platform is established, which consists of a pan-tilt, a short-focus camera and a long-focus camera. Then, based on the visual attention mechanism of eagle-…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: submitted to 2021 Chinese Automation Congress (CAC 2021)

  25. arXiv:2107.07084 [pdf, other]

    cs.RO eess.SY

    Vision-Based Target Localization for a Flapping-Wing Aerial Vehicle

    Authors: Xinghao Dong, Qiang Fu, Chunhua Zhang, Wei He

    Abstract: The flapping-wing aerial vehicle (FWAV) is a new type of flying robot that mimics the flight mode of birds and insects. However, FWAVs have their special characteristics of less load capacity and short endurance time, so that most existing systems of ground target localization are not suitable for them. In this paper, a vision-based target localization algorithm is proposed for FWAVs based on a ge…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: Submitted to the 36th Youth Academic Annual Conference of Chinese Association of Automation (YAC2021)

  26. A survey on computational spectral reconstruction methods from RGB to hyperspectral imaging

    Authors: Jingang Zhang, Runmu Su, Wenqi Ren, Qiang Fu, Felix Heide, Yunfeng Nie

    Abstract: Hyperspectral imaging enables versatile applications due to its competence in capturing abundant spatial and spectral information, which are crucial for identifying substances. However, the devices for acquiring hyperspectral images are expensive and complicated. Therefore, many alternative spectral imaging methods have been proposed by directly reconstructing the hyperspectral information from lo…

    Submitted 13 July, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Journal ref: Scientific Reports | (2022) 12:11905

  27. arXiv:2104.04325 [pdf, other]

    cs.SD eess.AS

    Joint Online Multichannel Acoustic Echo Cancellation, Speech Dereverberation and Source Separation

    Authors: Yueyue Na, Ziteng Wang, Zhang Liu, Biao Tian, Qiang Fu

    Abstract: This paper presents a joint source separation algorithm that simultaneously reduces acoustic echo, reverberation and interfering sources. Target speeches are separated from the mixture by maximizing independence with respect to the other sources. It is shown that the separation process can be decomposed into cascading sub-processes that separately relate to acoustic echo cancellation, speech derev…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: submitted to INTERSPEECH 2021

  28. arXiv:2103.16693 [pdf, other]

    eess.IV cs.CV

    Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging

    Authors: Ilya Chugunov, Seung-Hwan Baek, Qiang Fu, Wolfgang Heidrich, Felix Heide

    Abstract: We introduce Mask-ToF, a method to reduce flying pixels (FP) in time-of-flight (ToF) depth captures. FPs are pervasive artifacts which occur around depth edges, where light paths from both an object and its background are integrated over the aperture. This light mixes at a sensor pixel to produce erroneous depth estimates, which can adversely affect downstream 3D vision tasks. Mask-ToF starts at t…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021. Project page and code: https://light.princeton.edu/publication/mask-tof

  29. arXiv:2103.05198 [pdf, ps, other]

    physics.bio-ph eess.SY q-bio.QM

    Continuous body 3-D reconstruction of limbless animals

    Authors: Qiyuan Fu, Thomas W. Mitchel, Jin Seob Kim, Gregory S. Chirikjian, Chen Li

    Abstract: Limbless animals such as snakes, limbless lizards, worms, eels, and lampreys move their slender, long bodies in three dimensions to traverse diverse environments. Accurately quantifying their continuous body's 3-D shape and motion is important for understanding body-environment interactions in complex terrain, but this is difficult to achieve (especially for local orientation and rotation). Here,…

    Submitted 8 March, 2021; originally announced March 2021.

    Journal ref: Journal of Experimental Biology (2021), 224 (6), jeb220731

  30. arXiv:2102.08551 [pdf, other]

    cs.SD eess.AS

    Weighted Recursive Least Square Filter and Neural Network based Residual Echo Suppression for the AEC-Challenge

    Authors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu

    Abstract: This paper presents a real-time Acoustic Echo Cancellation (AEC) algorithm submitted to the AEC-Challenge. The algorithm consists of three modules: Generalized Cross-Correlation with PHAse Transform (GCC-PHAT) based time delay compensation, weighted Recursive Least Square (wRLS) based linear adaptive filtering and neural network based residual echo suppression. The wRLS filter is derived from a no…

    Submitted 18 February, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2021

  31. arXiv:2004.10447 [pdf, other]

    eess.IV cs.CV

    Learning an Adaptive Model for Extreme Low-light Raw Image Processing

    Authors: Qingxu Fu, Xiaoguang Di, Yu Zhang

    Abstract: Low-light images suffer from severe noise and low illumination. Current deep learning models that are trained with real-world images have excellent noise reduction, but a ratio parameter must be chosen manually to complete the enhancement pipeline. In this work, we propose an adaptive low-light raw image enhancement network to avoid parameter-handcrafting and to improve image quality. The proposed…

    Submitted 22 April, 2020; originally announced April 2020.

  32. arXiv:2003.13733 [pdf]

    physics.bio-ph eess.SY q-bio.QM

    Lateral oscillation and body compliance help snakes and snake robots stably traverse large, smooth obstacles

    Authors: Qiyuan Fu, Sean W. Gart, Thomas W. Mitchel, Jin Seob Kim, Gregory S. Chirikjian, Chen Li

    Abstract: Snakes can move through almost any terrain. Similarly, snake robots hold the promise as a versatile platform to traverse complex environments like earthquake rubble. Unlike snake locomotion on flat surfaces which is inherently stable, when snakes traverse complex terrain by deforming their body out of plane, it becomes challenging to maintain stability. Here, we review our recent progress in under…

    Submitted 30 March, 2020; originally announced March 2020.

    Journal ref: Integrative and Comparative Biology (2020), 60 (1), 171

  33. arXiv:2002.09711 [pdf]

    physics.bio-ph eess.SY q-bio.QM

    Robotic modeling of snake traversing large, smooth obstacles reveals stability benefits of body compliance

    Authors: Qiyuan Fu, Chen Li

    Abstract: Snakes can move through almost any terrain. Although their locomotion on flat surfaces using planar gaits is inherently stable, when snakes deform their body out of plane to traverse complex terrain, maintaining stability becomes a challenge. On trees and desert dunes, snakes grip branches or brace against depressed sand for stability. However, how they stably surmount obstacles like boulders too…

    Submitted 3 May, 2021; v1 submitted 22 February, 2020; originally announced February 2020.

    Journal ref: Royal Society Open Science (2020), 7, 191192