
Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm

Authors: Boxiang Wang, Junwei Ji, Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan

Abstract: Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. The virtual sensing technique, which effectively suppresses the noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates the improvement in computational efficiency, and numerical simulations exhibit noise reduction performance at virtual locations comparable to that of the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are investigated, providing insightful findings on optimizing MVANC systems.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12996 [pdf, other]

Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, were unable to generalize across varying noise levels, often produced visually appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. We extensively evaluated the proposed model using a total of 9,783 18F-FDG studies (1,596 patients), collected from 4 medical centers globally with different scanners and clinical protocols, with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols.
As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

arXiv:2405.12496 [pdf, other]

A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead of using microphone arrays, which increase the computational complexity of the ANC system, to isolate multiple noise sources and improve noise reduction performance, the application of the wireless technique avoids extra computational demand. Wireless transmissions of reference, error, and control signals are also applied to improve the convergence performance of the ANC system. Furthermore, this paper lists some wireless ANC applications, such as earbuds, headphones, windows, and headrests, underscoring their adaptability and efficiency in various settings.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07536 [pdf, other]

Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Authors: Xin Li, Wenyang Gan, Pang Wen, Daqi Zhu

Abstract: To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs or other similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When kinematic constraints or obstacles exist that may cause trajectory planning to fail, task re-assignment is performed by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. A computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for multi-AUV systems.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2403.18139 [pdf, other]

Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model

Authors: Weijie Gan, Huidong Xie, Carl von Gall, Günther Platsch, Michael T. Jurkiewicz, Andrea Andrade, Udunna C. Anazodo, Ulugbek S. Kamilov, Hongyu An, Jorge Cabello

Abstract: Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared to the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that PET images reconstructed using either the acquired or the deep-MRI images improved image quality compared to OSEM. The same conclusions were found when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2402.09460 [pdf, other]

doi 10.1109/ICASSP48485.2024.10448277

Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. During training, the co-processor and real-time controller are integrated into an end-to-end differentiable ANC system. This enables us to use the accumulated squared error signal as the loss for training the 1D CNN. With this unsupervised learning paradigm, the unsupervised-GFANC method not only omits the labelling process but also exhibits better noise reduction performance compared to the supervised GFANC method in real noise experiments.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2402.02694 [pdf, other]

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task has achieved substantial progress in device generalization in recent years, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabelled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabelled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.

Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.13998 [pdf, other]

WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this approach relies on obtaining a substantial amount of challenging-to-acquire segmentation annotations. This paper proposes a novel weakly supervised auxiliary task learning network model (WAL-Net) to explore the interdependence between carotid plaque classification and segmentation tasks. The plaque classification task is the primary task, while the plaque segmentation task serves as an auxiliary task, providing valuable information to enhance the performance of the primary task. Weakly supervised learning is adopted in the auxiliary task to completely break away from the dependence on segmentation annotations. Experiments and evaluations are conducted on a dataset comprising 1270 carotid plaque ultrasound images from Wuhan University Zhongnan Hospital. Results indicate that the proposed method achieved an approximately 1.3% improvement in carotid plaque classification accuracy compared to the baseline network. Specifically, the accuracy of mixed-echoic plaque classification increased by approximately 3.3%, demonstrating the effectiveness of our approach.

Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.08678 [pdf, other]

Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Submitted to ICASSP 2024

arXiv:2311.18073 [pdf, other]

DiffGEPCI: 3D MRI Synthesis from mGRE Signals using 2.5D Diffusion Model

Authors: Yuyang Hu, Satya V. V. N. Kothapalli, Weijie Gan, Alexander L. Sukstanskii, Gregory F. Wu, Manu Goyal, Dmitriy A. Yablonskiy, Ulugbek S. Kamilov

Abstract: We introduce a new framework called DiffGEPCI for cross-modality generation in magnetic resonance imaging (MRI) using a 2.5D conditional diffusion model. DiffGEPCI can synthesize high-quality Fluid Attenuated Inversion Recovery (FLAIR) and Magnetization Prepared-Rapid Gradient Echo (MPRAGE) images, without acquiring corresponding measurements, by leveraging multi-Gradient-Recalled Echo (mGRE) MRI signals as conditional inputs. DiffGEPCI operates in a two-step fashion: it initially estimates a 3D volume slice-by-slice using the axial plane and subsequently applies a refinement algorithm (referred to as 2.5D) to enhance the quality of the coronal and sagittal planes. Experimental validation on real mGRE data shows that DiffGEPCI achieves excellent performance, surpassing generative adversarial networks (GANs) and traditional diffusion models.

Submitted 18 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15445 [pdf, other]

FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration

Authors: Zihao Zou, Jiaming Liu, Shirin Shoushtari, Yubo Wang, Weijie Gan, Ulugbek S. Kamilov

Abstract: Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and a temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. Our extensive experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 32 pages, 27 figures

arXiv:2311.14068 [pdf, other]

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow was proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. The SIM is initially generated through a statistical approach, referred to as SIM-V1. However, the fixed artificial mask may mismatch the SED model, resulting in limited effectiveness. Therefore, we further propose SIM-V2, which employs a word embedding model for adaptive SIM estimation. Experimental results show that the proposed IDC module can effectively utilize the information from soft labels, and the integration of SIM-V1 can further improve the accuracy. In addition, the impact of different word embedding dimensions on SIM-V2 is explored, and the results show that an appropriate dimension can enable SIM-V2 to achieve better performance than SIM-V1. In DCASE 2023 Challenge Task4B, the proposed system achieved the top-ranking performance on the evaluation dataset of MAESTRO Real.

Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: to be improved (unfinished)

arXiv:2311.12371 [pdf, other]

AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language model (LLM)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-semantic audio Transformer by incorporating contrastive learning between hybrid acoustic representations. We then leverage LLMs to generate audio logs that summarize textual descriptions of the acoustic environment. Finally, we evaluate the AudioLog system on two datasets with both scene and event annotations. Experiments show that the proposed system achieves exceptional performance in acoustic scene classification and sound event detection, surpassing existing methods in the field. Further analysis of the prompts to LLMs demonstrates that AudioLog can effectively summarize long audio sequences. To the best of our knowledge, this approach is the first attempt to leverage LLMs for summarizing long audio sequences.

Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.04248 [pdf, other]

DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Xiongchao Chen, Qiong Liu, Xueqi Guo, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Ge Wang, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models for 3D image reconstructions due to the memory burden. Directly stacking 2D slices together to create 3D image volumes would result in severe inconsistencies between slices. Previous works tried to either apply a penalty term along the z-axis to remove inconsistencies or reconstruct the 3D image volumes with 2 pre-trained perpendicular 2D diffusion models. Nonetheless, these previous methods failed to produce satisfactory results in challenging cases for PET image denoising. In addition to administered dose, the noise levels in PET images are affected by several other factors in clinical settings, e.g. scan time, medical history, patient size, and weight. Therefore, a method to simultaneously denoise PET images with different noise levels is needed. Here, we propose a Dose-aware Diffusion model for 3D low-dose PET imaging (DDPET-3D) to address these challenges. We extensively evaluated DDPET-3D on 100 patients with 6 different low-dose levels (a total of 600 testing studies), and demonstrated superior performance over previous diffusion models for 3D imaging problems as well as previous noise-aware medical image denoising models.
The code is available at: https://github.com/xxx/xxx.

Submitted 28 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Paper under review. 16 pages, 11 figures, 4 tables

arXiv:2311.02003 [pdf, other]

A Structured Pruning Algorithm for Model-based Deep Learning

Authors: Chicago Park, Weijie Gan, Zihao Zou, Yuyang Hu, Zhixin Sun, Ulugbek S. Kamilov

Abstract: There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural network (CNN). The iterative nature of MBDL networks increases the test-time computational complexity, which limits their applicability in certain large-scale applications. We address this issue by presenting a structured pruning algorithm for model-based deep learning (SPADE) as the first structured pruning algorithm for MBDL networks. SPADE reduces the computational complexity of CNNs used within MBDL networks by pruning their non-essential weights. We propose three distinct strategies to fine-tune the pruned MBDL networks to minimize the performance loss. Each fine-tuning strategy has a unique benefit that depends on the presence of a pre-trained model and a high-quality ground truth. We validate SPADE on two distinct inverse problems, namely compressed sensing MRI and image super-resolution. Our results highlight that MBDL models pruned by SPADE can achieve substantial speed-up in testing time while maintaining competitive performance.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.07504 [pdf, other]

PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction

Authors: Weijie Gan, Qiuchen Zhai, Michael Thompson McCann, Cristina Garcia Cardona, Ulugbek S. Kamilov, Brendt Wohlberg

Abstract: Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV is capable of outperforming existing deep learning methods for this problem, and significantly reduces computational cost compared to iterative methodologies, while maintaining competitive performance.

Submitted 6 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04297 [pdf, other]

A Plug-and-Play Image Registration Network

Authors: Junhao Hu, Weijie Gan, Zhixin Sun, Hongyu An, Ulugbek S. Kamilov

Abstract: Deformable image registration (DIR) is an active research topic in biomedical imaging. There is a growing interest in developing DIR methods based on deep learning (DL). A traditional DL approach to DIR is based on training a convolutional neural network (CNN) to estimate the registration field between two input images. While conceptually simple, this approach comes with a limitation that it exclusively relies on a pre-trained CNN without explicitly enforcing fidelity between the registered image and the reference. We present plug-and-play image registration network (PIRATE) as a new DIR method that addresses this issue by integrating an explicit data-fidelity penalty and a CNN prior. PIRATE pre-trains a CNN denoiser on the registration field and "plugs" it into an iterative method as a regularizer. We additionally present PIRATE+ that fine-tunes the CNN prior in PIRATE using deep equilibrium models (DEQ). PIRATE+ interprets the fixed-point iteration of PIRATE as a network with effectively infinite layers and then trains the resulting network end-to-end, enabling it to learn more task-specific information and boosting its performance. Our numerical results on OASIS and CANDI datasets show that our methods achieve state-of-the-art performance on DIR.

Submitted 19 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2308.07767 [pdf, other]

Preliminary investigation of the short-term in situ performance of an automatic masker selection system

Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Woon-Seng Gan

Abstract: Soundscape augmentation or "masking" introduces wanted sounds into the acoustic environment to improve acoustic comfort. Usually, the masker selection and playback strategies are either arbitrary or based on simple rules (e.g. -3 dBA), which may lead to sub-optimal increment or even reduction in acoustic comfort for dynamic acoustic environments. To reduce ambiguity in the selection of maskers, an automatic masker selection system (AMSS) was recently developed. The AMSS uses a deep-learning model trained on a large-scale dataset of subjective responses to maximize the derived ISO pleasantness (ISO 12913-2). Hence, this study investigates the short-term in situ performance of the AMSS implemented in a gazebo in an urban park. Firstly, the predicted ISO pleasantness from the AMSS is evaluated in comparison to the in situ subjective evaluation scores. Secondly, the effect of various masker selection schemes on the perceived affective quality and appropriateness would be evaluated. In total, each participant evaluated 6 conditions: (1) ambient environment with no maskers; (2) AMSS; (3) bird and (4) water masker from prior art; (5) random selection from same pool of maskers used to train the AMSS; and (6) selection of best-performing maskers based on the analysis of the dataset used to train the AMSS.

Submitted 15 August, 2023; originally announced August 2023.

Comments: paper submitted to the 52nd International Congress and Exposition on Noise Control Engineering held in Chiba, Greater Tokyo, Japan, on 20-23 August 2023 (Inter-Noise 2023)

ACM Class: J.2; J.4

arXiv:2308.03684 [pdf, other]

Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of dealing with quickly varying disturbances, such as piling noise. Furthermore, the noise power variation also deteriorates the robustness of the algorithm when it adopts the fixed step size. To solve these issues, we integrated the normalized multichannel FxLMS with the momentum method, which hence, effectively avoids the interference of the primary noise power and accelerates the convergence of the algorithm. To validate its effectiveness, we deployed this algorithm in a multichannel noise control window to control the real machine noise.

Submitted 7 August, 2023; originally announced August 2023.

Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

arXiv:2307.10913 [pdf, other]

Practical Active Noise Control: Restriction of Maximum Output Power

Authors: Woon-Seng Gan, Dongyuan Shi, Xiaoyi Shen

Abstract: This paper presents some recent algorithms developed by the authors for real-time adaptive active noise control (AANC) systems. These algorithms address some of the common challenges faced by AANC systems, such as speaker saturation, system divergence, and disturbance rejection. Speaker saturation can introduce nonlinearity into the adaptive system and degrade the noise reduction performance. System divergence can occur when the secondary speaker units are over-amplified or when there is a disturbance other than the noise to be controlled. Disturbance rejection is important to prevent the adaptive system from adapting to unwanted signals. The paper provides guidelines for implementing and operating real-time AANC systems based on these algorithms.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

arXiv:2307.05533 [pdf, other]

doi 10.1016/j.scs.2023.104763

Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone.

Submitted 8 July, 2023; originally announced July 2023.

Comments: Accepted manuscript submitted to Sustainable Cities and Society

Journal ref: Sustain. Cities Soc., 104763, 2023

arXiv:2306.11408 [pdf, other]

A Computation-efficient Online Secondary Path Modeling Technique for Modified FXLMS Algorithm

Authors: Junwei Ji, Dongyuan Shi, Woon-Seng Gan, Xiaoyi Shen, Zhengding Luo

Abstract: This paper proposes an online secondary path modelling (SPM) technique to improve the performance of the modified filtered reference Least Mean Square (FXLMS) algorithm. It can effectively respond to a time-varying secondary path, which refers to the path from a secondary source to an error sensor. Unlike traditional methods, the proposed approach switches modes between adaptive ANC and online SPM, eliminating the use of destabilizing components such as auxiliary white noise or additional filters, which can negatively impact the complexity, stability, and noise reduction performance of the ANC system. The system operates in adaptive ANC mode until divergence is detected due to secondary path changes. At this moment, it switches to SPM mode until the path is remodeled and then returns to ANC mode. Furthermore, numerical simulations in the paper demonstrate that the proposed online technique effectively copes with the secondary path variations.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.09535 [pdf, other]

doi 10.1109/LSP.2023.3286808

MOV-Modified-FxLMS algorithm with Variable Penalty Factor in a Practical Power Output Constrained Active Control System

Authors: Chung Kwan Lai, Dongyuan Shi, Bhan Lam, Woon-Seng Gan

Abstract: Practical Active Noise Control (ANC) systems typically require a restriction in their maximum output power, to prevent overdriving the loudspeaker and causing system instability. Recently, the minimum output variance filtered-reference least mean square (MOV-FxLMS) algorithm was shown to have optimal control under output constraint with an analytically formulated penalty factor, but it needs offline knowledge of disturbance power and secondary path gain. The constant penalty factor in MOV-FxLMS is also susceptible to variations in disturbance power that could cause output power constraint violations. This paper presents a new variable penalty factor that utilizes the estimated disturbance in the established Modified-FxLMS (MFxLMS) algorithm, resulting in a computationally efficient MOV-MFxLMS algorithm that can adapt to changes in disturbance levels in real-time. Numerical simulation with real noise and plant response showed that the variable penalty factor always manages to meet its maximum power output constraint despite sudden changes in disturbance power, whereas the fixed penalty factor has suffered from a constraint mismatch.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted article in IEEE Signal Processing Letters

Journal ref: IEEE Signal Process. Lett., vol. 30, pp. 723-727, 2023

arXiv:2306.01425 [pdf, other]

Active Noise Control in The New Century: The Role and Prospect of Signal Processing

Authors: Dongyuan Shi, Bhan Lam, Woon-Seng Gan, Jordan Cheer, Stephen J. Elliott

Abstract: Following Paul Lueg's 1933 patent application for a system for the active control of sound, the field of active noise control (ANC) did not flourish until the advent of digital signal processors forty years ago. Early theoretical advancements in digital signal processing and processors laid the groundwork for the phenomenal growth of the field, particularly over the past quarter-century. The widespread commercial success of ANC in aircraft cabins, automobile cabins, and headsets demonstrates the immeasurable public health and economic benefits of ANC. This article continues where Elliott and Nelson's 1993 Signal Processing Magazine article and Elliott's 1997 50th anniversary commentary on ANC left off, tracing the technical developments and applications in ANC spurred by the seminal texts of Nelson and Elliott (1991), Kuo and Morgan (1996), Hansen and Snyder (1996), and Elliott (2001) since the turn of the century. This article focuses on technical developments pertaining to real-world implementations, such as improving algorithmic convergence, reducing system latency, and extending control to non-stationary and/or broadband noise, as well as the commercial transition challenges from analog to digital ANC systems. Finally, open issues and the future of ANC in the era of artificial intelligence are discussed.

Submitted 6 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Submitted to inter.noise 2023, Chiba, Japan

arXiv:2305.12672 [pdf, other]

Block Coordinate Plug-and-Play Methods for Blind Inverse Problems

Authors: Weijie Gan, Shirin Shoushtari, Yuyang Hu, Jiaming Liu, Hongyu An, Ulugbek S. Kamilov

Abstract: Plug-and-play (PnP) prior is a well-known class of methods for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image denoisers. While PnP methods have been extensively used for image recovery with known measurement operators, there is little work on PnP for solving blind inverse problems. We address this gap by presenting a new block-coordinate PnP (BC-PnP) method that efficiently solves this joint estimation problem by introducing learned denoisers as priors on both the unknown image and the unknown measurement operator. We present a new convergence theory for BC-PnP compatible with blind inverse problems by considering nonconvex data-fidelity terms and expansive denoisers. Our theory analyzes the convergence of BC-PnP to a stationary point of an implicit function associated with an approximate minimum mean-squared error (MMSE) denoiser. We numerically validate our method on two blind inverse problems: automatic coil sensitivity estimation in magnetic resonance imaging (MRI) and blind image deblurring. Our results show that BC-PnP provides an efficient and principled framework for using denoisers as PnP priors for jointly estimating measurement operators and images.

Submitted 26 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2303.12517 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095752

Real-time modelling of observation filter in the Remote Microphone Technique for an Active Noise Control application

Authors: Chung Kwan Lai, Bhan Lam, Dongyuan Shi, Woon-Seng Gan

Abstract: The remote microphone technique (RMT) is often used in active noise control (ANC) applications to overcome design constraints in microphone placements by estimating the acoustic pressure at inconvenient locations using a pre-calibrated observation filter (OF), albeit limited to stationary primary acoustic fields. While the OF estimation in varying primary fields can be significantly improved through the recently proposed source decomposition technique, it requires knowledge of the relative source strengths between incoherent primary noise sources. This paper proposes a method for combining the RMT with a new source-localization technique to estimate the source ratio parameter. Unlike traditional source-localization techniques, the proposed method is capable of being implemented in a real-time RMT application. Simulations with measured responses from an open-aperture ANC application showed a good estimation of the source ratio parameter, which allows the observation filter to be modelled in real-time.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 5 pages, 5 figures. Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2023)

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1-5

arXiv:2303.08411 [pdf, other]

A practical distributed active noise control algorithm overcoming communication restrictions

Authors: Junwei Ji, Dongyuan Shi, Zhengding Luo, Xiaoyi Shen, Woon-Seng Gan

Abstract: By assigning the massive computing tasks of the traditional multichannel active noise control (MCANC) system to several distributed control nodes, distributed multichannel active noise control (DMCANC) techniques have become effective global noise reduction solutions with low computational costs. However, existing DMCANC algorithms simply complete the distribution of traditional centralized algorithms by combining neighbour nodes' information but rarely consider the degraded control performance and system stability of distributed units caused by delays and interruptions in communication. Hence, this paper develops a novel DMCANC algorithm that utilizes the compensation filters and neighbour nodes' information to counterbalance the cross-talk effect between channels while maintaining independent weight updating. Since the neighbours' information required barely affects the local control filter updating in each node, this approach can tolerate communication delay and interruption to some extent. Numerical simulations demonstrate that the proposed algorithm can achieve satisfactory noise reduction performance and high robustness to real-world communication challenges.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.08397 [pdf, other]

A Momentum Two-gradient Direction Algorithm with Variable Step Size Applied to Solve Practical Output Constraint Issue for Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Abstract: Active noise control (ANC) has been widely utilized to reduce unwanted environmental noise. The primary objective of ANC is to generate an anti-noise with the same amplitude but the opposite phase of the primary noise using the secondary source. However, the effectiveness of the ANC application is impacted by the speaker's output saturation. This paper proposes a two-gradient direction ANC algorithm with a momentum factor to solve the saturation with faster convergence. To enable real-time implementation, a computation-effective variable step size approach is applied to further reduce the steady-state error brought on by the changing gradient directions. The time constant and step size bound for the momentum two-gradient direction algorithm are analyzed. Simulation results show that the proposed algorithm performs effectively in the time-unvaried and time-varied environment.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Paper is submitted and accepted by ICASSP2023

arXiv:2303.08379 [pdf]

doi 10.1109/ICASSP49357.2023.10096536

Implementing Continuous HRTF Measurement in Near-Field

Authors: Ee-Leng Tan, Santi Peksi, Woon-Seng Gan

Abstract: Head-related transfer function (HRTF) is an essential component to create an immersive listening experience over headphones for virtual reality (VR) and augmented reality (AR) applications. Metaverse combines VR and AR to create immersive digital experiences, and users are very likely to interact with virtual objects in the near-field (NF). The HRTFs of such objects are highly individualized and dependent on directions and distances. Hence, a significant number of HRTF measurements at different distances in the NF would be needed. Using conventional static stop-and-go HRTF measurement methods to acquire these measurements would be time-consuming and tedious for human listeners. In this paper, we propose a continuous measurement system targeted for the NF, and efficiently capturing HRTFs in the horizontal plane within 45 secs. Comparative experiments are performed on head and torso simulator (HATS) and human listeners to evaluate system consistency and robustness.

Submitted 15 June, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 5 pages, 9 figures, Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2303.08342 [pdf, other]

doi 10.1109/ICASSP49357.2023.10094866

Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs

Authors: Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan

Abstract: Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural network, to allow early, mid-level, and late feature fusion of participant-linked, visual, and acoustic features. Ablation studies on module configurations and corresponding fusion methods using the ARAUS dataset show that contextual features improve the model performance in a statistically significant manner on the normalized ISO Pleasantness, to a mean squared error of $0.1194\pm0.0012$ for the best-performing all-modality model, against $0.1217\pm0.0009$ for the audio-only model. Soundscape augmentation systems can thereby leverage multimodal inputs for improved performance. We also investigate the impact of individual participant-linked factors using trained models to illustrate improvements in model explainability.

Submitted 14 March, 2023; originally announced March 2023.

Comments: 5 pages, 2 figures. Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1-5

arXiv:2303.05788 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095205

Deep Generative Fixed-filter Active Noise Control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Junwei Ji, Woon-Seng Gan

Abstract: Due to the slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noises. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs much from the initial noises during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome the limitation. Based on deep learning and a perfect-reconstruction filter bank, the GFANC method only requires a few prior data (one pre-trained broadband control filter) to automatically generate suitable control filters for various noises. The efficacy of the GFANC method is demonstrated by numerical simulations on real-recorded noises.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023. Code will be available after publication

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2211.12798 [pdf, other]

An Open Case-based Reasoning Framework for Personalized On-board Driving Assistance in Risk Scenarios

Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu

Abstract: Driver reaction is of vital importance in risk scenarios. Drivers can take correct evasive maneuver at proper cushion time to avoid the potential traffic crashes, but this reaction process is highly experience-dependent and requires various levels of driving skills. To improve driving safety and avoid the traffic accidents, it is necessary to provide all road drivers with on-board driving assistance. This study explores the plausibility of case-based reasoning (CBR) as the inference paradigm underlying the choice of personalized crash evasive maneuvers and the cushion time, by leveraging the wealth of human driving experience from the steady stream of traffic cases, which have been rarely explored in previous studies. To this end, in this paper, we propose an open evolving framework for generating personalized on-board driving assistance. In particular, we present the FFMTE model with high performance to model the traffic events and build the case database; a tailored CBR-based method is then proposed to retrieve, reuse and revise the existing cases to generate the assistance. We take the 100-Car Naturalistic Driving Study dataset as an example to build and test our framework; the experiments show reasonable results, providing the drivers with valuable evasive information to avoid the potential crashes in different scenarios.

Submitted 23 November, 2022; originally announced November 2022.

Comments: 10 pages, 8 figures, 4 tables, accepted by IEEE BigData 2022

arXiv:2210.14974 [pdf, other]

SINCO: A Novel structural regularizer for image compression using implicit neural representations

Authors: Harry Gao, Weijie Gan, Zhixin Sun, Ulugbek S. Kamilov

Abstract: Implicit neural representations (INR) have been recently proposed as deep learning (DL) based solutions for image compression. An image can be compressed by training an INR model with fewer weights than the number of image pixels to map the coordinates of the image to corresponding pixel values. While traditional training approaches for INRs are based on enforcing pixel-wise image consistency, we propose to further improve image quality by using a new structural regularizer. We present structural regularization for INR compression (SINCO) as a novel INR method for image compression. SINCO imposes structural consistency of the compressed images to the groundtruth by using a segmentation network to penalize the discrepancy of segmentation masks predicted from compressed images. We validate SINCO on brain MRI images by showing that it can achieve better performance than some recent INR methods.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.06330 [pdf, other]

CoRRECT: A Deep Unfolding Framework for Motion-Corrected Quantitative R2* Mapping

Authors: Xiaojian Xu, Weijie Gan, Satya V. V. N. Kothapalli, Dmitriy A. Yablonskiy, Ulugbek S. Kamilov

Abstract: Quantitative MRI (qMRI) refers to a class of MRI methods for quantifying the spatial distribution of biological tissue parameters. Traditional qMRI methods usually deal separately with artifacts arising from accelerated data acquisition, involuntary physical motion, and magnetic-field inhomogeneities, leading to suboptimal end-to-end performance. This paper presents CoRRECT, a unified deep unfolding (DU) framework for qMRI consisting of a model-based end-to-end neural network, a method for motion-artifact reduction, and a self-supervised learning scheme. The network is trained to produce R2* maps whose k-space data matches the real data by also accounting for motion and field inhomogeneities. When deployed, CoRRECT only uses the k-space data without any pre-computed parameters for motion or inhomogeneity correction. Our results on experimentally collected multi-Gradient-Recalled Echo (mGRE) MRI data show that CoRRECT recovers motion and inhomogeneity artifact-free R2* maps in highly accelerated acquisition settings. This work opens the door to DU methods that can integrate physical measurement models, biophysical signal models, and learned prior models for high-quality qMRI.

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.03837 [pdf, other]

Self-Supervised Deep Equilibrium Models for Inverse Problems with Theoretical Guarantees

Authors: Weijie Gan, Chunwei Ying, Parna Eshraghi, Tongyao Wang, Cihat Eldeniz, Yuyang Hu, Jiaming Liu, Yasheng Chen, Hongyu An, Ulugbek S. Kamilov

Abstract: Deep equilibrium models (DEQ) have emerged as a powerful alternative to deep unfolding (DU) for image reconstruction. DEQ models, which are implicit neural networks with an effectively infinite number of layers, were shown to achieve state-of-the-art image reconstruction without the memory complexity associated with DU. While the performance of DEQ has been widely investigated, the existing work has primarily focused on the settings where groundtruth data is available for training. We present self-supervised deep equilibrium model (SelfDEQ) as the first self-supervised reconstruction framework for training model-based implicit networks from undersampled and noisy MRI measurements. Our theoretical results show that SelfDEQ can compensate for unbalanced sampling across multiple acquisitions and match the performance of fully supervised DEQ. Our numerical results on in-vivo MRI data show that SelfDEQ leads to state-of-the-art performance using only undersampled and noisy training data.

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2210.02584 [pdf, other]

SPICER: Self-Supervised Learning for MRI with Automatic Coil Sensitivity Estimation and Reconstruction

Authors: Yuyang Hu, Weijie Gan, Chunwei Ying, Tongyao Wang, Cihat Eldeniz, Jiaming Liu, Yasheng Chen, Hongyu An, Ulugbek S. Kamilov

Abstract: Deep model-based architectures (DMBAs) integrating physical measurement models and learned image regularizers are widely used in parallel magnetic resonance imaging (PMRI). Traditional DMBAs for PMRI rely on pre-estimated coil sensitivity maps (CSMs) as a component of the measurement model. However, estimation of accurate CSMs is a challenging problem when measurements are highly undersampled. Additionally, traditional training of DMBAs requires high-quality groundtruth images, limiting their use in applications where groundtruth is difficult to obtain. This paper addresses these issues by presenting SPICE as a new method that integrates self-supervised learning and automatic coil sensitivity estimation. Instead of using pre-estimated CSMs, SPICE simultaneously reconstructs accurate MR images and estimates high-quality CSMs. SPICE also enables learning from undersampled noisy measurements without any groundtruth. We validate SPICE on experimentally collected data, showing that it can achieve state-of-the-art performance in highly accelerated data acquisition settings (up to 10x).

Submitted 6 June, 2024; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2208.08440 [pdf, other]

Performance Evaluation of Selective Fixed-filter Active Noise Control based on Different Convolutional Neural Networks

Authors: Zhengding Luo, Dongyuan Shi, Woon-Seng Gan

Abstract: Due to its rapid response time and a high degree of robustness, the selective fixed-filter active noise control (SFANC) method appears to be a viable candidate for widespread use in a variety of practical active noise control (ANC) systems. In comparison to conventional fixed-filter ANC methods, SFANC can select the pre-trained control filters for different types of noise. Deep learning technologies, thus, can be used in SFANC methods to enable a more flexible selection of the most appropriate control filters for attenuating various noises. Furthermore, with the assistance of a deep neural network, the selecting strategy can be learned automatically from noise data rather than through trial and error, which significantly simplifies and improves the practicability of ANC design. Therefore, this paper investigates the performance of SFANC based on different one-dimensional and two-dimensional convolutional neural networks. Additionally, we conducted comparative analyses of several network training strategies and discovered that fine-tuning could improve selection performance.

Submitted 17 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2208.08082

arXiv:2208.08086 [pdf, other]

Implementation of Multi-channel Active Noise Control based on Back-propagation Mechanism

Authors: Zhengding Luo, Dongyuan Shi, Junwei Ji, Woon-seng Gan

Abstract: Active noise control (ANC) systems can efficiently attenuate low-frequency noises by introducing anti-noises to combine with the unwanted noises. In ANC systems, the filtered-x least mean square (FxLMS) and filtered-x normalized least mean square (FxNLMS) algorithms are well-known algorithms for adaptively adjusting control filters. Multi-channel ANC systems are typically required to attenuate unwanted noises in a large space. However, open-source implementations of the multi-channel FxLMS (McFxLMS) and multi-channel FxNLMS (McFxNLMS) algorithms continue to be scarce. Therefore, this paper proposes a simple and effective implementation approach for the McFxLMS and McFxNLMS algorithms. Motivated by the back-propagation process during neural network training, the McFxLMS and McFxNLMS algorithms can be implemented via an automatic derivation mechanism. We implemented the two algorithms using the automatic derivation mechanism in PyTorch and made the source code available on GitHub. This implementation method can improve the practicality of multi-channel ANC systems and is expected to be widely used in ANC applications.

Submitted 17 August, 2022; originally announced August 2022.
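The FxLMS update that the abstract above refers to can be illustrated with a minimal single-channel sketch. This is not the authors' PyTorch implementation (which is multi-channel and relies on automatic differentiation); it is a hand-written, standard-library-only loop under stated assumptions: the function names `fir`, `fxlms`, and `demo`, the primary and secondary path coefficients, the tap count, and the step size are all hypothetical values chosen for illustration, and the secondary-path estimate is assumed to be perfect.

```python
import random


def fir(h, x):
    """Causal FIR filtering of x with impulse response h (output has len(x) samples)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fxlms(x, d, s, s_hat, n_taps=16, mu=0.01):
    """Single-channel FxLMS sketch.

    x: reference signal, d: disturbance at the error microphone,
    s: true secondary path, s_hat: its estimate (all illustrative).
    Convention assumed here: error e = d - (s * y), so the update adds mu * e * x'.
    """
    w = [0.0] * n_taps                           # control filter coefficients
    x_buf = [0.0] * max(n_taps, len(s_hat))      # recent reference samples
    y_buf = [0.0] * len(s)                       # recent control outputs
    fx_buf = [0.0] * n_taps                      # recent filtered-reference samples
    errors = []
    for x_n, d_n in zip(x, d):
        x_buf = [x_n] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))          # control signal
        y_buf = [y] + y_buf[:-1]
        e = d_n - sum(si * yi for si, yi in zip(s, y_buf))    # residual at error mic
        fx = sum(hi * xi for hi, xi in zip(s_hat, x_buf))     # reference filtered by s_hat
        fx_buf = [fx] + fx_buf[:-1]
        w = [wi + mu * e * fi for wi, fi in zip(w, fx_buf)]   # LMS step on filtered reference
        errors.append(e)
    return w, errors


def demo(n_samples=4000, seed=0):
    """Return residual power divided by disturbance power over the last 500 samples."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    d = fir([0.0, 0.0, 0.8, 0.4, -0.2], x)   # hypothetical primary path
    s = [0.0, 0.9, 0.3]                      # hypothetical secondary path
    _, e = fxlms(x, d, s, s_hat=s)
    tail_e = sum(v * v for v in e[-500:])
    tail_d = sum(v * v for v in d[-500:])
    return tail_e / tail_d
```

Calling `demo()` returns the ratio of residual to disturbance power after adaptation; a value well below 1 indicates that the control filter has converged for this toy setup.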

arXiv:2208.08082 [pdf, other]

doi 10.1109/LSP.2022.3169428

A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning

Authors: Zhengding Luo, Dongyuan Shi, Woon-Seng Gan

Abstract: The selective fixed-filter active noise control (SFANC) method selecting the best pre-trained control filters for various types of noise can achieve a fast response time. However, it may lead to large steady-state errors due to inaccurate filter selection and the lack of adaptability. In comparison, the filtered-X normalized least-mean-square (FxNLMS) algorithm can obtain lower steady-state errors through adaptive optimization. Nonetheless, its slow convergence has a detrimental effect on dynamic noise attenuation. Therefore, this paper proposes a hybrid SFANC-FxNLMS approach to overcome the adaptive algorithm's slow convergence and provide a better noise reduction level than the SFANC method. A lightweight one-dimensional convolutional neural network (1D CNN) is designed to automatically select the most suitable pre-trained control filter for each frame of the primary noise. Meanwhile, the FxNLMS algorithm continues to update the coefficients of the chosen pre-trained control filter at the sampling rate. Owing to the effective combination of the two algorithms, experimental results show that the hybrid SFANC-FxNLMS algorithm can achieve a rapid response time, a low noise reduction error, and a high degree of robustness.

Submitted 17 August, 2022; originally announced August 2022.

Report number: Vol.29, p.1102-1106

Journal ref: IEEE Signal Processing Letters, 2022

arXiv:2207.12899 [pdf, other]

Assessment of a cost-effective headphone calibration procedure for soundscape evaluations

Authors: Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Trevor Wong, Woon-Seng Gan

Abstract: To increase the availability and adoption of the soundscape standard, a low-cost calibration procedure for reproduction of audio stimuli over headphones was proposed as part of the global "Soundscape Attributes Translation Project" (SATP) for validating ISO/TS 12913-2:2018 perceived affective quality (PAQ) attribute translations. A previous preliminary study revealed significant deviations from the intended equivalent continuous A-weighted sound pressure levels ($L_{\text{A,eq}}$) using the open-circuit voltage (OCV) calibration procedure. For a more holistic human-centric perspective, the OCV method is further investigated here in terms of psychoacoustic parameters, including relevant exceedance levels to account for temporal effects on the same 27 stimuli from the SATP. Moreover, a within-subjects experiment with 36 participants was conducted to examine the effects of OCV calibration on the PAQ attributes in ISO/TS 12913-2:2018. Bland-Altman analysis of the objective indicators revealed large biases in the OCV method across all weighted sound level and loudness indicators, and roughness indicators at 5% and 10% exceedance levels. Significant perceptual differences due to the OCV method were observed in about 20% of the stimuli, which did not correspond clearly with the biased acoustic indicators. A cautioned interpretation of the objective and perceptual differences due to small and unpaired samples nevertheless provides grounds for further investigation.

Submitted 24 July, 2022; originally announced July 2022.

Comments: For 24th International Congress on Acoustics

Journal ref: in Proc. 24th Int. Congr. Acoust., 2022, pp. 1-8

arXiv:2207.09221 [pdf, other]

Do uHear? Validation of uHear App for Preliminary Screening of Hearing Ability in Soundscape Studies

Authors: Zhen-Ting Ong, Bhan Lam, Kenneth Ooi, Karn N. Watcharasupat, Trevor Wong, Woon-Seng Gan

Abstract: Studies involving soundscape perception often exclude participants with hearing loss to prevent impaired perception from affecting experimental results. Participants are typically screened with pure tone audiometry, the "gold standard" for identifying and quantifying hearing loss at specific frequencies, and excluded if a study-dependent threshold is not met. However, procuring professional audiometric equipment for soundscape studies may be cost-ineffective, and manually performing audiometric tests is labour-intensive. Moreover, testing requirements for soundscape studies may not require sensitivities and specificities as high as those in a medical diagnosis setting. Hence, in this study, we investigate the effectiveness of the uHear app, an iOS application, as an affordable and automatic alternative to a conventional audiometer in screening participants for hearing loss for the purpose of soundscape studies or listening tests in general. Based on audiometric comparisons with the audiometer of 163 participants, the uHear app was found to have high precision (98.04%) when using the World Health Organization (WHO) grading scheme for assessing normal hearing. Precision is further improved (98.69%) when all frequencies assessed with the uHear app are considered in the grading, which lends further support to this cost-effective, automated alternative to screen for normal hearing.

Submitted 16 July, 2022; originally announced July 2022.

Comments: Full paper submitted to 24th International Congress on Acoustics

arXiv:2207.01078 [pdf, other]

doi 10.1109/TAFFC.2023.3247914

ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes

Authors: Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Joo Young Hong, Woon-Seng Gan

Abstract: Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analysis of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks in the literature.

Submitted 5 March, 2023; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: [v1, v2] 25 pages, 11 figures. [v3] 33 pages, 18 figures. v3 updated with changes made after peer review. in IEEE Transactions on Affective Computing, 2023

Journal ref: IEEE Trans. Affect. Comput., pp. 1-17, 2023

arXiv:2206.07293 [pdf, other]

FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement

Authors: Shengkui Zhao, Bin Ma, Karn N. Watcharasupat, Woon-Seng Gan

Abstract: Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED) structure and a recurrent structure have achieved promising performance for monaural speech enhancement. However, feature representation across frequency context is highly constrained due to limited receptive fields in the convolutions of CED. In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. The CRED applies frequency recurrence on 3D convolutional feature maps along the frequency axis following each convolution; it is therefore capable of capturing long-range frequency correlations and enhancing feature representations of speech inputs. The proposed frequency recurrence is realized efficiently using a feedforward sequential memory network (FSMN). Besides the CRED, we insert two stacked FSMN layers between the encoder and the decoder to model further temporal dynamics. We name the proposed framework Frequency Recurrent CRN (FRCRN). We design FRCRN to predict the complex Ideal Ratio Mask (cIRM) in the complex-valued domain and optimize FRCRN using both time-frequency-domain and time-domain losses. Our proposed approach achieved state-of-the-art performance on wideband benchmark datasets and achieved 2nd place for the real-time fullband track in terms of Mean Opinion Score (MOS) and Word Accuracy (WAcc) in the ICASSP 2022 Deep Noise Suppression (DNS) challenge (https://github.com/alibabasglab/FRCRN).

Submitted 24 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables

arXiv:2206.04264 [pdf, other]

Formation Tracking for a Multi-AUV System Based on an Adaptive Sliding Mode Method in the Water Flow Environment

Authors: Xin Li, Daqi Zhu, Bing Sun, Qi Chen, Wenyang Gan, Zhigang Li

Abstract: In this paper, formation tracking for a multi-AUV system (MAS) using an improved adaptive sliding mode control method is studied in a three-dimensional (3-D) underwater environment. Firstly, the kinematic model and the dynamic model of the AUVs are given with six degrees of freedom (6-DOF) considered. Then, a control law based on the mathematical model of the AUVs is proposed using the improved sliding mode method. A second-order sliding mode control method is adopted to eliminate the chattering phenomenon of the controller. Thirdly, considering the water flow in the underwater working environment of the AUVs, an adaptive module is added to the controller. With the adaptive approach, the finite disturbances caused by water flow can be handled by the controller. The proposed method achieves stability by substituting an adaptive continuous term for the switching term in the controller. Finally, a robust sliding mode controller with a continuous model predictive control strategy for the multi-AUV system is developed to achieve leader-follower formation tracking in the presence of bounded flow disturbances, and simulations are implemented to confirm the effectiveness of the proposed method.

Submitted 17 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

arXiv:2206.03112 [pdf]

doi 10.3390/su14127485

Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering

Authors: Kenneth Ooi, Bhan Lam, Joo Young Hong, Karn N. Watcharasupat, Zhen-Ting Ong, Woon-Seng Gan

Abstract: The ecological validity of soundscape studies usually rests on a choice of soundscapes that are representative of the perceptual space under investigation. For example, a soundscape pleasantness study might investigate locations with soundscapes ranging from "pleasant" to "annoying". The choice of soundscapes is typically researcher-led, but a participant-led process can reduce selection bias and improve result reliability. Hence, we propose a robust participant-led method to pinpoint characteristic soundscapes possessing arbitrary perceptual attributes. We validate our method by identifying Singaporean soundscapes spanning the perceptual quadrants generated from the "Pleasantness" and "Eventfulness" axes of the ISO 12913-2 circumplex model of soundscape perception, as perceived by local experts. From memory and experience, 67 participants first selected locations corresponding to each perceptual quadrant in each major planning region of Singapore. We then performed weighted k-means clustering on the selected locations, with weights for each location derived from previous frequencies and durations spent in each location by each participant. Weights hence acted as proxies for participant confidence. In total, 62 locations were thereby identified as suitable locations with characteristic soundscapes for further research utilizing the ISO 12913-2 perceptual quadrants. Audio-visual recordings and acoustic characterization of the soundscapes will be made in a future study.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 23 pages, 8 figures. Submitted to Sustainability

Journal ref: MDPI Sustainability. 2022; 14(12):7485
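The weighted k-means step mentioned in the abstract above can be sketched as follows. This is a generic illustration rather than the authors' code: the function names `sq_dist` and `weighted_kmeans`, the random initialization, and the fixed iteration count are assumptions. The only difference from ordinary k-means is that each centroid is the weight-averaged position of its members, so locations chosen with more confidence pull the centroid harder.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd-style k-means in which each point contributes to its centroid in proportion to its weight."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # illustrative initialization: k distinct input points
    dims = len(points[0])

    def assign(cents):
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda j: sq_dist(p, cents[j])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            total = sum(w for w, lab in zip(weights, labels) if lab == j)
            if total > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = tuple(
                    sum(w * p[dim] for p, w, lab in zip(points, weights, labels) if lab == j) / total
                    for dim in range(dims)
                )
    return centroids, assign(centroids)
```

For example, with points `(0, 0)` and `(0, 2)` weighted 1 and 3 in the same cluster, the centroid sits at `(0, 1.5)` rather than at the unweighted midpoint `(0, 1)`.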

arXiv:2205.13051 [pdf, other]

Online Deep Equilibrium Learning for Regularization by Denoising

Authors: Jiaming Liu, Xiaojian Xu, Weijie Gan, Shirin Shoushtari, Ulugbek S. Kamilov

Abstract: Plug-and-Play Priors (PnP) and Regularization by Denoising (RED) are widely-used frameworks for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image priors. While traditional PnP/RED formulations have focused on priors specified using image denoisers, there is a growing interest in learning PnP/RED priors that are end-to-end optimal. The recent Deep Equilibrium Models (DEQ) framework has enabled memory-efficient end-to-end learning of PnP/RED priors by implicitly differentiating through the fixed-point equations without storing intermediate activation values. However, the dependence of the computational/memory complexity of the measurement models in PnP/RED on the total number of measurements leaves DEQ impractical for many imaging applications. We propose ODER as a new strategy for improving the efficiency of DEQ through stochastic approximations of the measurement models. We theoretically analyze ODER giving insights into its convergence and ability to approximate the traditional DEQ approach. Our numerical results suggest the potential improvements in training/testing complexity due to ODER on three distinct imaging applications.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 28 pages, 8 figures

arXiv:2205.04728 [pdf, other]

Preliminary assessment of a cost-effective headphone calibration procedure for soundscape evaluations

Authors: Bhan Lam, Kenneth Ooi, Karn N. Watcharasupat, Zhen-Ting Ong, Yun-Ting Lau, Trevor Wong, Woon-Seng Gan

Abstract: The introduction of ISO 12913-2:2018 has provided a framework for standardized data collection and reporting procedures for soundscape practitioners. A strong emphasis was placed on the use of calibrated head and torso simulators (HATS) for binaural audio capture to obtain an accurate subjective impression and acoustic measure of the soundscape under evaluation. To auralise the binaural recordings as recorded or at set levels, the audio stimuli and the headphone setup are usually calibrated with a HATS. However, calibrated HATS are too financially prohibitive for most research teams, inevitably diminishing the availability of the soundscape standard. With the increasing availability of soundscape binaural recording datasets, and the importance of cross-cultural validation of the soundscape ISO standards, e.g. via the Soundscape Attributes Translation Project (SATP), it is imperative to assess the suitability of cost-effective headphone calibration methods to maximise availability without severely compromising on accuracy. Hence, this study objectively examines an open-circuit voltage (OCV) calibration method in comparison to a calibrated HATS on various soundcard and headphone combinations. Preliminary experiments found that calibration with the OCV method differed significantly from the reference binaural recordings in sound pressure levels, whereas negligible differences in levels were observed with the HATS calibration.

Submitted 10 May, 2022; originally announced May 2022.

Comments: Submitted to the 28th International Congress on Sound and Vibration

arXiv:2204.13890 [pdf, other]

doi 10.3397/IN_2022_0290

Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation

Authors: Trevor Wong, Karn N. Watcharasupat, Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Furi Andi Karnapi, Woon-Seng Gan

Abstract: Soundscape augmentation is an emerging approach for noise mitigation by introducing additional sounds known as "maskers" to increase acoustic comfort. Traditionally, the choice of maskers is often predicated on expert guidance or post-hoc analysis which can be time-consuming and sometimes arbitrary. Moreover, this often results in a static set of maskers that are inflexible to the dynamic nature of real-world acoustic environments. Overcoming the inflexibility of traditional soundscape augmentation is twofold. First, given a snapshot of a soundscape, the system must be able to select an optimal masker without human supervision. Second, the system must also be able to react to changes in the acoustic environment with near real-time latency. In this work, we harness the combined prowess of cloud computing and the Internet of Things (IoT) to allow in-situ listening and playback using microcontrollers while delegating computationally expensive inference tasks to the cloud. In particular, a serverless cloud architecture was used for inference, ensuring near real-time latency and scalability without the need to provision computing resources. A working prototype of the system is currently being deployed in a public area experiencing high traffic noise, as well as undergoing public evaluation for future improvements.

Submitted 29 April, 2022; originally announced April 2022.

Comments: To be presented at the 51st International Congress and Exposition on Noise Control Engineering

Journal ref: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Feb. 2022, vol. 265, no. 5, pp. 2013-2021

arXiv:2204.13883 [pdf, other]

doi 10.1109/LSP.2022.3194419

Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

Authors: Karn N. Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, Woon-Seng Gan

Abstract: The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time-consuming and labour-intensive. Furthermore, the resulting static choices of masker and gain are often inflexible to the dynamic nature of real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker and gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference time, as well as the tedious pre-calibration process required for new maskers. The proposed system was validated on a large-scale dataset of subjective responses to augmented soundscapes with more than 440 participants, ensuring the ability of the model to predict the combined effect of the masker and its gain level on the perceptual pleasantness level.

Submitted 23 July, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Journal ref: IEEE Signal Processing Letters, Vol. 29, pp. 1749 - 1753, 2022

arXiv:2204.04771 [pdf, other]

Image Reconstruction for MRI using Deep CNN Priors Trained without Groundtruth

Authors: Weijie Gan, Cihat Eldeniz, Jiaming Liu, Sihao Chen, Hongyu An, Ulugbek S. Kamilov

Abstract: We propose a new plug-and-play priors (PnP) based MR image reconstruction method that systematically enforces data consistency while also exploiting deep-learning priors. Our prior is specified through a convolutional neural network (CNN) trained without any artifact-free ground truth to remove undersampling artifacts from MR images. The results on reconstructing free-breathing MRI data into ten respiratory phases show that the method can form high-quality 4D images from severely undersampled measurements corresponding to acquisitions of about 1 and 2 minutes in length. The results also highlight the competitive performance of the method compared to several popular alternatives, including the TGV regularization and traditional UNet3D.

Submitted 10 April, 2022; originally announced April 2022.

Showing 1–50 of 72 results for author: Gan, W