-
Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm
Authors:
Boxiang Wang,
Junwei Ji,
Xiaoyi Shen,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. The virtual sensing technique, which effectively suppresses the noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates the improvement in computational efficiency, and numerical simulations exhibit noise reduction performance at virtual locations comparable to that of the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are investigated, providing insightful findings on optimizing MVANC systems.
Submitted 23 May, 2024;
originally announced May 2024.
-
Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data
Authors:
Huidong Xie,
Weijie Gan,
Bo Zhou,
Ming-Kai Chen,
Michal Kulon,
Annemarie Boustani,
Benjamin A. Spencer,
Reimund Bayerlein,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Menghua Xia,
Yinchi Zhou,
Hui Liu,
Liang Guo,
Hongyu An,
Ulugbek S. Kamilov,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Ge Wang,
Ramsey D. Badawi,
Chi Liu
Abstract:
As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise-levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, were unable to generalize across varying noise-levels, often produced visually-appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. We extensively evaluated the proposed model using a total of 9,783 18F-FDG studies (1,596 patients), collected from 4 medical centers globally with different scanners and clinical protocols, with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols. As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.
Submitted 2 May, 2024;
originally announced May 2024.
-
A Survey of Integrating Wireless Technology into Active Noise Control
Authors:
Xiaoyi Shen,
Dongyuan Shi,
Zhengding Luo,
Junwei Ji,
Woon-Seng Gan
Abstract:
Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Whereas microphone arrays, used to isolate multiple noise sources and improve noise reduction performance, increase the computational complexity of the ANC system, the wireless technique avoids this extra computational demand. Wireless transmissions of reference, error, and control signals are also applied to improve the convergence performance of the ANC system. Furthermore, this paper lists some wireless ANC applications, such as earbuds, headphones, windows, and headrests, underscoring their adaptability and efficiency in various settings.
Submitted 21 May, 2024;
originally announced May 2024.
-
Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator
Authors:
Xin Li,
Wenyang Gan,
Pang Wen,
Daqi Zhu
Abstract:
To deal with the task assignment problem of multi-AUV systems under kinematic constraints, which means steering capability constraints for underactuated AUVs or similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins Path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When there exist kinematic constraints or obstacles which may cause failure of trajectory planning, task re-assignment is implemented by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. The computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for multi-AUV systems.
Submitted 24 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model
Authors:
Weijie Gan,
Huidong Xie,
Carl von Gall,
Günther Platsch,
Michael T. Jurkiewicz,
Andrea Andrade,
Udunna C. Anazodo,
Ulugbek S. Kamilov,
Hongyu An,
Jorge Cabello
Abstract:
Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then used the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared to the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that PET images reconstructed using either the acquired or the deep-MRI images had improved image quality compared to OSEM. The same conclusions were reached when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.
Submitted 26 March, 2024;
originally announced March 2024.
-
Unsupervised learning based end-to-end delayless generative fixed-filter active noise control
Authors:
Zhengding Luo,
Dongyuan Shi,
Xiaoyi Shen,
Woon-Seng Gan
Abstract:
Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. During training, the co-processor and real-time controller are integrated into an end-to-end differentiable ANC system. This enables us to use the accumulated squared error signal as the loss for training the 1D CNN. With this unsupervised learning paradigm, the unsupervised-GFANC method not only omits the labelling process but also exhibits better noise reduction performance compared to the supervised GFANC method in real noise experiments.
Submitted 8 February, 2024;
originally announced February 2024.
-
Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
Authors:
Jisheng Bai,
Mou Wang,
Haohe Liu,
Han Yin,
Yafei Jia,
Siwei Huang,
Yutong Du,
Dongzhe Zhang,
Dongyuan Shi,
Woon-Seng Gan,
Mark D. Plumbley,
Susanto Rahardja,
Bin Xiang,
Jianfeng Chen
Abstract:
Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task, in recent years, has achieved substantial progress in device generalization, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabelled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.
Submitted 28 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification
Authors:
Haitao Gan,
Lingchao Fu,
Ran Zhou,
Weiyan Gan,
Furong Wang,
Xiaoyan Wu,
Zhi Yang,
Zhongwei Huang
Abstract:
The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this approach relies on obtaining a substantial amount of challenging-to-acquire segmentation annotations. This paper proposes a novel weakly supervised auxiliary task learning network model (WAL-Net) to explore the interdependence between carotid plaque classification and segmentation tasks. The plaque classification task is the primary task, while the plaque segmentation task serves as an auxiliary task, providing valuable information to enhance the performance of the primary task. Weakly supervised learning is adopted in the auxiliary task to completely break away from the dependence on segmentation annotations. Experiments and evaluations are conducted on a dataset comprising 1270 carotid plaque ultrasound images from Wuhan University Zhongnan Hospital. Results indicate that the proposed method achieved an approximately 1.3% improvement in carotid plaque classification accuracy compared to the baseline network. Specifically, the accuracy of mixed-echoic plaques classification increased by approximately 3.3%, demonstrating the effectiveness of our approach.
Submitted 27 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music
Authors:
Han Yin,
Mou Wang,
Jisheng Bai,
Dongyuan Shi,
Woon-Seng Gan,
Jianfeng Chen
Abstract:
This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.
Submitted 10 January, 2024;
originally announced January 2024.
-
DiffGEPCI: 3D MRI Synthesis from mGRE Signals using 2.5D Diffusion Model
Authors:
Yuyang Hu,
Satya V. V. N. Kothapalli,
Weijie Gan,
Alexander L. Sukstanskii,
Gregory F. Wu,
Manu Goyal,
Dmitriy A. Yablonskiy,
Ulugbek S. Kamilov
Abstract:
We introduce a new framework called DiffGEPCI for cross-modality generation in magnetic resonance imaging (MRI) using a 2.5D conditional diffusion model. DiffGEPCI can synthesize high-quality Fluid Attenuated Inversion Recovery (FLAIR) and Magnetization Prepared-Rapid Gradient Echo (MPRAGE) images, without acquiring corresponding measurements, by leveraging multi-Gradient-Recalled Echo (mGRE) MRI signals as conditional inputs. DiffGEPCI operates in a two-step fashion: it initially estimates a 3D volume slice-by-slice using the axial plane and subsequently applies a refinement algorithm (referred to as 2.5D) to enhance the quality of the coronal and sagittal planes. Experimental validation on real mGRE data shows that DiffGEPCI achieves excellent performance, surpassing generative adversarial networks (GANs) and traditional diffusion models.
Submitted 18 April, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration
Authors:
Zihao Zou,
Jiaming Liu,
Shirin Shoushtari,
Yubo Wang,
Weijie Gan,
Ulugbek S. Kamilov
Abstract:
Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and a temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. Our extensive experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.
Submitted 26 November, 2023;
originally announced November 2023.
-
Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection
Authors:
Han Yin,
Jisheng Bai,
Mou Wang,
Dongyuan Shi,
Woon-Seng Gan,
Jianfeng Chen
Abstract:
Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow was proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. The SIM is initially generated through a statistical approach, referred to as SIM-V1. However, the fixed artificial mask may mismatch the SED model, resulting in limited effectiveness. Therefore, we further propose SIM-V2, which employs a word embedding model for adaptive SIM estimation. Experimental results show that the proposed IDC module can effectively utilize the information from soft labels, and the integration of SIM-V1 can further improve the accuracy. In addition, the impact of different word embedding dimensions on SIM-V2 is explored, and the results show that an appropriate dimension enables SIM-V2 to achieve better performance than SIM-V1. In DCASE 2023 Challenge Task4B, the proposed system achieved the top-ranking performance on the evaluation dataset of MAESTRO Real.
Submitted 7 December, 2023; v1 submitted 23 November, 2023;
originally announced November 2023.
-
AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning
Authors:
Jisheng Bai,
Han Yin,
Mou Wang,
Dongyuan Shi,
Woon-Seng Gan,
Jianfeng Chen,
Susanto Rahardja
Abstract:
Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language model (LLM)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-semantic audio Transformer by incorporating contrastive learning between hybrid acoustic representations. We then leverage LLMs to generate audio logs that summarize textual descriptions of the acoustic environment. Finally, we evaluate the AudioLog system on two datasets with both scene and event annotations. Experiments show that the proposed system achieves exceptional performance in acoustic scene classification and sound event detection, surpassing existing methods in the field. Further analysis of the prompts to LLMs demonstrates that AudioLog can effectively summarize long audio sequences. To the best of our knowledge, this approach is the first attempt to leverage LLMs for summarizing long audio sequences.
Submitted 4 January, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging
Authors:
Huidong Xie,
Weijie Gan,
Bo Zhou,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Liang Guo,
Hongyu An,
Ulugbek S. Kamilov,
Ge Wang,
Chi Liu
Abstract:
As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models for 3D image reconstructions due to the memory burden. Directly stacking 2D slices together to create 3D image volumes would result in severe inconsistencies between slices. Previous works tried to either apply a penalty term along the z-axis to remove inconsistencies or reconstruct the 3D image volumes with 2 pre-trained perpendicular 2D diffusion models. Nonetheless, these previous methods failed to produce satisfactory results in challenging cases for PET image denoising. In addition to administered dose, the noise levels in PET images are affected by several other factors in clinical settings, e.g. scan time, medical history, patient size, and weight. Therefore, a method to simultaneously denoise PET images with different noise-levels is needed. Here, we propose a Dose-aware Diffusion model for 3D low-dose PET imaging (DDPET-3D) to address these challenges. We extensively evaluated DDPET-3D on 100 patients with 6 different low-dose levels (a total of 600 testing studies), and demonstrated superior performance over previous diffusion models for 3D imaging problems as well as previous noise-aware medical image denoising models. The code is available at: https://github.com/xxx/xxx.
Submitted 28 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
A Structured Pruning Algorithm for Model-based Deep Learning
Authors:
Chicago Park,
Weijie Gan,
Zihao Zou,
Yuyang Hu,
Zhixin Sun,
Ulugbek S. Kamilov
Abstract:
There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural network (CNN). The iterative nature of MBDL networks increases the test-time computational complexity, which limits their applicability in certain large-scale applications. We address this issue by presenting a structured pruning algorithm for model-based deep learning (SPADE) as the first structured pruning algorithm for MBDL networks. SPADE reduces the computational complexity of CNNs used within MBDL networks by pruning their non-essential weights. We propose three distinct strategies to fine-tune the pruned MBDL networks to minimize the performance loss. Each fine-tuning strategy has a unique benefit that depends on the presence of a pre-trained model and a high-quality ground truth. We validate SPADE on two distinct inverse problems, namely compressed sensing MRI and image super-resolution. Our results highlight that MBDL models pruned by SPADE can achieve substantial speed up in testing time while maintaining competitive performance.
Submitted 3 November, 2023;
originally announced November 2023.
-
PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction
Authors:
Weijie Gan,
Qiuchen Zhai,
Michael Thompson McCann,
Cristina Garcia Cardona,
Ulugbek S. Kamilov,
Brendt Wohlberg
Abstract:
Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV is capable of outperforming existing deep learning methods for this problem, and significantly reduces computational cost compared to iterative methodologies, while maintaining competitive performance.
Submitted 6 March, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
A Plug-and-Play Image Registration Network
Authors:
Junhao Hu,
Weijie Gan,
Zhixin Sun,
Hongyu An,
Ulugbek S. Kamilov
Abstract:
Deformable image registration (DIR) is an active research topic in biomedical imaging. There is a growing interest in developing DIR methods based on deep learning (DL). A traditional DL approach to DIR is based on training a convolutional neural network (CNN) to estimate the registration field between two input images. While conceptually simple, this approach comes with a limitation that it exclusively relies on a pre-trained CNN without explicitly enforcing fidelity between the registered image and the reference. We present plug-and-play image registration network (PIRATE) as a new DIR method that addresses this issue by integrating an explicit data-fidelity penalty and a CNN prior. PIRATE pre-trains a CNN denoiser on the registration field and "plugs" it into an iterative method as a regularizer. We additionally present PIRATE+ that fine-tunes the CNN prior in PIRATE using deep equilibrium models (DEQ). PIRATE+ interprets the fixed-point iteration of PIRATE as a network with effectively infinite layers and then trains the resulting network end-to-end, enabling it to learn more task-specific information and boosting its performance. Our numerical results on OASIS and CANDI datasets show that our methods achieve state-of-the-art performance on DIR.
Submitted 19 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Preliminary investigation of the short-term in situ performance of an automatic masker selection system
Authors:
Bhan Lam,
Zhen-Ting Ong,
Kenneth Ooi,
Wen-Hui Ong,
Trevor Wong,
Karn N. Watcharasupat,
Woon-Seng Gan
Abstract:
Soundscape augmentation or "masking" introduces wanted sounds into the acoustic environment to improve acoustic comfort. Usually, the masker selection and playback strategies are either arbitrary or based on simple rules (e.g. -3 dBA), which may lead to a sub-optimal increase or even a reduction in acoustic comfort for dynamic acoustic environments. To reduce ambiguity in the selection of maskers, an automatic masker selection system (AMSS) was recently developed. The AMSS uses a deep-learning model trained on a large-scale dataset of subjective responses to maximize the derived ISO pleasantness (ISO 12913-2). Hence, this study investigates the short-term in situ performance of the AMSS implemented in a gazebo in an urban park. Firstly, the predicted ISO pleasantness from the AMSS is evaluated in comparison to the in situ subjective evaluation scores. Secondly, the effect of various masker selection schemes on the perceived affective quality and appropriateness is evaluated. In total, each participant evaluated 6 conditions: (1) ambient environment with no maskers; (2) AMSS; (3) bird and (4) water masker from prior art; (5) random selection from the same pool of maskers used to train the AMSS; and (6) selection of best-performing maskers based on the analysis of the dataset used to train the AMSS.
Submitted 15 August, 2023;
originally announced August 2023.
-
Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm
Authors:
Dongyuan Shi,
Woon-Seng Gan,
Bhan Lam,
Shulin Wen,
Xiaoyi Shen
Abstract:
Multichannel active noise control (MCANC) is widely used to achieve a significant noise cancellation area in complicated acoustic fields. Meanwhile, the filtered-x least mean square (FxLMS) algorithm has gradually become the benchmark solution for implementing MCANC due to its low computational complexity. However, its slow convergence undermines its performance when dealing with quickly varying disturbances, such as piling noise. Furthermore, variation in the noise power degrades the robustness of the algorithm when it adopts a fixed step size. To solve these issues, we integrated the normalized multichannel FxLMS with the momentum method, which effectively avoids the interference of the primary noise power and accelerates the convergence of the algorithm. To validate its effectiveness, we deployed this algorithm in a multichannel noise control window to control real machine noise.
Submitted 7 August, 2023;
originally announced August 2023.
-
Practical Active Noise Control: Restriction of Maximum Output Power
Authors:
Woon-Seng Gan,
Dongyuan Shi,
Xiaoyi Shen
Abstract:
This paper presents some recent algorithms developed by the authors for real-time adaptive active noise control (AANC) systems. These algorithms address some of the common challenges faced by AANC systems, such as speaker saturation, system divergence, and disturbance rejection. Speaker saturation can introduce nonlinearity into the adaptive system and degrade the noise reduction performance. System divergence can occur when the secondary speaker units are over-amplified or when there is a disturbance other than the noise to be controlled. Disturbance rejection is important to prevent the adaptive system from adapting to unwanted signals. The paper provides guidelines for implementing and operating real-time AANC systems based on these algorithms.
Submitted 20 July, 2023;
originally announced July 2023.
-
Anti-noise window: Subjective perception of active noise reduction and effect of informational masking
Authors:
Bhan Lam,
Kelvin Chee Quan Lim,
Kenneth Ooi,
Zhen-Ting Ong,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone.
Submitted 8 July, 2023;
originally announced July 2023.
-
A Computation-efficient Online Secondary Path Modeling Technique for Modified FXLMS Algorithm
Authors:
Junwei Ji,
Dongyuan Shi,
Woon-Seng Gan,
Xiaoyi Shen,
Zhengding Luo
Abstract:
This paper proposes an online secondary path modelling (SPM) technique to improve the performance of the modified filtered reference Least Mean Square (FXLMS) algorithm. It can effectively respond to a time-varying secondary path, which refers to the path from a secondary source to an error sensor. Unlike traditional methods, the proposed approach switches modes between adaptive ANC and online SPM, eliminating the use of destabilizing components such as auxiliary white noise or additional filters, which can negatively impact the complexity, stability, and noise reduction performance of the ANC system. The system operates in adaptive ANC mode until divergence is detected due to secondary path changes. At this moment, it switches to SPM mode until the path is remodeled and then returns to ANC mode. Furthermore, numerical simulations in the paper demonstrate that the proposed online technique effectively copes with the secondary path variations.
Submitted 20 June, 2023;
originally announced June 2023.
-
MOV-Modified-FxLMS algorithm with Variable Penalty Factor in a Practical Power Output Constrained Active Control System
Authors:
Chung Kwan Lai,
Dongyuan Shi,
Bhan Lam,
Woon-Seng Gan
Abstract:
Practical Active Noise Control (ANC) systems typically require a restriction in their maximum output power, to prevent overdriving the loudspeaker and causing system instability. Recently, the minimum output variance filtered-reference least mean square (MOV-FxLMS) algorithm was shown to have optimal control under output constraint with an analytically formulated penalty factor, but it needs offline knowledge of disturbance power and secondary path gain. The constant penalty factor in MOV-FxLMS is also susceptible to variations in disturbance power that could cause output power constraint violations. This paper presents a new variable penalty factor that utilizes the estimated disturbance in the established Modified-FxLMS (MFxLMS) algorithm, resulting in a computationally efficient MOV-MFxLMS algorithm that can adapt to changes in disturbance levels in real-time. Numerical simulation with real noise and plant response showed that the variable penalty factor always manages to meet its maximum power output constraint despite sudden changes in disturbance power, whereas the fixed penalty factor has suffered from a constraint mismatch.
Submitted 15 June, 2023;
originally announced June 2023.
-
Active Noise Control in The New Century: The Role and Prospect of Signal Processing
Authors:
Dongyuan Shi,
Bhan Lam,
Woon-Seng Gan,
Jordan Cheer,
Stephen J. Elliott
Abstract:
Following Paul Lueg's 1933 patent application for a system for the active control of sound, the field of active noise control (ANC) did not flourish until the advent of digital signal processors forty years ago. Early theoretical advancements in digital signal processing and processors laid the groundwork for the phenomenal growth of the field, particularly over the past quarter-century. The widespread commercial success of ANC in aircraft cabins, automobile cabins, and headsets demonstrates the immeasurable public health and economic benefits of ANC. This article continues where Elliott and Nelson's 1993 Signal Processing Magazine article and Elliott's 1997 50th anniversary commentary on ANC left off, tracing the technical developments and applications in ANC spurred by the seminal texts of Nelson and Elliott (1991), Kuo and Morgan (1996), Hansen and Snyder (1996), and Elliott (2001) since the turn of the century. This article focuses on technical developments pertaining to real-world implementations, such as improving algorithmic convergence, reducing system latency, and extending control to non-stationary and/or broadband noise, as well as the commercial transition challenges from analog to digital ANC systems. Finally, open issues and the future of ANC in the era of artificial intelligence are discussed.
Submitted 6 July, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Block Coordinate Plug-and-Play Methods for Blind Inverse Problems
Authors:
Weijie Gan,
Shirin Shoushtari,
Yuyang Hu,
Jiaming Liu,
Hongyu An,
Ulugbek S. Kamilov
Abstract:
Plug-and-play (PnP) prior is a well-known class of methods for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image denoisers. While PnP methods have been extensively used for image recovery with known measurement operators, there is little work on PnP for solving blind inverse problems. We address this gap by presenting a new block-coordinate PnP (BC-PnP) method that efficiently solves this joint estimation problem by introducing learned denoisers as priors on both the unknown image and the unknown measurement operator. We present a new convergence theory for BC-PnP compatible with blind inverse problems by considering nonconvex data-fidelity terms and expansive denoisers. Our theory analyzes the convergence of BC-PnP to a stationary point of an implicit function associated with an approximate minimum mean-squared error (MMSE) denoiser. We numerically validate our method on two blind inverse problems: automatic coil sensitivity estimation in magnetic resonance imaging (MRI) and blind image deblurring. Our results show that BC-PnP provides an efficient and principled framework for using denoisers as PnP priors for jointly estimating measurement operators and images.
Submitted 26 October, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Real-time modelling of observation filter in the Remote Microphone Technique for an Active Noise Control application
Authors:
Chung Kwan Lai,
Bhan Lam,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
The remote microphone technique (RMT) is often used in active noise control (ANC) applications to overcome design constraints in microphone placements by estimating the acoustic pressure at inconvenient locations using a pre-calibrated observation filter (OF), albeit limited to stationary primary acoustic fields. While the OF estimation in varying primary fields can be significantly improved through the recently proposed source decomposition technique, it requires knowledge of the relative source strengths between incoherent primary noise sources. This paper proposes a method for combining the RMT with a new source-localization technique to estimate the source ratio parameter. Unlike traditional source-localization techniques, the proposed method is capable of being implemented in a real-time RMT application. Simulations with measured responses from an open-aperture ANC application showed a good estimation of the source ratio parameter, which allows the observation filter to be modelled in real-time.
Submitted 21 March, 2023;
originally announced March 2023.
-
A practical distributed active noise control algorithm overcoming communication restrictions
Authors:
Junwei Ji,
Dongyuan Shi,
Zhengding Luo,
Xiaoyi Shen,
Woon-Seng Gan
Abstract:
By assigning the massive computing tasks of the traditional multichannel active noise control (MCANC) system to several distributed control nodes, distributed multichannel active noise control (DMCANC) techniques have become effective global noise reduction solutions with low computational costs. However, existing DMCANC algorithms simply complete the distribution of traditional centralized algorithms by combining neighbour nodes' information but rarely consider the degraded control performance and system stability of distributed units caused by delays and interruptions in communication. Hence, this paper develops a novel DMCANC algorithm that utilizes the compensation filters and neighbour nodes' information to counterbalance the cross-talk effect between channels while maintaining independent weight updating. Since the neighbours' information required barely affects the local control filter updating in each node, this approach can tolerate communication delay and interruption to some extent. Numerical simulations demonstrate that the proposed algorithm can achieve satisfactory noise reduction performance and high robustness to real-world communication challenges.
Submitted 15 March, 2023;
originally announced March 2023.
-
A Momentum Two-gradient Direction Algorithm with Variable Step Size Applied to Solve Practical Output Constraint Issue for Active Noise Control
Authors:
Xiaoyi Shen,
Dongyuan Shi,
Zhengding Luo,
Junwei Ji,
Woon-Seng Gan
Abstract:
Active noise control (ANC) has been widely utilized to reduce unwanted environmental noise. The primary objective of ANC is to generate an anti-noise with the same amplitude but the opposite phase of the primary noise using the secondary source. However, the effectiveness of ANC applications is limited by the speaker's output saturation. This paper proposes a two-gradient direction ANC algorithm with a momentum factor to handle the saturation with faster convergence. To make real-time implementation feasible, a computation-efficient variable step size approach is applied to further reduce the steady-state error brought on by the changing gradient directions. The time constant and step size bound for the momentum two-gradient direction algorithm are analyzed. Simulation results show that the proposed algorithm performs effectively in both time-invariant and time-varying environments.
Submitted 15 March, 2023;
originally announced March 2023.
-
Implementing Continuous HRTF Measurement in Near-Field
Authors:
Ee-Leng Tan,
Santi Peksi,
Woon-Seng Gan
Abstract:
The head-related transfer function (HRTF) is an essential component for creating an immersive listening experience over headphones for virtual reality (VR) and augmented reality (AR) applications. The metaverse combines VR and AR to create immersive digital experiences, and users are very likely to interact with virtual objects in the near-field (NF). The HRTFs of such objects are highly individualized and dependent on directions and distances. Hence, a significant number of HRTF measurements at different distances in the NF would be needed. Using conventional static stop-and-go HRTF measurement methods to acquire these measurements would be time-consuming and tedious for human listeners. In this paper, we propose a continuous measurement system targeted at the NF that efficiently captures HRTFs in the horizontal plane within 45 seconds. Comparative experiments are performed on a head and torso simulator (HATS) and human listeners to evaluate system consistency and robustness.
Submitted 15 June, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs
Authors:
Kenneth Ooi,
Karn N. Watcharasupat,
Bhan Lam,
Zhen-Ting Ong,
Woon-Seng Gan
Abstract:
Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural network, to allow early, mid-level, and late feature fusion of participant-linked, visual, and acoustic features. Ablation studies on module configurations and corresponding fusion methods using the ARAUS dataset show that contextual features improve the model performance in a statistically significant manner on the normalized ISO Pleasantness, to a mean squared error of $0.1194\pm0.0012$ for the best-performing all-modality model, against $0.1217\pm0.0009$ for the audio-only model. Soundscape augmentation systems can thereby leverage multimodal inputs for improved performance. We also investigate the impact of individual participant-linked factors using trained models to illustrate improvements in model explainability.
Submitted 14 March, 2023;
originally announced March 2023.
-
Deep Generative Fixed-filter Active Noise Control
Authors:
Zhengding Luo,
Dongyuan Shi,
Xiaoyi Shen,
Junwei Ji,
Woon-Seng Gan
Abstract:
Due to the slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noises. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs much from the initial noises during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome the limitation. Based on deep learning and a perfect-reconstruction filter bank, the GFANC method only requires a few prior data (one pre-trained broadband control filter) to automatically generate suitable control filters for various noises. The efficacy of the GFANC method is demonstrated by numerical simulations on real-recorded noises.
Submitted 10 March, 2023;
originally announced March 2023.
-
An Open Case-based Reasoning Framework for Personalized On-board Driving Assistance in Risk Scenarios
Authors:
Wenbin Gan,
Minh-Son Dao,
Koji Zettsu
Abstract:
Driver reaction is of vital importance in risk scenarios. Drivers can take correct evasive maneuvers at the proper cushion time to avoid potential traffic crashes, but this reaction process is highly experience-dependent and requires various levels of driving skills. To improve driving safety and avoid traffic accidents, it is necessary to provide all road drivers with on-board driving assistance. This study explores the plausibility of case-based reasoning (CBR) as the inference paradigm underlying the choice of personalized crash evasive maneuvers and the cushion time, by leveraging the wealth of human driving experience from the steady stream of traffic cases, which have rarely been explored in previous studies. To this end, in this paper, we propose an open evolving framework for generating personalized on-board driving assistance. In particular, we present the FFMTE model with high performance to model the traffic events and build the case database; a tailored CBR-based method is then proposed to retrieve, reuse and revise the existing cases to generate the assistance. We take the 100-Car Naturalistic Driving Study dataset as an example to build and test our framework; the experiments show reasonable results, providing the drivers with valuable evasive information to avoid the potential crashes in different scenarios.
Submitted 23 November, 2022;
originally announced November 2022.
-
SINCO: A Novel structural regularizer for image compression using implicit neural representations
Authors:
Harry Gao,
Weijie Gan,
Zhixin Sun,
Ulugbek S. Kamilov
Abstract:
Implicit neural representations (INR) have been recently proposed as deep learning (DL) based solutions for image compression. An image can be compressed by training an INR model with fewer weights than the number of image pixels to map the coordinates of the image to corresponding pixel values. While traditional training approaches for INRs are based on enforcing pixel-wise image consistency, we propose to further improve image quality by using a new structural regularizer. We present structural regularization for INR compression (SINCO) as a novel INR method for image compression. SINCO imposes structural consistency of the compressed images to the groundtruth by using a segmentation network to penalize the discrepancy of segmentation masks predicted from compressed images. We validate SINCO on brain MRI images by showing that it can achieve better performance than some recent INR methods.
Submitted 26 October, 2022;
originally announced October 2022.
-
CoRRECT: A Deep Unfolding Framework for Motion-Corrected Quantitative R2* Mapping
Authors:
Xiaojian Xu,
Weijie Gan,
Satya V. V. N. Kothapalli,
Dmitriy A. Yablonskiy,
Ulugbek S. Kamilov
Abstract:
Quantitative MRI (qMRI) refers to a class of MRI methods for quantifying the spatial distribution of biological tissue parameters. Traditional qMRI methods usually deal separately with artifacts arising from accelerated data acquisition, involuntary physical motion, and magnetic-field inhomogeneities, leading to suboptimal end-to-end performance. This paper presents CoRRECT, a unified deep unfolding (DU) framework for qMRI consisting of a model-based end-to-end neural network, a method for motion-artifact reduction, and a self-supervised learning scheme. The network is trained to produce R2* maps whose k-space data matches the real data by also accounting for motion and field inhomogeneities. When deployed, CoRRECT only uses the k-space data without any pre-computed parameters for motion or inhomogeneity correction. Our results on experimentally collected multi-Gradient-Recalled Echo (mGRE) MRI data show that CoRRECT recovers motion and inhomogeneity artifact-free R2* maps in highly accelerated acquisition settings. This work opens the door to DU methods that can integrate physical measurement models, biophysical signal models, and learned prior models for high-quality qMRI.
Submitted 12 October, 2022;
originally announced October 2022.
-
Self-Supervised Deep Equilibrium Models for Inverse Problems with Theoretical Guarantees
Authors:
Weijie Gan,
Chunwei Ying,
Parna Eshraghi,
Tongyao Wang,
Cihat Eldeniz,
Yuyang Hu,
Jiaming Liu,
Yasheng Chen,
Hongyu An,
Ulugbek S. Kamilov
Abstract:
Deep equilibrium models (DEQ) have emerged as a powerful alternative to deep unfolding (DU) for image reconstruction. DEQ models, implicit neural networks with an effectively infinite number of layers, were shown to achieve state-of-the-art image reconstruction without the memory complexity associated with DU. While the performance of DEQ has been widely investigated, the existing work has primarily focused on the settings where groundtruth data is available for training. We present self-supervised deep equilibrium model (SelfDEQ) as the first self-supervised reconstruction framework for training model-based implicit networks from undersampled and noisy MRI measurements. Our theoretical results show that SelfDEQ can compensate for unbalanced sampling across multiple acquisitions and match the performance of fully supervised DEQ. Our numerical results on in-vivo MRI data show that SelfDEQ leads to state-of-the-art performance using only undersampled and noisy training data.
Submitted 7 October, 2022;
originally announced October 2022.
-
SPICER: Self-Supervised Learning for MRI with Automatic Coil Sensitivity Estimation and Reconstruction
Authors:
Yuyang Hu,
Weijie Gan,
Chunwei Ying,
Tongyao Wang,
Cihat Eldeniz,
Jiaming Liu,
Yasheng Chen,
Hongyu An,
Ulugbek S. Kamilov
Abstract:
Deep model-based architectures (DMBAs) integrating physical measurement models and learned image regularizers are widely used in parallel magnetic resonance imaging (PMRI). Traditional DMBAs for PMRI rely on pre-estimated coil sensitivity maps (CSMs) as a component of the measurement model. However, estimation of accurate CSMs is a challenging problem when measurements are highly undersampled. Additionally, traditional training of DMBAs requires high-quality groundtruth images, limiting their use in applications where groundtruth is difficult to obtain. This paper addresses these issues by presenting SPICE as a new method that integrates self-supervised learning and automatic coil sensitivity estimation. Instead of using pre-estimated CSMs, SPICE simultaneously reconstructs accurate MR images and estimates high-quality CSMs. SPICE also enables learning from undersampled noisy measurements without any groundtruth. We validate SPICE on experimentally collected data, showing that it can achieve state-of-the-art performance in highly accelerated data acquisition settings (up to 10x).
Submitted 6 June, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Performance Evaluation of Selective Fixed-filter Active Noise Control based on Different Convolutional Neural Networks
Authors:
Zhengding Luo,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
Due to its rapid response time and a high degree of robustness, the selective fixed-filter active noise control (SFANC) method appears to be a viable candidate for widespread use in a variety of practical active noise control (ANC) systems. In comparison to conventional fixed-filter ANC methods, SFANC can select the pre-trained control filters for different types of noise. Deep learning technologies, thus, can be used in SFANC methods to enable a more flexible selection of the most appropriate control filters for attenuating various noises. Furthermore, with the assistance of a deep neural network, the selecting strategy can be learned automatically from noise data rather than through trial and error, which significantly simplifies and improves the practicability of ANC design. Therefore, this paper investigates the performance of SFANC based on different one-dimensional and two-dimensional convolutional neural networks. Additionally, we conducted comparative analyses of several network training strategies and discovered that fine-tuning could improve selection performance.
Submitted 17 August, 2022;
originally announced August 2022.
-
Implementation of Multi-channel Active Noise Control based on Back-propagation Mechanism
Authors:
Zhengding Luo,
Dongyuan Shi,
Junwei Ji,
Woon-Seng Gan
Abstract:
Active noise control (ANC) systems can efficiently attenuate low-frequency noises by introducing anti-noises to combine with the unwanted noises. In ANC systems, the filtered-x least mean square (FxLMS) and filtered-x normalized least mean square (FxNLMS) algorithms are well known for adaptively adjusting control filters. Multi-channel ANC systems are typically required to attenuate unwanted noises in a large space. However, open-source implementations of the multi-channel FxLMS (McFxLMS) and multi-channel FxNLMS (McFxNLMS) algorithms continue to be scarce. Therefore, this paper proposes a simple and effective implementation approach for the McFxLMS and McFxNLMS algorithms. Motivated by the back-propagation process during neural network training, the McFxLMS and McFxNLMS algorithms can be implemented via an automatic derivation mechanism. We implemented the two algorithms using the automatic derivation mechanism in PyTorch and made the source code available on GitHub. This implementation method can improve the practicality of multi-channel ANC systems and is expected to be widely used in ANC applications.
Submitted 17 August, 2022;
originally announced August 2022.
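For readers unfamiliar with the FxLMS update mentioned in the abstract above, the following is a minimal single-channel sketch in plain Python. It is not the authors' PyTorch implementation and does not use automatic differentiation; it writes the gradient step out by hand. The function name `fxlms`, the helper `fir`, the path coefficients, the tone frequency, and the step size are all illustrative assumptions, and the secondary-path estimate is assumed to equal the true path.

```python
import math

def fxlms(reference, disturbance, secondary_path, num_taps=8, step_size=0.05):
    """Single-channel FxLMS sketch; returns the residual error at each sample."""
    path_len = len(secondary_path)
    w = [0.0] * num_taps                       # adaptive control filter
    x_hist = [0.0] * max(num_taps, path_len)   # reference history, newest first
    y_hist = [0.0] * path_len                  # control-output history, newest first
    fx_hist = [0.0] * num_taps                 # filtered-reference history, newest first
    errors = []
    for x, d in zip(reference, disturbance):
        x_hist = [x] + x_hist[:-1]
        # Control signal y(n) = w^T x(n); zip stops after num_taps terms.
        y = sum(wi * xi for wi, xi in zip(w, x_hist))
        y_hist = [y] + y_hist[:-1]
        # Anti-noise reaching the error sensor through the secondary path.
        anti_noise = sum(s * yi for s, yi in zip(secondary_path, y_hist))
        e = d - anti_noise
        # Reference filtered by the (assumed perfect) secondary-path estimate.
        fx = sum(s * xi for s, xi in zip(secondary_path, x_hist))
        fx_hist = [fx] + fx_hist[:-1]
        # Gradient-descent step on e(n)^2 with respect to the filter taps.
        w = [wi + step_size * e * fi for wi, fi in zip(w, fx_hist)]
        errors.append(e)
    return errors

def fir(signal, taps):
    """Causal FIR filtering with zero initial conditions."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]

def mean_square(values):
    return sum(v * v for v in values) / len(values)

# Hypothetical demo: a pure tone through made-up primary and secondary paths.
reference = [math.sin(2 * math.pi * 0.05 * n) for n in range(4000)]
disturbance = fir(reference, [0.0, 0.5, 0.3])
errors = fxlms(reference, disturbance, secondary_path=[0.0, 0.8])
print(mean_square(errors[:200]), mean_square(errors[-200:]))
```

In a multi-channel system the same update would be applied for every reference, loudspeaker, and error-microphone combination, which is the bookkeeping the abstract proposes to delegate to an automatic derivation mechanism.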
-
A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning
Authors:
Zhengding Luo,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
The selective fixed-filter active noise control (SFANC) method selecting the best pre-trained control filters for various types of noise can achieve a fast response time. However, it may lead to large steady-state errors due to inaccurate filter selection and the lack of adaptability. In comparison, the filtered-X normalized least-mean-square (FxNLMS) algorithm can obtain lower steady-state errors through adaptive optimization. Nonetheless, its slow convergence has a detrimental effect on dynamic noise attenuation. Therefore, this paper proposes a hybrid SFANC-FxNLMS approach to overcome the adaptive algorithm's slow convergence and provide a better noise reduction level than the SFANC method. A lightweight one-dimensional convolutional neural network (1D CNN) is designed to automatically select the most suitable pre-trained control filter for each frame of the primary noise. Meanwhile, the FxNLMS algorithm continues to update the coefficients of the chosen pre-trained control filter at the sampling rate. Owing to the effective combination of the two algorithms, experimental results show that the hybrid SFANC-FxNLMS algorithm can achieve a rapid response time, a low noise reduction error, and a high degree of robustness.
Submitted 17 August, 2022;
originally announced August 2022.
-
Assessment of a cost-effective headphone calibration procedure for soundscape evaluations
Authors:
Bhan Lam,
Kenneth Ooi,
Zhen-Ting Ong,
Karn N. Watcharasupat,
Trevor Wong,
Woon-Seng Gan
Abstract:
To increase the availability and adoption of the soundscape standard, a low-cost calibration procedure for reproduction of audio stimuli over headphones was proposed as part of the global "Soundscape Attributes Translation Project" (SATP) for validating ISO/TS 12913-2:2018 perceived affective quality (PAQ) attribute translations. A previous preliminary study revealed significant deviations from the intended equivalent continuous A-weighted sound pressure levels ($L_{\text{A,eq}}$) using the open-circuit voltage (OCV) calibration procedure. For a more holistic human-centric perspective, the OCV method is further investigated here in terms of psychoacoustic parameters, including relevant exceedance levels to account for temporal effects on the same 27 stimuli from the SATP. Moreover, a within-subjects experiment with 36 participants was conducted to examine the effects of OCV calibration on the PAQ attributes in ISO/TS 12913-2:2018. Bland-Altman analysis of the objective indicators revealed large biases in the OCV method across all weighted sound level and loudness indicators, and in roughness indicators at 5% and 10% exceedance levels. Significant perceptual differences due to the OCV method were observed in about 20% of the stimuli, which did not correspond clearly with the biased acoustic indicators. A cautious interpretation of the objective and perceptual differences, given the small and unpaired samples, nevertheless provides grounds for further investigation.
Submitted 24 July, 2022;
originally announced July 2022.
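The Bland-Altman analysis named in the abstract above summarizes agreement between two measurement methods by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 standard deviations of the differences. The sketch below shows that arithmetic only; the function name `bland_altman` and the level values are hypothetical placeholders, not data from the study.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired sound levels in dB(A); not taken from the paper.
ocv_levels = [60.0, 65.0, 70.0, 75.0]
reference_levels = [58.0, 64.0, 67.0, 73.0]
bias, lower, upper = bland_altman(ocv_levels, reference_levels)
print(f"bias={bias:.2f} dB, limits of agreement=({lower:.2f}, {upper:.2f}) dB")
```

A bias far from zero, or limits of agreement wider than the tolerance that matters for the application, would indicate that the two calibration methods cannot be used interchangeably.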
-
Do uHear? Validation of uHear App for Preliminary Screening of Hearing Ability in Soundscape Studies
Authors:
Zhen-Ting Ong,
Bhan Lam,
Kenneth Ooi,
Karn N. Watcharasupat,
Trevor Wong,
Woon-Seng Gan
Abstract:
Studies involving soundscape perception often exclude participants with hearing loss to prevent impaired perception from affecting experimental results. Participants are typically screened with pure tone audiometry, the "gold standard" for identifying and quantifying hearing loss at specific frequencies, and excluded if a study-dependent threshold is not met. However, procuring professional audiometric equipment for soundscape studies may be cost-ineffective, and manually performing audiometric tests is labour-intensive. Moreover, soundscape studies may not require sensitivities and specificities as high as those in a medical diagnosis setting. Hence, in this study, we investigate the effectiveness of the uHear app, an iOS application, as an affordable and automatic alternative to a conventional audiometer in screening participants for hearing loss for the purpose of soundscape studies or listening tests in general. Based on audiometric comparisons with the audiometer of 163 participants, the uHear app was found to have high precision (98.04%) when using the World Health Organization (WHO) grading scheme for assessing normal hearing. Precision is further improved (98.69%) when all frequencies assessed with the uHear app are considered in the grading, which lends further support to this cost-effective, automated alternative to screen for normal hearing.
Submitted 16 July, 2022;
originally announced July 2022.
-
ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes
Authors:
Kenneth Ooi,
Zhen-Ting Ong,
Karn N. Watcharasupat,
Bhan Lam,
Joo Young Hong,
Woon-Seng Gan
Abstract:
Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analysis of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks in the literature.
Submitted 5 March, 2023; v1 submitted 3 July, 2022;
originally announced July 2022.
-
FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
Authors:
Shengkui Zhao,
Bin Ma,
Karn N. Watcharasupat,
Woon-Seng Gan
Abstract:
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED) structure and a recurrent structure have achieved promising performance for monaural speech enhancement. However, feature representation across frequency context is highly constrained due to limited receptive fields in the convolutions of CED. In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. The CRED applies frequency recurrence on 3D convolutional feature maps along the frequency axis following each convolution; it is therefore capable of capturing long-range frequency correlations and enhancing feature representations of speech inputs. The proposed frequency recurrence is realized efficiently using a feedforward sequential memory network (FSMN). Besides the CRED, we insert two stacked FSMN layers between the encoder and the decoder to further model temporal dynamics. We name the proposed framework Frequency Recurrent CRN (FRCRN). We design FRCRN to predict the complex Ideal Ratio Mask (cIRM) in the complex-valued domain and optimize FRCRN using both time-frequency-domain and time-domain losses. Our proposed approach achieved state-of-the-art performance on wideband benchmark datasets and achieved 2nd place for the real-time fullband track in terms of Mean Opinion Score (MOS) and Word Accuracy (WAcc) in the ICASSP 2022 Deep Noise Suppression (DNS) challenge (https://github.com/alibabasglab/FRCRN).
Submitted 24 November, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Formation Tracking for a Multi-AUV System Based on an Adaptive Sliding Mode Method in the Water Flow Environment
Authors:
Xin Li,
Daqi Zhu,
Bing Sun,
Qi Chen,
Wenyang Gan,
Zhigang Li
Abstract:
In this paper, formation tracking for a multi-AUV system (MAS) using an improved adaptive sliding mode control method is studied in a three-dimensional (3-D) underwater environment. Firstly, the kinematic and dynamic models of the AUVs are given with six degrees of freedom (6-DOF) considered. Then, a control law derived from the mathematical model of the AUVs is proposed based on the improved sliding mode method. A second-order sliding mode control method is adopted to eliminate the chattering phenomenon of the controller. Thirdly, considering the water flow in the underwater working environment of the AUVs, an adaptive module is added to the controller. With the adaptive approach, the finite disturbances caused by water flow can be handled by the controller. The proposed method achieves stability by substituting an adaptive continuous term for the switching term in the controller. Finally, a robust sliding mode controller with a continuous model predictive control strategy for the multi-AUV system is developed to achieve leader-follower formation tracking in the presence of bounded flow disturbances, and simulations are implemented to confirm the effectiveness of the proposed method.
Submitted 17 January, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering
Authors:
Kenneth Ooi,
Bhan Lam,
Joo Young Hong,
Karn N. Watcharasupat,
Zhen-Ting Ong,
Woon-Seng Gan
Abstract:
The ecological validity of soundscape studies usually rests on a choice of soundscapes that are representative of the perceptual space under investigation. For example, a soundscape pleasantness study might investigate locations with soundscapes ranging from "pleasant" to "annoying". The choice of soundscapes is typically researcher-led, but a participant-led process can reduce selection bias and improve result reliability. Hence, we propose a robust participant-led method to pinpoint characteristic soundscapes possessing arbitrary perceptual attributes. We validate our method by identifying Singaporean soundscapes spanning the perceptual quadrants generated from the "Pleasantness" and "Eventfulness" axes of the ISO 12913-2 circumplex model of soundscape perception, as perceived by local experts. From memory and experience, 67 participants first selected locations corresponding to each perceptual quadrant in each major planning region of Singapore. We then performed weighted k-means clustering on the selected locations, with weights for each location derived from previous frequencies and durations spent in each location by each participant. Weights hence acted as proxies for participant confidence. In total, 62 locations were thereby identified as suitable locations with characteristic soundscapes for further research utilizing the ISO 12913-2 perceptual quadrants. Audio-visual recordings and acoustic characterization of the soundscapes will be made in a future study.
Submitted 7 June, 2022;
originally announced June 2022.
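The weighted k-means step described in the abstract above differs from ordinary k-means only in the centroid update, where each point contributes in proportion to its weight. The following is a minimal sketch under stated assumptions: the function name `weighted_kmeans`, the coordinates, and the weights are hypothetical, and the paper's actual weighting from visit frequencies and durations is not reproduced here.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_kmeans(points, weights, k, iterations=50, seed=0):
    """Lloyd-style k-means where centroids are weighted means of their members."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # initialise from k distinct input points
    dim = len(points[0])
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(pt, centroids[c]))
                  for pt in points]
        # Update step: weighted mean of the points assigned to each cluster.
        new_centroids = []
        for c in range(k):
            members = [(pt, w) for pt, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total == 0:                 # keep an empty cluster's centroid unchanged
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(tuple(
                sum(w * pt[d] for pt, w in members) / total for d in range(dim)))
        if new_centroids == centroids:     # converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical locations (x, y) with confidence-like weights.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
weights = [1.0, 3.0, 1.0, 1.0]
centroids, labels = weighted_kmeans(points, weights, k=2)
print(sorted(centroids))
```

In this toy example the heavier weight on (0.0, 2.0) pulls its cluster centroid towards that point, which mirrors how a location chosen by more confident participants would dominate the cluster it belongs to.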
-
Online Deep Equilibrium Learning for Regularization by Denoising
Authors:
Jiaming Liu,
Xiaojian Xu,
Weijie Gan,
Shirin Shoushtari,
Ulugbek S. Kamilov
Abstract:
Plug-and-Play Priors (PnP) and Regularization by Denoising (RED) are widely-used frameworks for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image priors. While traditional PnP/RED formulations have focused on priors specified using image denoisers, there is a growing interest in learning PnP/RED priors that are end-to-end optimal. The recent Deep Equilibrium Models (DEQ) framework has enabled memory-efficient end-to-end learning of PnP/RED priors by implicitly differentiating through the fixed-point equations without storing intermediate activation values. However, the dependence of the computational/memory complexity of the measurement models in PnP/RED on the total number of measurements leaves DEQ impractical for many imaging applications. We propose ODER as a new strategy for improving the efficiency of DEQ through stochastic approximations of the measurement models. We theoretically analyze ODER giving insights into its convergence and ability to approximate the traditional DEQ approach. Our numerical results suggest the potential improvements in training/testing complexity due to ODER on three distinct imaging applications.
Submitted 25 May, 2022;
originally announced May 2022.
-
Preliminary assessment of a cost-effective headphone calibration procedure for soundscape evaluations
Authors:
Bhan Lam,
Kenneth Ooi,
Karn N. Watcharasupat,
Zhen-Ting Ong,
Yun-Ting Lau,
Trevor Wong,
Woon-Seng Gan
Abstract:
The introduction of ISO 12913-2:2018 has provided a framework for standardized data collection and reporting procedures for soundscape practitioners. A strong emphasis was placed on the use of calibrated head and torso simulators (HATS) for binaural audio capture to obtain an accurate subjective impression and acoustic measure of the soundscape under evaluation. To auralise the binaural recordings as recorded or at set levels, the audio stimuli and the headphone setup are usually calibrated with a HATS. However, calibrated HATS are too financially prohibitive for most research teams, inevitably diminishing the availability of the soundscape standard. With the increasing availability of soundscape binaural recording datasets, and the importance of cross-cultural validation of the soundscape ISO standards, e.g. via the Soundscape Attributes Translation Project (SATP), it is imperative to assess the suitability of cost-effective headphone calibration methods to maximise availability without severely compromising on accuracy. Hence, this study objectively examines an open-circuit voltage (OCV) calibration method in comparison to a calibrated HATS on various soundcard and headphone combinations. Preliminary experiments found that calibration with the OCV method differed significantly from the reference binaural recordings in sound pressure levels, whereas negligible differences in levels were observed with the HATS calibration.
Submitted 10 May, 2022;
originally announced May 2022.
-
Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation
Authors:
Trevor Wong,
Karn N. Watcharasupat,
Bhan Lam,
Kenneth Ooi,
Zhen-Ting Ong,
Furi Andi Karnapi,
Woon-Seng Gan
Abstract:
Soundscape augmentation is an emerging approach for noise mitigation by introducing additional sounds known as "maskers" to increase acoustic comfort. Traditionally, the choice of maskers is often predicated on expert guidance or post-hoc analysis which can be time-consuming and sometimes arbitrary. Moreover, this often results in a static set of maskers that are inflexible to the dynamic nature of real-world acoustic environments. Overcoming the inflexibility of traditional soundscape augmentation is twofold. First, given a snapshot of a soundscape, the system must be able to select an optimal masker without human supervision. Second, the system must also be able to react to changes in the acoustic environment with near real-time latency. In this work, we harness the combined prowess of cloud computing and the Internet of Things (IoT) to allow in-situ listening and playback using microcontrollers while delegating computationally expensive inference tasks to the cloud. In particular, a serverless cloud architecture was used for inference, ensuring near real-time latency and scalability without the need to provision computing resources. A working prototype of the system is currently being deployed in a public area experiencing high traffic noise, as well as undergoing public evaluation for future improvements.
Submitted 29 April, 2022;
originally announced April 2022.
-
Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain
Authors:
Karn N. Watcharasupat,
Kenneth Ooi,
Bhan Lam,
Trevor Wong,
Zhen-Ting Ong,
Woon-Seng Gan
Abstract:
The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time-consuming and labour-intensive. Furthermore, the resulting static choices of masker and gain are often inflexible to the dynamic nature of real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker and gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference time, as well as the tedious pre-calibration process required for new maskers. The proposed system was validated on a large-scale dataset of subjective responses to augmented soundscapes with more than 440 participants, ensuring the ability of the model to predict the combined effect of the masker and its gain level on the perceptual pleasantness level.
Submitted 23 July, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Image Reconstruction for MRI using Deep CNN Priors Trained without Groundtruth
Authors:
Weijie Gan,
Cihat Eldeniz,
Jiaming Liu,
Sihao Chen,
Hongyu An,
Ulugbek S. Kamilov
Abstract:
We propose a new plug-and-play priors (PnP) based MR image reconstruction method that systematically enforces data consistency while also exploiting deep-learning priors. Our prior is specified through a convolutional neural network (CNN) trained without any artifact-free ground truth to remove undersampling artifacts from MR images. The results on reconstructing free-breathing MRI data into ten respiratory phases show that the method can form high-quality 4D images from severely undersampled measurements corresponding to acquisitions of about 1 and 2 minutes in length. The results also highlight the competitive performance of the method compared to several popular alternatives, including the TGV regularization and traditional UNet3D.
Submitted 10 April, 2022;
originally announced April 2022.