Showing 1–23 of 23 results for author: Watcharasupat, K N

  1. arXiv:2406.18747  [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems.…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted to the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

  2. arXiv:2309.02539  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

    Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions whic…

    Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

  3. arXiv:2308.07767  [pdf, other]

    eess.AS cs.SD

    Preliminary investigation of the short-term in situ performance of an automatic masker selection system

    Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Woon-Seng Gan

    Abstract: Soundscape augmentation or "masking" introduces wanted sounds into the acoustic environment to improve acoustic comfort. Usually, the masker selection and playback strategies are either arbitrary or based on simple rules (e.g. -3 dBA), which may lead to sub-optimal increment or even reduction in acoustic comfort for dynamic acoustic environments. To reduce ambiguity in the selection of maskers, an…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: paper submitted to the 52nd International Congress and Exposition on Noise Control Engineering held in Chiba, Greater Tokyo, Japan, on 20-23 August 2023 (Inter-Noise 2023)

    ACM Class: J.2; J.4

  4. arXiv:2306.08053  [pdf, other]

    eess.AS cs.SD

    Quantifying Spatial Audio Quality Impairment

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical components of spatial audio quality, however, remain scarce, despite being perhaps the least subjective aspect of spatial audio quality to quantify. By considering i…

    Submitted 14 December, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

  5. Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs

    Authors: Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan

    Abstract: Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural n…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures. Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1-5

  6. arXiv:2207.12899  [pdf, other]

    eess.AS cs.SD

    Assessment of a cost-effective headphone calibration procedure for soundscape evaluations

    Authors: Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Trevor Wong, Woon-Seng Gan

    Abstract: To increase the availability and adoption of the soundscape standard, a low-cost calibration procedure for reproduction of audio stimuli over headphones was proposed as part of the global "Soundscape Attributes Translation Project" (SATP) for validating ISO/TS 12913-2:2018 perceived affective quality (PAQ) attribute translations. A previous preliminary study revealed significant deviations from…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: For 24th International Congress on Acoustics

    Journal ref: in Proc. 24th Int. Congr. Acoust., 2022, pp. 1-8

  7. ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes

    Authors: Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Joo Young Hong, Woon-Seng Gan

    Abstract: Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which…

    Submitted 5 March, 2023; v1 submitted 3 July, 2022; originally announced July 2022.

    Comments: [v1, v2] 25 pages, 11 figures. [v3] 33 pages, 18 figures. v3 updated with changes made after peer review. in IEEE Transactions on Affective Computing, 2023

    Journal ref: IEEE Trans. Affect. Comput., pp. 1-17, 2023

  8. arXiv:2206.07293  [pdf, other]

    cs.SD eess.AS

    FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement

    Authors: Shengkui Zhao, Bin Ma, Karn N. Watcharasupat, Woon-Seng Gan

    Abstract: Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED) structure and a recurrent structure have achieved promising performance for monaural speech enhancement. However, feature representation across frequency context is highly constrained due to limited receptive fields in the convolutions of CED. In this paper, we propose a convolutional recurrent encoder-decoder…

    Submitted 24 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables

  9. arXiv:2206.03112  [pdf]

    cs.LG cs.SD eess.AS

    Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering

    Authors: Kenneth Ooi, Bhan Lam, Joo Young Hong, Karn N. Watcharasupat, Zhen-Ting Ong, Woon-Seng Gan

    Abstract: The ecological validity of soundscape studies usually rests on a choice of soundscapes that are representative of the perceptual space under investigation. For example, a soundscape pleasantness study might investigate locations with soundscapes ranging from "pleasant" to "annoying". The choice of soundscapes is typically researcher-led, but a participant-led process can reduce selection bias and…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: 23 pages, 8 figures. Submitted to Sustainability

    Journal ref: MDPI Sustainability. 2022; 14(12):7485

  10. Crossing the Linguistic Causeway: A Binational Approach for Translating Soundscape Attributes to Bahasa Melayu

    Authors: Bhan Lam, Julia Chieng, Karn N. Watcharasupat, Kenneth Ooi, Zhen-Ting Ong, Joo Young Hong, Woon-Seng Gan

    Abstract: Translation of perceptual descriptors such as the perceived affective quality attributes in the soundscape standard (ISO/TS 12913-2:2018) is an inherently intricate task, especially if the target language is used in multiple countries. Despite geographical proximity and a shared language of Bahasa Melayu (Standard Malay), differences in culture and language education policies between Singapore and…

    Submitted 5 July, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Published in Applied Acoustics in the Special Issue on Soundscape Attributes Translation: Current Projects and Challenges

    Journal ref: Appl. Acoust., vol. 199, p. 108976, Oct. 2022

  11. arXiv:2205.04728  [pdf, other]

    eess.AS cs.SD

    Preliminary assessment of a cost-effective headphone calibration procedure for soundscape evaluations

    Authors: Bhan Lam, Kenneth Ooi, Karn N. Watcharasupat, Zhen-Ting Ong, Yun-Ting Lau, Trevor Wong, Woon-Seng Gan

    Abstract: The introduction of ISO 12913-2:2018 has provided a framework for standardized data collection and reporting procedures for soundscape practitioners. A strong emphasis was placed on the use of calibrated head and torso simulators (HATS) for binaural audio capture to obtain an accurate subjective impression and acoustic measure of the soundscape under evaluation. To auralise the binaural recordings…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Submitted to the 28th International Congress on Sound and Vibration

  12. arXiv:2204.13890  [pdf, other]

    eess.AS cs.SD eess.SY

    Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation

    Authors: Trevor Wong, Karn N. Watcharasupat, Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Furi Andi Karnapi, Woon-Seng Gan

    Abstract: Soundscape augmentation is an emerging approach for noise mitigation by introducing additional sounds known as "maskers" to increase acoustic comfort. Traditionally, the choice of maskers is often predicated on expert guidance or post-hoc analysis which can be time-consuming and sometimes arbitrary. Moreover, this often results in a static set of maskers that are inflexible to the dynamic nature o…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: To be presented at the 51st International Congress and Exposition on Noise Control Engineering

    Journal ref: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Feb. 2022, vol. 265, no. 5, pp. 2013-2021

  13. arXiv:2204.13883  [pdf, other]

    eess.AS cs.LG cs.SD

    Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

    Authors: Karn N. Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, Woon-Seng Gan

    Abstract: The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time-consuming and l…

    Submitted 23 July, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: Accepted to IEEE Signal Processing Letters. (c) 2022 IEEE

    Journal ref: IEEE Signal Processing Letters, Vol. 29, pp. 1749 - 1753, 2022

  14. arXiv:2203.12245  [pdf, other]

    cs.SD eess.AS stat.AP stat.ME

    Quantitative Evaluation Approach for Translation of Perceptual Soundscape Attributes: Initial Application to the Thai Language

    Authors: Karn N. Watcharasupat, Sureenate Jaratjarungkiat, Bhan Lam, Sujinat Jitwiriyanont, Kanyanut Akaratham, Kenneth Ooi, Zhen-Ting Ong, Titima Suthiwan, Nitipong Pichetpan, Monthita Rojtinnakorn, Woon-Seng Gan

    Abstract: Translation of perceptual soundscape attributes from one language to another remains a challenging task that requires a high degree of fidelity in both psychoacoustic and psycholinguistic senses across the target population. Due to the inherently subjective nature of human perception, translating soundscape attributes using only small focus group discussion or expert panels could lead to translati…

    Submitted 6 June, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Under review for Applied Acoustics (Special Issue on Soundscape Attributes Translation: Current Projects and Challenges)

    Journal ref: Appl. Acoust., vol. 200, p. 108962, Nov. 2022

  15. arXiv:2112.10638  [pdf, ps, other]

    cs.LG cs.AI cs.IR cs.MS

    Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models

    Authors: Karn N. Watcharasupat, Junyoung Lee, Alexander Lerch

    Abstract: Latte (for LATent Tensor Evaluation) is a Python library for evaluation of latent-based generative models in the fields of disentanglement learning and controllable generation. Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs that can be easily extended to support other deep learning frameworks. Using NumPy-based and framework-agnostic imple…

    Submitted 22 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: To appear in Software Impacts

    Journal ref: Software Impacts, Volume 11, 2022, 100222, ISSN 2665-9638

  16. SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays

    Authors: Thi Ngoc Tho Nguyen, Douglas L. Jones, Karn N. Watcharasupat, Huy Phan, Woon-Seng Gan

    Abstract: Polyphonic sound event localization and detection (SELD) has many practical applications in acoustic sensing and monitoring. However, the development of real-time SELD has been limited by the demanding computational requirement of most recent SELD systems. In this work, we introduce SALSA-Lite, a fast and effective feature for polyphonic SELD using microphone array inputs. SALSA-Lite is a lightwei…

    Submitted 4 May, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: text overlap with arXiv:2110.00275

    Journal ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 716-720

  17. arXiv:2111.02006  [pdf, other]

    cs.SD eess.AS

    A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context

    Authors: Kenneth Ooi, Karn N. Watcharasupat, Santi Peksi, Furi Andi Karnapi, Zhen-Ting Ong, Danny Chua, Hui-Wen Leow, Li-Long Kwok, Xin-Lei Ng, Zhen-Ann Loh, Woon-Seng Gan

    Abstract: This paper introduces SINGA:PURA, a strongly labelled polyphonic urban sound dataset with spatiotemporal context. The data were collected via several recording units deployed across Singapore as a part of a wireless acoustic sensor network. These recordings were made as part of a project to identify and mitigate noise sources in Singapore, but also possess a wider applicability to sound event dete…

    Submitted 11 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: 7 pages, 8 figures, 3 tables. To be published in Proceedings of the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

    Journal ref: Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 982-988

  18. arXiv:2111.01320  [pdf, other]

    eess.AS cs.SD

    AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence

    Authors: Yun-Ning Hung, Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife, Kelian Li, Pavan Seshadri, Junyoung Lee

    Abstract: We propose a dataset, AVASpeech-SMAD, to assist speech and music activity detection research. With frame-level music labels, the proposed dataset extends the existing AVASpeech dataset, which originally consists of 45 hours of audio and speech activity labels. To the best of our knowledge, the proposed AVASpeech-SMAD is the first open-source dataset that features strong polyphonic labels for both…

    Submitted 1 November, 2021; originally announced November 2021.

  19. arXiv:2110.05587  [pdf, other]

    cs.SD cs.IR cs.IT cs.LG eess.AS

    Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Controllable music generation with deep generative models has become increasingly reliant on disentanglement learning techniques. However, current disentanglement metrics, such as mutual information gap (MIG), are often inadequate and misleading when used for evaluating latent representations in the presence of interdependent semantic attributes often encountered in real-world music datasets. In t…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Submitted to the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference

  20. arXiv:2110.00745  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

    Authors: Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma

    Abstract: Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this…

    Submitted 22 January, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

    Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

    Journal ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660

  21. SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

    Authors: Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan

    Abstract: Sound event localization and detection (SELD) consists of two subtasks, which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses amplitude and/or phase differences between microphones to estimate source directions. As a result, it is often di…

    Submitted 6 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: (c) 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022

  22. arXiv:2107.10471  [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning

    Authors: Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Zhen Jian Lee, Douglas L. Jones, Woon Seng Gan

    Abstract: The Sørensen-Dice Coefficient has recently seen rising popularity as a loss function (also known as Dice loss) due to its robustness in tasks where the number of negative samples significantly exceeds that of positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-e…

    Submitted 2 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Submitted to the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021

  23. arXiv:2107.10469  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis

    Authors: Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Zhen Jian Lee, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan

    Abstract: Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation. As a result, SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources. Furthermore, SELD often faces an additional challenge of assigning correct corresp…

    Submitted 2 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted for the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021

    Journal ref: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop, pp. 120-124