Search | arXiv e-print repository

Soundscape Captioning using Sound Affective Quality Network and Large Language Model

Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship between sounds and the emotions they evoke within a context. To fill this gap and to automate soundscape analysis, which traditionally relies on labour-intensive subjective ratings and surveys, we propose the soundscape captioning (SoundSCap) task. SoundSCap generates context-aware soundscape descriptions by capturing the acoustic scene, event information, and the corresponding human affective qualities. To this end, we propose an automatic soundscape captioner (SoundSCaper) composed of an acoustic model, SoundAQnet, and a general large language model (LLM). SoundAQnet simultaneously models multi-scale information about acoustic scenes, events, and perceived affective qualities, while LLM generates soundscape captions by parsing the information captured by SoundAQnet to a common language. The soundscape caption's quality is assessed by a jury of 16 audio/soundscape experts. The average score (out of 5) of SoundSCaper-generated captions is lower than the score of captions generated by two soundscape experts by 0.21 and 0.25, respectively, on the evaluation set and the model-unknown mixed external dataset with varying lengths and acoustic properties, but the differences are not statistically significant. Overall, SoundSCaper-generated captions show promising performance compared to captions annotated by soundscape experts. The models' code, LLM scripts, human assessment data and instructions, and expert evaluation statistics are all publicly available.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

arXiv:2311.16927 [pdf, other]

Study of speaker localization under dynamic and reverberant environments

Authors: Daniel A. Mitchell, Boaz Rafaely

Abstract: Speaker localization in a reverberant environment is a fundamental problem in audio signal processing. Many solutions have been developed to tackle this problem. However, previous algorithms typically assume a stationary environment in which both the microphone array and the sound sources are not moving. With the emergence of wearable microphone arrays, acoustic scenes have become dynamic with moving sources and arrays. This calls for algorithms that perform well in dynamic environments. In this article, we study the performance of a speaker localization algorithm in such an environment. The study is based on the recently published EasyCom speech dataset recorded in reverberant and noisy environments using a wearable array on glasses. Although the localization algorithm performs well in static environments, its performance degraded substantially when used on the EasyCom dataset. The paper presents performance analysis and proposes methods for improvement.

Submitted 28 November, 2023; originally announced November 2023.

Journal ref: in Proceedings of the 24th International Congress on Acoustics (ICA 2022), no. ABS-0359, Oct 2022

arXiv:2311.09030 [pdf]

doi 10.1121/10.0022408

AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

Authors: Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang, Andrew Mitchell, Francesco Aletta, Jian Kang, Dick Botteldooren

Abstract: Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) to analyze automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.

Submitted 15 November, 2023; originally announced November 2023.

Comments: The Journal of the Acoustical Society of America, 154 (5), 3145

Journal ref: The Journal of the Acoustical Society of America, 154, 3145 (2023)

arXiv:2309.07155 [pdf]

doi 10.1109/JLT.2023.3314526

Maximizing the performance for microcomb based microwave photonic transversal signal processors

Authors: Yang Sun, Jiayang Wu, Yang Li, Xingyuan Xu, Guanghui Ren, Mengxi Tan, Sai Tak Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: Microwave photonic (MWP) transversal signal processors offer a compelling solution for realizing versatile high-speed information processing by combining the advantages of reconfigurable electrical digital signal processing and high-bandwidth photonic processing. With the capability of generating a number of discrete wavelengths from micro-scale resonators, optical microcombs are powerful multi-wavelength sources for implementing MWP transversal signal processors with significantly reduced size, power consumption, and complexity. By using microcomb-based MWP transversal signal processors, a diverse range of signal processing functions have been demonstrated recently. In this paper, we provide a detailed analysis for the processing inaccuracy that is induced by the imperfect response of experimental components. First, we investigate the errors arising from different sources including imperfections in the microcombs, the chirp of electro-optic modulators, chromatic dispersion of the dispersive module, shaping errors of the optical spectral shapers, and noise of the photodetector. Next, we provide a global picture quantifying the impact of different error sources on the overall system performance. Finally, we introduce feedback control to compensate the errors caused by experimental imperfections and achieve significantly improved accuracy. These results provide a guide for optimizing the accuracy of microcomb-based MWP transversal signal processors.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 15 pages, 12 figures, 60 references

Journal ref: Journal of Lightwave Technology Volume 41, (2023)

arXiv:2308.11980 [pdf, other]

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.

Submitted 23 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

arXiv:2207.14424 [pdf]

Phase retrieval of programmable photonic integrated circuits based on an on-chip fractional-delay reference path

Authors: Xingyuan Xu, Guanghui Ren, Aditya Dubey, Tim Feleppa, Xumeng Liu, Andreas Boes, Arnan Mitchell, Arthur J. Lowery

Abstract: Programmable photonic integrated circuits (PICs), offering diverse signal processing functions within a single chip, are promising solutions for applications ranging from optical communications to artificial intelligence. While the scale and complexity of programmable PICs are increasing, their characterization, and thus calibration, becomes increasingly challenging. Here we demonstrate a phase retrieval method for programmable PICs using an on-chip fractional-delay reference path. The impulse response of the chip can be uniquely and precisely identified from only the insertion loss using a standard complex Fourier transform. We demonstrate our approach experimentally with a 4-tap finite-impulse-response chip. The results match well with expectations and verify our approach as effective for individually determining the taps' weights without the need for additional ports and photodiodes.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2207.06245

Hitless memory-reconfigurable photonic reservoir computing architecture

Authors: Mohab Abdalla, Clément Zrounba, Raphael Cardoso, Paul Jimenez, Guanghui Ren, Andreas Boes, Arnan Mitchell, Alberto Bosio, Ian O'Connor, Fabio Pavanello

Abstract: Reservoir computing is an analog bio-inspired computation model for efficiently processing time-dependent signals, the photonic implementations of which promise a combination of massive parallel information processing, low power consumption, and high speed operation. However, most implementations, especially for the case of time-delay reservoir computing (TDRC), require signal attenuation in the reservoir to achieve the desired system dynamics for a specific task, often resulting in large amounts of power being coupled outside of the system. We propose a novel TDRC architecture based on an asymmetric Mach-Zehnder interferometer (MZI) integrated in a resonant cavity which allows the memory capacity of the system to be tuned without the need for an optical attenuator block. Furthermore, this can be leveraged to find the optimal value for the specific components of the total memory capacity metric. We demonstrate this approach on the temporal bitwise XOR task and conclude that this way of memory capacity reconfiguration allows optimal performance to be achieved for memory-specific tasks.

Submitted 17 May, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: The paper has been withdrawn by the authors due to their belief that the arguments and results presented in the paper are not mature enough, and includes a slight error

arXiv:2112.04586 [pdf, other]

Monolithic Integration of Quantum Resonant Tunneling Gate on a 22nm FD-SOI CMOS Process

Authors: Imran Bashir, Dirk Leipold, Elena Blokhina, Mike Asker, David Redmond, Ali Esmailiyan, Panagiotis Giounanlis, Hans Haenlein, Xuton Wu, Andrii Sokolov, Dennis Andrade-Miceli, Andrew K. Mitchell, Robert Bogdan Staszewski

Abstract: The proliferation of quantum computing technologies has fueled the race to build a practical quantum computer. The spectrum of the innovation is wide and encompasses many aspects of this technology, such as the qubit, control and detection mechanism, cryogenic electronics, and system integration. A few of those emerging technologies are poised for successful monolithic integration of cryogenic electronics with the quantum structure where the qubits reside. In this work, we present a fully integrated Quantum Processor Unit in which the quantum core is co-located with control and detection circuits on the same die in a commercial 22-nm FD-SOI process from GlobalFoundries. The system described in this work comprises a two dimensional (2D) 240 qubits array integrated with 8 detectors and 32 injectors operating at 3K and inside a two-stage Gifford-McMahon cryo-cooler. The power consumption of each detector and injector is 1mW and 0.27mW, respectively. The control sequence is programmed into an on-chip pattern generator that acts as a command and control block for all hardware in the Quantum Processor Unit. Using the aforementioned apparatus, we performed a quantum resonant tunneling experiment on two qubits inside the 2D qubit array. With supporting lab measurements, we demonstrate the feasibility of the proposed architecture in scaling-up the existing quantum core to thousands of qubits.

Submitted 4 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2005.02869 [pdf]

doi 10.1109/JLT.2020.2997699

Photonic RF channelizer based on a 90 wavelength optical soliton crystal 49GHz Kerr microcomb

Authors: Xingyuan Xu, Mengxi Tan, Jiayang Wu, Andreas Boes, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We report a broadband radio frequency (RF) channelizer with up to 92 channels using a coherent microcomb source. A soliton crystal microcomb, generated by a 49 GHz micro-ring resonator (MRR), is used as a multi-wavelength source. Due to its ultra-low comb spacing, up to 92 wavelengths are available in the C band, yielding a broad operation bandwidth. Another high-Q MRR is employed as a passive optical periodic filter to slice the RF spectrum with a high resolution of 121.4 MHz. We experimentally achieve an instantaneous RF operation bandwidth of 8.08 GHz and verify RF channelization up to 17.55 GHz via thermal tuning. Our approach is a significant step towards the monolithically integrated photonic RF receivers with reduced complexity, size, and unprecedented performance, which is important for wide RF applications ranging from broadband analog signal processing to digital-compatible signal detection.

Submitted 20 April, 2020; originally announced May 2020.

Comments: 7 pages, 4 figures, 59 references

Journal ref: Journal of Lightwave Technology Early Access vol. 38 (2020)

arXiv:2002.09676 [pdf, other]

Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion

Authors: Siddhant Gangapurwala, Alexander Mitchell, Ioannis Havoutis

Abstract: Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.

Submitted 22 February, 2020; originally announced February 2020.

Comments: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics and Automation Letters (RA-L), January 2020 with presentation at International Conference on Robotics and Automation (ICRA) 2020

arXiv:1910.06282 [pdf]

doi 10.1109/JLT.2019.2946606

Microwave photonic fractional Hilbert transformer with an integrated optical soliton crystal micro-comb

Authors: Mengxi Tan, Xingyuan Xu, Bill Corcoran, Jiayang Wu, Andreas Boes, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We report a photonic microwave and RF fractional Hilbert transformer based on an integrated Kerr micro-comb source. The micro-comb source has a free spectral range (FSR) of 50GHz, generating a large number of comb lines that serve as a high-performance multi-wavelength source for the transformer. By programming and shaping the comb lines according to calculated tap weights, we achieve both arbitrary fractional orders and a broad operation bandwidth. We experimentally characterize the RF amplitude and phase response for different fractional orders and perform system demonstrations of real-time fractional Hilbert transforms. We achieve a phase ripple of < 0.15 rad within the 3-dB pass-band, with bandwidths ranging from 5 to 9 octaves, depending on the order. The experimental results show good agreement with theory, confirming the effectiveness of our approach as a new way to implement high-performance fractional Hilbert transformers with broad processing bandwidth, high reconfigurability, and greatly reduced size and complexity.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 12 pages, 7 figures, 61 references

Journal ref: IEEE Journal of Lightwave Technology, Volume 37, (2019)

arXiv:1909.03353 [pdf]

doi 10.1109/LPT.2019.2940497

Microwave and RF signal processing based on integrated soliton crystal optical microcombs

Authors: Xingyuan Xu, Mengxi Tan, Jiayang Wu, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: Microcombs are powerful tools as sources of multiple wavelength channels for photonic RF signal processing. They offer a compact device footprint, large numbers of wavelengths, and wide Nyquist bands. Here, we review recent progress on microcomb-based photonic RF signal processors, including true time delays, reconfigurable filters, Hilbert transformers, differentiators, and channelizers. The strong potential of optical micro-combs for RF photonics applications in terms of functions and integrability is also discussed.

Submitted 7 September, 2019; originally announced September 2019.

Comments: 7 pages, 7 figures, 39 references

Journal ref: IEEE Photonics Technology Letters Volume 31 (2019)

arXiv:1808.08828 [pdf]

Photonic single sideband RF generator based on an integrated optical micro-ring resonator

Authors: Xingyuan Xu, Jiayang Wu, Mengxi Tan, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We demonstrate narrowband orthogonally polarized optical RF single sideband generation as well as dual-channel equalization based on an integrated dual-polarization-mode high-Q microring resonator. The device operates in the optical communications band and enables narrowband RF operation at either 16.6 GHz or 32.2 GHz, determined by the free spectral range and TE/TM mode interval in the resonator. We achieve a very large dynamic tuning range of over 55 dB for both the optical carrier-to-sideband ratio and the dual-channel RF equalization.

Submitted 7 August, 2018; originally announced August 2018.

Comments: 10 pages, 13 Figures, 53 references

Journal ref: IEEE Journal of Lightwave Technology (JLT) Volume 36 (2018)

arXiv:1804.10040 [pdf]

doi 10.1109/PIERS-FALL.2017.8293510

High-order Radio Frequency Differentiation via Photonic Signal Processing with an Integrated Micro-resonator Kerr Optical Frequency Comb Source

Authors: Xingyuan Xu, Jiayang Wu, Mehrdad Shoeiby, Sai T. Chu, Brent E. Little, Roberto Morandotti, Arnan Mitchell, David J. Moss

Abstract: We demonstrate the use of integrated micro-resonator based optical frequency comb sources as the basis for transversal filtering functions for microwave and radio frequency photonic filtering and advanced functions.

Submitted 7 April, 2018; originally announced April 2018.

Comments: 8 pages, 7 figures, 46 References. arXiv admin note: substantial text overlap with arXiv:1512.01741, arXiv:1512.06301

Journal ref: 2017 Progress In Electromagnetics Research Symposium Fall (PIERS, FALL), Singapore, 19 to 22 November, Pages 2232 to 2236

Showing 1–14 of 14 results for author: Mitchell, A