-
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
Authors:
Yuanbo Hou,
Qiaoqiao Ren,
Andrew Mitchell,
Wenwu Wang,
Jian Kang,
Tony Belpaeme,
Dick Botteldooren
Abstract:
We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship between sounds and the emotions they evoke within a context. To fill this gap and to automate soundscape analysis, which traditionally relies on labour-intensive subjective ratings and surveys, we propose the soundscape captioning (SoundSCap) task. SoundSCap generates context-aware soundscape descriptions by capturing the acoustic scene, event information, and the corresponding human affective qualities. To this end, we propose an automatic soundscape captioner (SoundSCaper) composed of an acoustic model, SoundAQnet, and a general large language model (LLM). SoundAQnet simultaneously models multi-scale information about acoustic scenes, events, and perceived affective qualities, while the LLM generates soundscape captions by parsing the information captured by SoundAQnet to a common language. Caption quality is assessed by a jury of 16 audio/soundscape experts. The average score (out of 5) of SoundSCaper-generated captions is lower than the score of captions generated by two soundscape experts by 0.21 and 0.25, respectively, on the evaluation set and the model-unknown mixed external dataset with varying lengths and acoustic properties, but the differences are not statistically significant. Overall, SoundSCaper-generated captions show promising performance compared to captions annotated by soundscape experts. The models' code, LLM scripts, human assessment data and instructions, and expert evaluation statistics are all publicly available.
Submitted 9 June, 2024;
originally announced June 2024.
-
Study of speaker localization under dynamic and reverberant environments
Authors:
Daniel A. Mitchell,
Boaz Rafaely
Abstract:
Speaker localization in a reverberant environment is a fundamental problem in audio signal processing. Many solutions have been developed to tackle this problem. However, previous algorithms typically assume a stationary environment in which both the microphone array and the sound sources are not moving. With the emergence of wearable microphone arrays, acoustic scenes have become dynamic with moving sources and arrays. This calls for algorithms that perform well in dynamic environments. In this article, we study the performance of a speaker localization algorithm in such an environment. The study is based on the recently published EasyCom speech dataset recorded in reverberant and noisy environments using a wearable array on glasses. Although the localization algorithm performs well in static environments, its performance degrades substantially when used on the EasyCom dataset. The paper presents performance analysis and proposes methods for improvement.
Submitted 28 November, 2023;
originally announced November 2023.
-
AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance
Authors:
Yuanbo Hou,
Qiaoqiao Ren,
Huizhong Zhang,
Andrew Mitchell,
Francesco Aletta,
Jian Kang,
Dick Botteldooren
Abstract:
Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) for automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is used to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them; (2) the proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks; (3) correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model; and (4) generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.
Submitted 15 November, 2023;
originally announced November 2023.
-
Maximizing the performance for microcomb based microwave photonic transversal signal processors
Authors:
Yang Sun,
Jiayang Wu,
Yang Li,
Xingyuan Xu,
Guanghui Ren,
Mengxi Tan,
Sai Tak Chu,
Brent E. Little,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
Microwave photonic (MWP) transversal signal processors offer a compelling solution for realizing versatile high-speed information processing by combining the advantages of reconfigurable electrical digital signal processing and high-bandwidth photonic processing. With the capability of generating a number of discrete wavelengths from micro-scale resonators, optical microcombs are powerful multi-wavelength sources for implementing MWP transversal signal processors with significantly reduced size, power consumption, and complexity. By using microcomb-based MWP transversal signal processors, a diverse range of signal processing functions have been demonstrated recently. In this paper, we provide a detailed analysis of the processing inaccuracy that is induced by the imperfect response of experimental components. First, we investigate the errors arising from different sources including imperfections in the microcombs, the chirp of electro-optic modulators, chromatic dispersion of the dispersive module, shaping errors of the optical spectral shapers, and noise of the photodetector. Next, we provide a global picture quantifying the impact of different error sources on the overall system performance. Finally, we introduce feedback control to compensate for the errors caused by experimental imperfections and achieve significantly improved accuracy. These results provide a guide for optimizing the accuracy of microcomb-based MWP transversal signal processors.
Submitted 10 September, 2023;
originally announced September 2023.
-
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning
Authors:
Yuanbo Hou,
Siyang Song,
Cheng Luo,
Andrew Mitchell,
Qiaoqiao Ren,
Weicheng Xie,
Jian Kang,
Wenwu Wang,
Dick Botteldooren
Abstract:
Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches focus only on classifying and detecting audio events and scenes, and may ignore their perceptual quality, which can affect how listeners feel about the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for the audio event classification (AEC) and annoyance rating prediction (ARP) tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.
Submitted 23 August, 2023;
originally announced August 2023.
-
Phase retrieval of programmable photonic integrated circuits based on an on-chip fractional-delay reference path
Authors:
Xingyuan Xu,
Guanghui Ren,
Aditya Dubey,
Tim Feleppa,
Xumeng Liu,
Andreas Boes,
Arnan Mitchell,
Arthur J. Lowery
Abstract:
Programmable photonic integrated circuits (PICs), offering diverse signal processing functions within a single chip, are promising solutions for applications ranging from optical communications to artificial intelligence. As the scale and complexity of programmable PICs increase, their characterization, and thus calibration, becomes increasingly challenging. Here we demonstrate a phase retrieval method for programmable PICs using an on-chip fractional-delay reference path. The impulse response of the chip can be uniquely and precisely identified from only the insertion loss using a standard complex Fourier transform. We demonstrate our approach experimentally with a 4-tap finite-impulse-response chip. The results match well with expectations and verify our approach as effective for individually determining the taps' weights without the need for additional ports and photodiodes.
Submitted 28 July, 2022;
originally announced July 2022.
-
Hitless memory-reconfigurable photonic reservoir computing architecture
Authors:
Mohab Abdalla,
Clément Zrounba,
Raphael Cardoso,
Paul Jimenez,
Guanghui Ren,
Andreas Boes,
Arnan Mitchell,
Alberto Bosio,
Ian O'Connor,
Fabio Pavanello
Abstract:
Reservoir computing is an analog bio-inspired computation model for efficiently processing time-dependent signals, the photonic implementations of which promise a combination of massively parallel information processing, low power consumption, and high-speed operation. However, most implementations, especially for the case of time-delay reservoir computing (TDRC), require signal attenuation in the reservoir to achieve the desired system dynamics for a specific task, often resulting in large amounts of power being coupled outside of the system. We propose a novel TDRC architecture based on an asymmetric Mach-Zehnder interferometer (MZI) integrated in a resonant cavity which allows the memory capacity of the system to be tuned without the need for an optical attenuator block. Furthermore, this can be leveraged to find the optimal value for the specific components of the total memory capacity metric. We demonstrate this approach on the temporal bitwise XOR task and conclude that this way of memory capacity reconfiguration allows optimal performance to be achieved for memory-specific tasks.
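For readers unfamiliar with the benchmark named in this abstract, the following is a minimal sketch of how targets for a temporal bitwise XOR task are commonly built: each target is the XOR of the current input bit and the bit `delay` steps earlier. The function name `temporal_xor_targets` and the `delay` parameter are illustrative assumptions; the abstract does not state which delay or input encoding the authors use.

```python
import numpy as np


def temporal_xor_targets(bits, delay=1):
    """Return y[n] = bits[n] XOR bits[n - delay] for n >= delay.

    `delay` is a hypothetical parameter chosen for illustration;
    the listing above does not specify the value used in the paper.
    """
    bits = np.asarray(bits, dtype=int)
    if delay < 1 or delay >= bits.size:
        raise ValueError("delay must satisfy 1 <= delay < len(bits)")
    # Pair each bit with the bit `delay` steps before it.
    return np.bitwise_xor(bits[delay:], bits[:-delay])


example_bits = [0, 1, 1, 0, 1]
example_targets = temporal_xor_targets(example_bits, delay=1)
```

A reservoir solving this task has to remember the earlier bit and combine it nonlinearly with the current one, which is why the task is sensitive to the memory capacity discussed in the abstract.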
Submitted 17 May, 2023; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Monolithic Integration of Quantum Resonant Tunneling Gate on a 22nm FD-SOI CMOS Process
Authors:
Imran Bashir,
Dirk Leipold,
Elena Blokhina,
Mike Asker,
David Redmond,
Ali Esmailiyan,
Panagiotis Giounanlis,
Hans Haenlein,
Xuton Wu,
Andrii Sokolov,
Dennis Andrade-Miceli,
Andrew K. Mitchell,
Robert Bogdan Staszewski
Abstract:
The proliferation of quantum computing technologies has fueled the race to build a practical quantum computer. The spectrum of the innovation is wide and encompasses many aspects of this technology, such as the qubit, control and detection mechanism, cryogenic electronics, and system integration. A few of those emerging technologies are poised for successful monolithic integration of cryogenic electronics with the quantum structure where the qubits reside. In this work, we present a fully integrated Quantum Processor Unit in which the quantum core is co-located with control and detection circuits on the same die in a commercial 22-nm FD-SOI process from GlobalFoundries. The system described in this work comprises a two-dimensional (2D) 240-qubit array integrated with 8 detectors and 32 injectors operating at 3 K inside a two-stage Gifford-McMahon cryo-cooler. The power consumption of each detector and injector is 1 mW and 0.27 mW, respectively. The control sequence is programmed into an on-chip pattern generator that acts as a command and control block for all hardware in the Quantum Processor Unit. Using the aforementioned apparatus, we performed a quantum resonant tunneling experiment on two qubits inside the 2D qubit array. With supporting lab measurements, we demonstrate the feasibility of the proposed architecture in scaling up the existing quantum core to thousands of qubits.
Submitted 4 February, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Photonic RF channelizer based on a 90 wavelength optical soliton crystal 49GHz Kerr microcomb
Authors:
Xingyuan Xu,
Mengxi Tan,
Jiayang Wu,
Andreas Boes,
Thach G. Nguyen,
Sai T. Chu,
Brent E. Little,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
We report a broadband radio frequency (RF) channelizer with up to 92 channels using a coherent microcomb source. A soliton crystal microcomb, generated by a 49 GHz micro-ring resonator (MRR), is used as a multi-wavelength source. Due to its ultra-low comb spacing, up to 92 wavelengths are available in the C band, yielding a broad operation bandwidth. Another high-Q MRR is employed as a passive optical periodic filter to slice the RF spectrum with a high resolution of 121.4 MHz. We experimentally achieve an instantaneous RF operation bandwidth of 8.08 GHz and verify RF channelization up to 17.55 GHz via thermal tuning. Our approach is a significant step towards monolithically integrated photonic RF receivers with reduced complexity and size and unprecedented performance, which is important for a wide range of RF applications, from broadband analog signal processing to digital-compatible signal detection.
Submitted 20 April, 2020;
originally announced May 2020.
-
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Authors:
Siddhant Gangapurwala,
Alexander Mitchell,
Ioannis Havoutis
Abstract:
Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
Submitted 22 February, 2020;
originally announced February 2020.
-
Microwave photonic fractional Hilbert transformer with an integrated optical soliton crystal micro-comb
Authors:
Mengxi Tan,
Xingyuan Xu,
Bill Corcoran,
Jiayang Wu,
Andreas Boes,
Thach G. Nguyen,
Sai T. Chu,
Brent E. Little,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
We report a photonic microwave and RF fractional Hilbert transformer based on an integrated Kerr micro-comb source. The micro-comb source has a free spectral range (FSR) of 50 GHz, generating a large number of comb lines that serve as a high-performance multi-wavelength source for the transformer. By programming and shaping the comb lines according to calculated tap weights, we achieve both arbitrary fractional orders and a broad operation bandwidth. We experimentally characterize the RF amplitude and phase response for different fractional orders and perform system demonstrations of real-time fractional Hilbert transforms. We achieve a phase ripple of < 0.15 rad within the 3-dB pass-band, with bandwidths ranging from 5 to 9 octaves, depending on the order. The experimental results show good agreement with theory, confirming the effectiveness of our approach as a new way to implement high-performance fractional Hilbert transformers with broad processing bandwidth, high reconfigurability, and greatly reduced size and complexity.
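As a rough illustration of what "calculated tap weights" can look like for a fractional Hilbert transformer, the sketch below uses the textbook discrete-time form: a fractional order P gives a phase shift phi = P * pi / 2, and the ideal impulse response is cos(phi) at the centre tap plus sin(phi) times the ideal Hilbert response 2 / (pi * n) at odd offsets n. This is an assumption for illustration only, not necessarily the tap design used by the authors; the function name `fractional_hilbert_taps` and its parameters are hypothetical, and no windowing is applied.

```python
import numpy as np


def fractional_hilbert_taps(num_taps, order):
    """Truncated ideal fractional Hilbert transformer taps (textbook form).

    num_taps: odd number of taps, centred on offset n = 0 (hypothetical).
    order:    fractional order P; P = 1 is the ordinary Hilbert transformer.
    Returns (offsets, taps).
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the taps are centred")
    phi = order * np.pi / 2.0
    offsets = np.arange(num_taps) - (num_taps - 1) // 2
    taps = np.zeros(num_taps)
    # Ideal Hilbert response is non-zero only at odd offsets.
    odd = (offsets % 2) != 0
    taps[odd] = np.sin(phi) * 2.0 / (np.pi * offsets[odd])
    # The centre tap carries the in-phase (cos) component.
    taps[offsets == 0] = np.cos(phi)
    return offsets, taps


offsets, hilbert_taps = fractional_hilbert_taps(7, order=1.0)
_, identity_taps = fractional_hilbert_taps(7, order=0.0)
```

With order 1 the centre tap vanishes and the taps are antisymmetric, which requires negative weights; with order 0 the filter reduces to a single unit tap that passes the signal unchanged.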
Submitted 8 October, 2019;
originally announced October 2019.
-
Microwave and RF signal processing based on integrated soliton crystal optical microcombs
Authors:
Xingyuan Xu,
Mengxi Tan,
Jiayang Wu,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
Microcombs are powerful tools as sources of multiple wavelength channels for photonic RF signal processing. They offer a compact device footprint, large numbers of wavelengths, and wide Nyquist bands. Here, we review recent progress on microcomb-based photonic RF signal processors, including true time delays, reconfigurable filters, Hilbert transformers, differentiators, and channelizers. The strong potential of optical micro-combs for RF photonics applications in terms of functions and integrability is also discussed.
Submitted 7 September, 2019;
originally announced September 2019.
-
Photonic single sideband RF generator based on an integrated optical micro-ring resonator
Authors:
Xingyuan Xu,
Jiayang Wu,
Mengxi Tan,
Thach G. Nguyen,
Sai T. Chu,
Brent E. Little,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
We demonstrate narrowband orthogonally polarized optical RF single sideband generation as well as dual-channel equalization based on an integrated dual-polarization-mode high-Q microring resonator. The device operates in the optical communications band and enables narrowband RF operation at either 16.6 GHz or 32.2 GHz, determined by the free spectral range and TE/TM mode interval in the resonator. We achieve a very large dynamic tuning range of over 55 dB for both the optical carrier-to-sideband ratio and the dual-channel RF equalization.
Submitted 7 August, 2018;
originally announced August 2018.
-
High-order Radio Frequency Differentiation via Photonic Signal Processing with an Integrated Micro-resonator Kerr Optical Frequency Comb Source
Authors:
Xingyuan Xu,
Jiayang Wu,
Mehrdad Shoeiby,
Sai T. Chu,
Brent E. Little,
Roberto Morandotti,
Arnan Mitchell,
David J. Moss
Abstract:
We demonstrate the use of integrated micro-resonator based optical frequency comb sources as the basis for transversal filtering functions for microwave and radio frequency photonic filtering and advanced functions.
Submitted 7 April, 2018;
originally announced April 2018.