Showing 1–2 of 2 results for author: Swedlow, N

Search v0.5.6 released 2020-02-24

arXiv:2302.08202 [pdf]

eess.AS cs.SD

DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement

Authors: Aaron Master, Lie Lu, Jonas Samuelsson, Heidi-Maria Lehtonen, Scott Norcross, Nathan Swedlow, Audrey Howard

Abstract: Dialog Enhancement (DE) is a feature which allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided," and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace demonstrates significantly improved overall performance relative to state-of-the-art systems available for testing. We explore the feasibility of using existing automated metrics to evaluate unguided DE systems.

Submitted 22 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 5 pages, 4 figures. To be published in ICASSP 2023

arXiv:2211.14378 [pdf]

eess.AS cs.SD eess.SP

Stereo Speech Enhancement Using Custom Mid-Side Signals and Monaural Processing

Authors: Aaron Master, Lie Lu, Nathan Swedlow

Abstract: Speech Enhancement (SE) systems typically operate on monaural input and are used for applications including voice communications and capture cleanup for user generated content. Recent advancements and changes in the devices used for these applications are likely to lead to an increase in the amount of two-channel content for the same applications. However, SE systems are typically designed for monaural input; stereo results produced using trivial methods such as channel independent or mid-side processing may be unsatisfactory, including substantial speech distortions. To address this, we propose a system which creates a novel representation of stereo signals called Custom Mid-Side Signals (CMSS). CMSS allow benefits of mid-side signals for center-panned speech to be extended to a much larger class of input signals. This in turn allows any existing monaural SE system to operate as an efficient stereo system by processing the custom mid signal. We describe how the parameters needed for CMSS can be efficiently estimated by a component of the spatio-level filtering source separation system. Subjective listening using state-of-the-art deep learning-based SE systems on stereo content with various speech mixing styles shows that CMSS processing leads to improved speech quality at approximately half the cost of channel-independent processing.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 12 pages, 5 figures. Submitted to the Journal of the Audio Engineering Society
