-
Termination in Concurrency, Revisited
Authors:
Joseph W. N. Paulus,
Jorge A. Pérez,
Daniele Nantes-Sobrinho
Abstract:
Termination is a central property in sequential programming models: a term is terminating if all its reduction sequences are finite. Termination is also important in concurrency in general, and for message-passing programs in particular. A variety of type systems that enforce termination by ty** have been developed. In this paper, we rigorously compare several type systems for $π$-calculus proce…
▽ More
Termination is a central property in sequential programming models: a term is terminating if all its reduction sequences are finite. Termination is also important in concurrency in general, and for message-passing programs in particular. A variety of type systems that enforce termination by ty** have been developed. In this paper, we rigorously compare several type systems for $π$-calculus processes from the unifying perspective of termination. Adopting session types as reference framework, we consider two different type systems: one follows Deng and Sangiorgi's weight-based approach; the other is Caires and Pfenning's Curry-Howard correspondence between linear logic and session types. Our technical results precisely connect these very different type systems, and shed light on the classes of client/server interactions they admit as correct.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction
Authors:
Jouni Paulus,
Matteo Torcoli
Abstract:
A geometrically-motivated method for primary-ambient decomposition is proposed and evaluated in an up-mixing application. The method consists of two steps, accommodating a particularly intuitive explanation. The first step consists of signal-adaptive rotations applied on the input stereo scene, which translate the primary sound sources into the center of the rotated scene. The second step applies…
▽ More
A geometrically-motivated method for primary-ambient decomposition is proposed and evaluated in an up-mixing application. The method consists of two steps, accommodating a particularly intuitive explanation. The first step consists of signal-adaptive rotations applied on the input stereo scene, which translate the primary sound sources into the center of the rotated scene. The second step applies a center-channel extraction method, based on a simple signal model and optimal in the mean-squared-error sense. The performance is evaluated by using the estimated ambient component to enable surround sound starting from real-world stereo signals. The participants in the reported listening test are asked to adjust the audio scene envelopment and find the audio settings that pleases them the most. The possibility for up-mixing enabled by the proposed method is used extensively, and the user satisfaction is significantly increased compared to the original stereo mix.
△ Less
Submitted 5 June, 2022;
originally announced June 2022.
-
Sampling Frequency Independent Dialogue Separation
Authors:
Jouni Paulus,
Matteo Torcoli
Abstract:
In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audi…
▽ More
In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audio at 48 kHz. The separated audio sources are compared with the ones produced by the same model architectures trained with 48 kHz versions of the same training data. A listening test and computational measures show that there is no significant perceptual difference between the models trained with 8 kHz or with 48 kHz. This transferability of the learned parameters allows for a faster and computationally less costly training. It also enables using training datasets available at a lower sampling frequency than the one needed by the application at hand, or using data collections with multiple sampling frequencies.
△ Less
Submitted 5 June, 2022;
originally announced June 2022.
-
Typed Non-determinism in Functional and Concurrent Calculi
Authors:
Bas van den Heuvel,
Joseph W. N. Paulus,
Daniele Nantes-Sobrinho,
Jorge A. Pérez
Abstract:
We study functional and concurrent calculi with non-determinism, along with type systems to control resources based on linearity. The interplay between non-determinism and linearity is delicate: careless handling of branches can discard resources meant to be used exactly once. Here we go beyond prior work by considering non-determinism in its standard sense: once a branch is selected, the rest are…
▽ More
We study functional and concurrent calculi with non-determinism, along with type systems to control resources based on linearity. The interplay between non-determinism and linearity is delicate: careless handling of branches can discard resources meant to be used exactly once. Here we go beyond prior work by considering non-determinism in its standard sense: once a branch is selected, the rest are discarded. Our technical contributions are three-fold. First, we introduce a $π$-calculus with non-deterministic choice, governed by session types. Second, we introduce a resource $λ$-calculus, governed by intersection types, in which non-determinism concerns fetching of resources from bags. Finally, we connect our two typed non-deterministic calculi via a correct translation.
△ Less
Submitted 29 September, 2023; v1 submitted 2 May, 2022;
originally announced May 2022.
-
CAIPI in Practice: Towards Explainable Interactive Medical Image Classification
Authors:
Emanuel Slany,
Yannik Ott,
Stephan Scheele,
Jan Paulus,
Ute Schmid
Abstract:
Would you trust physicians if they cannot explain their decisions to you? Medical diagnostics using machine learning gained enormously in importance within the last decade. However, without further enhancements many state-of-the-art machine learning methods are not suitable for medical application. The most important reasons are insufficient data set quality and the black-box behavior of machine l…
▽ More
Would you trust physicians if they cannot explain their decisions to you? Medical diagnostics using machine learning gained enormously in importance within the last decade. However, without further enhancements many state-of-the-art machine learning methods are not suitable for medical application. The most important reasons are insufficient data set quality and the black-box behavior of machine learning algorithms such as Deep Learning models. Consequently, end-users cannot correct the model's decisions and the corresponding explanations. The latter is crucial for the trustworthiness of machine learning in the medical domain. The research field explainable interactive machine learning searches for methods that address both shortcomings. This paper extends the explainable and interactive CAIPI algorithm and provides an interface to simplify human-in-the-loop approaches for image classification. The interface enables the end-user (1) to investigate and (2) to correct the model's prediction and explanation, and (3) to influence the data set quality. After CAIPI optimization with only a single counterexample per iteration, the model achieves an accuracy of $97.48\%$ on the Medical MNIST and $95.02\%$ on the Fashion MNIST. This accuracy is approximately equal to state-of-the-art Deep Learning optimization procedures. Besides, CAIPI reduces the labeling effort by approximately $80\%$.
△ Less
Submitted 31 May, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Dialog+ in Broadcasting: First Field Tests Using Deep-Learning-Based Dialogue Enhancement
Authors:
Matteo Torcoli,
Christian Simon,
Jouni Paulus,
Davide Straninger,
Alfred Riedel,
Volker Koch,
Stefan Wits,
Daniela Rieger,
Harald Fuchs,
Christian Uhle,
Stefan Meltzer,
Adrian Murtaza
Abstract:
Difficulties in following speech due to loud background sounds are common in broadcasting. Object-based audio, e.g., MPEG-H Audio solves this problem by providing a user-adjustable speech level. While object-based audio is gaining momentum, transitioning to it requires time and effort. Also, lots of content exists, produced and archived outside the object-based workflows. To address this, Fraunhof…
▽ More
Difficulties in following speech due to loud background sounds are common in broadcasting. Object-based audio, e.g., MPEG-H Audio solves this problem by providing a user-adjustable speech level. While object-based audio is gaining momentum, transitioning to it requires time and effort. Also, lots of content exists, produced and archived outside the object-based workflows. To address this, Fraunhofer IIS has developed a deep-learning solution called Dialog+, capable of enabling speech level personalization also for content with only the final audio tracks available. This paper reports on public field tests evaluating Dialog+, conducted together with Westdeutscher Rundfunk (WDR) and Bayerischer Rundfunk (BR), starting from September 2020. To our knowledge, these are the first large-scale tests of this kind. As part of one of these, a survey with more than 2,000 participants showed that 90% of the people above 60 years old have problems in understanding speech in TV "often" or "very often". Overall, 83% of the participants liked the possibility to switch to Dialog+, including those who do not normally struggle with speech intelligibility. Dialog+ introduces a clear benefit for the audience, filling the gap between object-based broadcasting and traditionally produced material.
△ Less
Submitted 17 December, 2021;
originally announced December 2021.
-
Types and Terms Translated: Unrestricted Resources in Encoding Functions as Processes (Extended Version)
Authors:
Joseph W. N. Paulus,
Daniele Nantes-Sobrinho,
Jorge A. Pérez
Abstract:
Type-preserving translations are effective rigorous tools in the study of core programming calculi. In this paper, we develop a new typed translation that connects sequential and concurrent calculi; it is governed by type systems that control resource consumption. Our main contribution is the source language, a new resource $λ$-calculus with non-determinism and failures, dubbed \ulamf. In \ulamf,…
▽ More
Type-preserving translations are effective rigorous tools in the study of core programming calculi. In this paper, we develop a new typed translation that connects sequential and concurrent calculi; it is governed by type systems that control resource consumption. Our main contribution is the source language, a new resource $λ$-calculus with non-determinism and failures, dubbed \ulamf. In \ulamf, resources are split into linear and unrestricted; failures are explicit and arise from this distinction. We define a type system based on intersection types to control resources and fail-prone computation. The target language is \spi, an existing session-typed $π$-calculus that results from a Curry-Howard correspondence between linear logic and session types. Our typed translation subsumes our prior work; interestingly, it treats unrestricted resources in \lamrfailunres as client-server session behaviours in \spi.
△ Less
Submitted 30 May, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning
Authors:
Christian Uhle,
Matteo Torcoli,
Jouni Paulus
Abstract:
Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, qua…
▽ More
Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target. Subjective evaluations indicate that consistent sound quality is obtained across various input signals. Our experiments show that the proposed method is able to control the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications.
△ Less
Submitted 22 July, 2021;
originally announced July 2021.
-
Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate
Authors:
Matteo Torcoli,
Jouni Paulus,
Thorsten Kastner,
Christian Uhle
Abstract:
Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source s…
▽ More
Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operation mode of the measure is proposed, more appropriate when considering material with long inactive periods of the target source. The 2f-model requires the reference target source as an input, but this is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output signal (rDNN2f). It is shown that iDNN2f achieves very strong correlation with the original measure on the test data (Pearson r=0.99), while performance decreases for nDNN2f (r>=0.91) and rDNN2f (r>=0.82). The non-intrusive estimate nDNN2f is mapped to select item-dependent remixing gains with the aim of maximizing the interferer attenuation under a constraint on the minimum quality of the remixed output (e.g., audible but not annoying deteriorations). A listening test shows that this is successfully achieved even with very different selected gains (up to 23 dB difference).
△ Less
Submitted 21 July, 2021;
originally announced July 2021.
-
A Hands-on Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation
Authors:
Martin Strauss,
Jouni Paulus,
Matteo Torcoli,
Bernd Edler
Abstract:
This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the consid…
▽ More
This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the considered broadcast content, and vocals separation in music is considered as a parallel for dialog separation in the target application domain. These similarities are assumed to enable transfer learning between the tasks. Three models pre-trained on music (Open-Unmix, Spleeter, and Conv-TasNet) are considered in the experiments, and fine-tuned with real broadcast data. The performance of the models is evaluated before and after fine-tuning with computational evaluation metrics (SI-SIRi, SI-SDRi, 2f-model), as well as with a listening test simulating an application where the non-speech signal is partially attenuated, e.g., for better speech intelligibility. The evaluations include two reference systems specifically developed for dialog separation. The results indicate that pre-trained music source separation models can be used for dialog separation to some degree, and that they benefit from the fine-tuning, reaching a performance close to task-specific solutions.
△ Less
Submitted 22 June, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Non-Deterministic Functions as Non-Deterministic Processes (Extended Version)
Authors:
Joseph W. N. Paulus,
Daniele Nantes-Sobrinho,
Jorge A. Pérez
Abstract:
We study encodings of the lambda-calculus into the pi-calculus in the unexplored case of calculi with non-determinism and failures. On the sequential side, we consider lambdafail, a new non-deterministic calculus in which intersection types control resources (terms); on the concurrent side, we consider spi, a pi-calculus in which non-determinism and failure rest upon a Curry-Howard correspondence…
▽ More
We study encodings of the lambda-calculus into the pi-calculus in the unexplored case of calculi with non-determinism and failures. On the sequential side, we consider lambdafail, a new non-deterministic calculus in which intersection types control resources (terms); on the concurrent side, we consider spi, a pi-calculus in which non-determinism and failure rest upon a Curry-Howard correspondence between linear logic and session types. We present a typed encoding of lambdafail into spi and establish its correctness. Our encoding precisely explains the interplay of non-deterministic and fail-prone evaluation in lambdafail via typed processes in spi. In particular, it shows how failures in sequential evaluation (absence/excess of resources) can be neatly codified as interaction protocols.
△ Less
Submitted 9 October, 2023; v1 submitted 30 April, 2021;
originally announced April 2021.
-
MPEG-H Audio for Improving Accessibility in Broadcasting and Streaming
Authors:
Christian Simon,
Matteo Torcoli,
Jouni Paulus
Abstract:
Broadcasting and streaming services still suffer from various levels of accessibility barriers for a significant portion of the population, limiting the access to information and culture, and in the most severe cases limiting the empowerment of people. This paper provides a brief overview of some of the most common accessibility barriers encountered. It then gives a short introduction to object-ba…
▽ More
Broadcasting and streaming services still suffer from various levels of accessibility barriers for a significant portion of the population, limiting the access to information and culture, and in the most severe cases limiting the empowerment of people. This paper provides a brief overview of some of the most common accessibility barriers encountered. It then gives a short introduction to object-based audio (OBA) production and transport, focusing on the aspects relevant for lowering accessibility barriers. MPEG-H Audio is used as a concrete example of an OBA system already deployed. Two example cases (dialog enhancement and audio description) are used to demonstrate in detail the simplicity of producing MPEG-H Audio content providing improved accessibility. Several other possibilities are outlined briefly. We show that using OBA for broadcasting and streaming content allows offering several accessibility features in a flexible manner, requiring only small changes to the existing production workflow, assuming the receiver supports the functionality.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.