Search | arXiv e-print repository

Termination in Concurrency, Revisited

Authors: Joseph W. N. Paulus, Jorge A. Pérez, Daniele Nantes-Sobrinho

Abstract: Termination is a central property in sequential programming models: a term is terminating if all its reduction sequences are finite. Termination is also important in concurrency in general, and for message-passing programs in particular. A variety of type systems that enforce termination by ty** have been developed. In this paper, we rigorously compare several type systems for $π$-calculus proce… ▽ More Termination is a central property in sequential programming models: a term is terminating if all its reduction sequences are finite. Termination is also important in concurrency in general, and for message-passing programs in particular. A variety of type systems that enforce termination by ty** have been developed. In this paper, we rigorously compare several type systems for $π$-calculus processes from the unifying perspective of termination. Adopting session types as reference framework, we consider two different type systems: one follows Deng and Sangiorgi's weight-based approach; the other is Caires and Pfenning's Curry-Howard correspondence between linear logic and session types. Our technical results precisely connect these very different type systems, and shed light on the classes of client/server interactions they admit as correct. △ Less

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2206.02125 [pdf, other]

Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction

Authors: Jouni Paulus, Matteo Torcoli

Abstract: A geometrically-motivated method for primary-ambient decomposition is proposed and evaluated in an up-mixing application. The method consists of two steps, accommodating a particularly intuitive explanation. The first step consists of signal-adaptive rotations applied on the input stereo scene, which translate the primary sound sources into the center of the rotated scene. The second step applies… ▽ More A geometrically-motivated method for primary-ambient decomposition is proposed and evaluated in an up-mixing application. The method consists of two steps, accommodating a particularly intuitive explanation. The first step consists of signal-adaptive rotations applied on the input stereo scene, which translate the primary sound sources into the center of the rotated scene. The second step applies a center-channel extraction method, based on a simple signal model and optimal in the mean-squared-error sense. The performance is evaluated by using the estimated ambient component to enable surround sound starting from real-world stereo signals. The participants in the reported listening test are asked to adjust the audio scene envelopment and find the audio settings that pleases them the most. The possibility for up-mixing enabled by the proposed method is used extensively, and the user satisfaction is significantly increased compared to the original stereo mix. △ Less

Submitted 5 June, 2022; originally announced June 2022.

Comments: accepted into EUSIPCO 2022

arXiv:2206.02124 [pdf, other]

Sampling Frequency Independent Dialogue Separation

Authors: Jouni Paulus, Matteo Torcoli

Abstract: In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audi… ▽ More In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audio at 48 kHz. The separated audio sources are compared with the ones produced by the same model architectures trained with 48 kHz versions of the same training data. A listening test and computational measures show that there is no significant perceptual difference between the models trained with 8 kHz or with 48 kHz. This transferability of the learned parameters allows for a faster and computationally less costly training. It also enables using training datasets available at a lower sampling frequency than the one needed by the application at hand, or using data collections with multiple sampling frequencies. △ Less

Submitted 5 June, 2022; originally announced June 2022.

Comments: accepted into EUSIPCO 2022

arXiv:2205.00680 [pdf, ps, other]

Typed Non-determinism in Functional and Concurrent Calculi

Authors: Bas van den Heuvel, Joseph W. N. Paulus, Daniele Nantes-Sobrinho, Jorge A. Pérez

Abstract: We study functional and concurrent calculi with non-determinism, along with type systems to control resources based on linearity. The interplay between non-determinism and linearity is delicate: careless handling of branches can discard resources meant to be used exactly once. Here we go beyond prior work by considering non-determinism in its standard sense: once a branch is selected, the rest are… ▽ More We study functional and concurrent calculi with non-determinism, along with type systems to control resources based on linearity. The interplay between non-determinism and linearity is delicate: careless handling of branches can discard resources meant to be used exactly once. Here we go beyond prior work by considering non-determinism in its standard sense: once a branch is selected, the rest are discarded. Our technical contributions are three-fold. First, we introduce a $π$-calculus with non-deterministic choice, governed by session types. Second, we introduce a resource $λ$-calculus, governed by intersection types, in which non-determinism concerns fetching of resources from bags. Finally, we connect our two typed non-deterministic calculi via a correct translation. △ Less

Submitted 29 September, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.02661 [pdf, other]

CAIPI in Practice: Towards Explainable Interactive Medical Image Classification

Authors: Emanuel Slany, Yannik Ott, Stephan Scheele, Jan Paulus, Ute Schmid

Abstract: Would you trust physicians if they cannot explain their decisions to you? Medical diagnostics using machine learning gained enormously in importance within the last decade. However, without further enhancements many state-of-the-art machine learning methods are not suitable for medical application. The most important reasons are insufficient data set quality and the black-box behavior of machine l… ▽ More Would you trust physicians if they cannot explain their decisions to you? Medical diagnostics using machine learning gained enormously in importance within the last decade. However, without further enhancements many state-of-the-art machine learning methods are not suitable for medical application. The most important reasons are insufficient data set quality and the black-box behavior of machine learning algorithms such as Deep Learning models. Consequently, end-users cannot correct the model's decisions and the corresponding explanations. The latter is crucial for the trustworthiness of machine learning in the medical domain. The research field explainable interactive machine learning searches for methods that address both shortcomings. This paper extends the explainable and interactive CAIPI algorithm and provides an interface to simplify human-in-the-loop approaches for image classification. The interface enables the end-user (1) to investigate and (2) to correct the model's prediction and explanation, and (3) to influence the data set quality. After CAIPI optimization with only a single counterexample per iteration, the model achieves an accuracy of $97.48\%$ on the Medical MNIST and $95.02\%$ on the Fashion MNIST. This accuracy is approximately equal to state-of-the-art Deep Learning optimization procedures. Besides, CAIPI reduces the labeling effort by approximately $80\%$. △ Less

Submitted 31 May, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Manuscript accepted at IFIP AIAI 2022, correct typo in Discussion

arXiv:2112.09494 [pdf]

Dialog+ in Broadcasting: First Field Tests Using Deep-Learning-Based Dialogue Enhancement

Authors: Matteo Torcoli, Christian Simon, Jouni Paulus, Davide Straninger, Alfred Riedel, Volker Koch, Stefan Wits, Daniela Rieger, Harald Fuchs, Christian Uhle, Stefan Meltzer, Adrian Murtaza

Abstract: Difficulties in following speech due to loud background sounds are common in broadcasting. Object-based audio, e.g., MPEG-H Audio solves this problem by providing a user-adjustable speech level. While object-based audio is gaining momentum, transitioning to it requires time and effort. Also, lots of content exists, produced and archived outside the object-based workflows. To address this, Fraunhof… ▽ More Difficulties in following speech due to loud background sounds are common in broadcasting. Object-based audio, e.g., MPEG-H Audio solves this problem by providing a user-adjustable speech level. While object-based audio is gaining momentum, transitioning to it requires time and effort. Also, lots of content exists, produced and archived outside the object-based workflows. To address this, Fraunhofer IIS has developed a deep-learning solution called Dialog+, capable of enabling speech level personalization also for content with only the final audio tracks available. This paper reports on public field tests evaluating Dialog+, conducted together with Westdeutscher Rundfunk (WDR) and Bayerischer Rundfunk (BR), starting from September 2020. To our knowledge, these are the first large-scale tests of this kind. As part of one of these, a survey with more than 2,000 participants showed that 90% of the people above 60 years old have problems in understanding speech in TV "often" or "very often". Overall, 83% of the participants liked the possibility to switch to Dialog+, including those who do not normally struggle with speech intelligibility. Dialog+ introduces a clear benefit for the audience, filling the gap between object-based broadcasting and traditionally produced material. △ Less

Submitted 17 December, 2021; originally announced December 2021.

Comments: Presented at IBC 2021 (International Broadcasting Convention)

arXiv:2112.01593 [pdf, other]

Types and Terms Translated: Unrestricted Resources in Encoding Functions as Processes (Extended Version)

Authors: Joseph W. N. Paulus, Daniele Nantes-Sobrinho, Jorge A. Pérez

Abstract: Type-preserving translations are effective rigorous tools in the study of core programming calculi. In this paper, we develop a new typed translation that connects sequential and concurrent calculi; it is governed by type systems that control resource consumption. Our main contribution is the source language, a new resource $λ$-calculus with non-determinism and failures, dubbed \ulamf. In \ulamf,… ▽ More Type-preserving translations are effective rigorous tools in the study of core programming calculi. In this paper, we develop a new typed translation that connects sequential and concurrent calculi; it is governed by type systems that control resource consumption. Our main contribution is the source language, a new resource $λ$-calculus with non-determinism and failures, dubbed \ulamf. In \ulamf, resources are split into linear and unrestricted; failures are explicit and arise from this distinction. We define a type system based on intersection types to control resources and fail-prone computation. The target language is \spi, an existing session-typed $π$-calculus that results from a Curry-Howard correspondence between linear logic and session types. Our typed translation subsumes our prior work; interestingly, it treats unrestricted resources in \lamrfailunres as client-server session behaviours in \spi. △ Less

Submitted 30 May, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: 24 pages, plus appendices. Extended version of a paper in the Post-proceedings of TYPES 2021. arXiv admin note: text overlap with arXiv:2104.14759

arXiv:2107.10562 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053789

Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning

Authors: Christian Uhle, Matteo Torcoli, Jouni Paulus

Abstract: Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, qua… ▽ More Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target. Subjective evaluations indicate that consistent sound quality is obtained across various input signals. Our experiments show that the proposed method is able to control the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: Accepted paper at ICASSP 2020

Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2107.10151 [pdf, other]

doi 10.1109/WASPAA52581.2021.9632756

Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

Authors: Matteo Torcoli, Jouni Paulus, Thorsten Kastner, Christian Uhle

Abstract: Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source s… ▽ More Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operation mode of the measure is proposed, more appropriate when considering material with long inactive periods of the target source. The 2f-model requires the reference target source as an input, but this is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output signal (rDNN2f). It is shown that iDNN2f achieves very strong correlation with the original measure on the test data (Pearson r=0.99), while performance decreases for nDNN2f (r>=0.91) and rDNN2f (r>=0.82). The non-intrusive estimate nDNN2f is mapped to select item-dependent remixing gains with the aim of maximizing the interferer attenuation under a constraint on the minimum quality of the remixed output (e.g., audible but not annoying deteriorations). A listening test shows that this is successfully achieved even with very different selected gains (up to 23 dB difference). △ Less

Submitted 21 July, 2021; originally announced July 2021.

Comments: Manuscript accepted for the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

arXiv:2106.09093 [pdf, other]

A Hands-on Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation

Authors: Martin Strauss, Jouni Paulus, Matteo Torcoli, Bernd Edler

Abstract: This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the consid… ▽ More This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the considered broadcast content, and vocals separation in music is considered as a parallel for dialog separation in the target application domain. These similarities are assumed to enable transfer learning between the tasks. Three models pre-trained on music (Open-Unmix, Spleeter, and Conv-TasNet) are considered in the experiments, and fine-tuned with real broadcast data. The performance of the models is evaluated before and after fine-tuning with computational evaluation metrics (SI-SIRi, SI-SDRi, 2f-model), as well as with a listening test simulating an application where the non-speech signal is partially attenuated, e.g., for better speech intelligibility. The evaluations include two reference systems specifically developed for dialog separation. The results indicate that pre-trained music source separation models can be used for dialog separation to some degree, and that they benefit from the fine-tuning, reaching a performance close to task-specific solutions. △ Less

Submitted 22 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: accepted in INTERSPEECH 2021

arXiv:2104.14759 [pdf, other]

doi 10.46298/lmcs-19(4:1)2023

Non-Deterministic Functions as Non-Deterministic Processes (Extended Version)

Authors: Joseph W. N. Paulus, Daniele Nantes-Sobrinho, Jorge A. Pérez

Abstract: We study encodings of the lambda-calculus into the pi-calculus in the unexplored case of calculi with non-determinism and failures. On the sequential side, we consider lambdafail, a new non-deterministic calculus in which intersection types control resources (terms); on the concurrent side, we consider spi, a pi-calculus in which non-determinism and failure rest upon a Curry-Howard correspondence… ▽ More We study encodings of the lambda-calculus into the pi-calculus in the unexplored case of calculi with non-determinism and failures. On the sequential side, we consider lambdafail, a new non-deterministic calculus in which intersection types control resources (terms); on the concurrent side, we consider spi, a pi-calculus in which non-determinism and failure rest upon a Curry-Howard correspondence between linear logic and session types. We present a typed encoding of lambdafail into spi and establish its correctness. Our encoding precisely explains the interplay of non-deterministic and fail-prone evaluation in lambdafail via typed processes in spi. In particular, it shows how failures in sequential evaluation (absence/excess of resources) can be neatly codified as interaction protocols. △ Less

Submitted 9 October, 2023; v1 submitted 30 April, 2021; originally announced April 2021.

Journal ref: Logical Methods in Computer Science, Volume 19, Issue 4 (October 10, 2023) lmcs:8812

arXiv:1909.11549 [pdf, other]

MPEG-H Audio for Improving Accessibility in Broadcasting and Streaming

Authors: Christian Simon, Matteo Torcoli, Jouni Paulus

Abstract: Broadcasting and streaming services still suffer from various levels of accessibility barriers for a significant portion of the population, limiting the access to information and culture, and in the most severe cases limiting the empowerment of people. This paper provides a brief overview of some of the most common accessibility barriers encountered. It then gives a short introduction to object-ba… ▽ More Broadcasting and streaming services still suffer from various levels of accessibility barriers for a significant portion of the population, limiting the access to information and culture, and in the most severe cases limiting the empowerment of people. This paper provides a brief overview of some of the most common accessibility barriers encountered. It then gives a short introduction to object-based audio (OBA) production and transport, focusing on the aspects relevant for lowering accessibility barriers. MPEG-H Audio is used as a concrete example of an OBA system already deployed. Two example cases (dialog enhancement and audio description) are used to demonstrate in detail the simplicity of producing MPEG-H Audio content providing improved accessibility. Several other possibilities are outlined briefly. We show that using OBA for broadcasting and streaming content allows offering several accessibility features in a flexible manner, requiring only small changes to the existing production workflow, assuming the receiver supports the functionality. △ Less

Submitted 25 September, 2019; originally announced September 2019.

Comments: White Paper

Showing 1–12 of 12 results for author: Paulus, J