-
Voice Spoofing Detection Corpus for Single and Multi-order Audio Replays
Authors:
Roland Baumann,
Khalid Mahmood Malik,
Ali Javed,
Andersen Ball,
Brandon Kujawa,
Hafiz Malik
Abstract:
The evolution of modern voice-controlled devices (VCDs) in recent years has revolutionized the Internet of Things (IoT) and has led to wider realization of smart homes, personalization, and home automation through voice commands. The introduction of VCDs in IoT is expected to give rise to a new subfield of IoT, called the Multimedia of Things (MoT). These VCDs can be exploited in IoT-driven environments to mount various spoofing attacks, including replays. A replay attack plays back recorded audio of a legitimate human speaker with the intent of deceiving VCDs that rely on a speaker verification system. The connectivity among VCDs can easily be exploited in IoT-driven environments to generate a chain of replay attacks (multi-order replay attacks). Existing spoofing detection datasets such as ASVspoof and ReMASC contain only first-order replay recordings against the bonafide audio samples. These datasets cannot be used to evaluate anti-spoofing algorithms designed to detect multi-order replay attacks. Additionally, these datasets do not capture the characteristics of microphone arrays, which are an important feature of modern VCDs. We need a diverse replay spoofing detection corpus that includes multi-order replay recordings against the bonafide voice samples. This paper presents a novel voice spoofing detection corpus (VSDC) to evaluate the performance of multi-order replay anti-spoofing methods. The proposed VSDC consists of first-order and second-order replay samples against the bonafide audio recordings. The proposed VSDC can also be used to evaluate the performance of speaker verification systems, as our corpus includes audio samples from fifteen different speakers. To the best of our knowledge, this is the first publicly available replay spoofing detection corpus comprising first-order and second-order replay samples.
Submitted 2 September, 2019;
originally announced September 2019.
-
Spoken Language Understanding on the Edge
Authors:
Alaa Saade,
Alice Coucke,
Alexandre Caulier,
Joseph Dureau,
Adrien Ball,
Théodore Bluche,
David Leroy,
Clément Doumouro,
Thibault Gisselbrecht,
Francesco Caltagirone,
Thibaut Lavril,
Maël Primet
Abstract:
We consider the problem of performing Spoken Language Understanding (SLU) on small devices typical of IoT applications. Our contributions are twofold. First, we outline the design of an embedded, private-by-design SLU system and show that it has performance on par with cloud-based commercial solutions. Second, we release the datasets used in our experiments in the interest of reproducibility and in the hope that they can prove useful to the SLU community.
Submitted 2 October, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces
Authors:
Alice Coucke,
Alaa Saade,
Adrien Ball,
Théodore Bluche,
Alexandre Caulier,
David Leroy,
Clément Doumouro,
Thibault Gisselbrecht,
Francesco Caltagirone,
Thibaut Lavril,
Maël Primet,
Joseph Dureau
Abstract:
This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices. The embedded inference is fast and accurate while enforcing privacy by design, as no personal user data is ever collected. Focusing on Automatic Speech Recognition and Natural Language Understanding, we detail our approach to training high-performance Machine Learning models that are small enough to run in real-time on small devices. Additionally, we describe a data generation procedure that provides sufficient, high-quality training data without compromising user privacy.
Submitted 6 December, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.