-
Effective weakly supervised semantic frame induction using expression sharing in hierarchical hidden Markov models
Authors:
Janneke van de Loo,
Jort F. Gemmeke,
Guy De Pauw,
Bart Ons,
Walter Daelemans,
Hugo Van hamme
Abstract:
We present a framework for the induction of semantic frames from utterances in the context of an adaptive command-and-control interface. The system is trained on an individual user's utterances and the corresponding semantic frames representing controls. During training, no prior information on the alignment between utterance segments and frame slots and values is available. In addition, semantic…
▽ More
We present a framework for the induction of semantic frames from utterances in the context of an adaptive command-and-control interface. The system is trained on an individual user's utterances and the corresponding semantic frames representing controls. During training, no prior information on the alignment between utterance segments and frame slots and values is available. In addition, semantic frames in the training data can contain information that is not expressed in the utterances. To tackle this weakly supervised classification task, we propose a framework based on Hidden Markov Models (HMMs). Structural modifications, resulting in a hierarchical HMM, and an extension called expression sharing are introduced to minimize the amount of training time and effort required for the user.
The dataset used for the present study is PATCOR, which contains commands uttered in the context of a vocally guided card game, Patience. Experiments were carried out on orthographic and phonetic transcriptions of commands, segmented on different levels of n-gram granularity. The experimental results show positive effects of all the studied system extensions, with some effect differences between the different input representations. Moreover, evaluation experiments on held-out data with the optimal system configuration show that the extended system is able to achieve high accuracies with relatively small amounts of training data.
△ Less
Submitted 30 January, 2019;
originally announced January 2019.
-
CNN Architectures for Large-Scale Audio Classification
Authors:
Shawn Hershey,
Sourish Chaudhuri,
Daniel P. W. Ellis,
Jort F. Gemmeke,
Aren Jansen,
R. Channing Moore,
Manoj Plakal,
Devin Platt,
Rif A. Saurous,
Bryan Seybold,
Malcolm Slaney,
Ron J. Weiss,
Kevin Wilson
Abstract:
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…
▽ More
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
△ Less
Submitted 10 January, 2017; v1 submitted 29 September, 2016;
originally announced September 2016.
-
TR02: State dependent oracle masks for improved dynamical features
Authors:
J. F. Gemmeke,
B. Cranen
Abstract:
Using the AURORA-2 digit recognition task, we show that recognition accuracies obtained with classical, SNR based oracle masks can be substantially improved by using a state-dependent mask estimation technique.
Using the AURORA-2 digit recognition task, we show that recognition accuracies obtained with classical, SNR based oracle masks can be substantially improved by using a state-dependent mask estimation technique.
△ Less
Submitted 18 March, 2009;
originally announced March 2009.
-
TR01: Time-continuous Sparse Imputation
Authors:
J. F. Gemmeke,
B. Cranen
Abstract:
An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing) prior to decoding, and to replace the missing ones by clean speech estimates. We present a novel method to obtain such clean speech estimates. Unlike previous imputation frameworks which work on a frame-by-frame basis, our method focuses o…
▽ More
An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing) prior to decoding, and to replace the missing ones by clean speech estimates. We present a novel method to obtain such clean speech estimates. Unlike previous imputation frameworks which work on a frame-by-frame basis, our method focuses on exploiting information from a large time-context. Using a sliding window approach, denoised speech representations are constructed using a sparse representation of the reliable features in an overcomplete basis of fixed-length exemplar fragments. We demonstrate the potential of our approach with experiments on the AURORA-2 connected digit database.
△ Less
Submitted 16 January, 2009;
originally announced January 2009.