-
Auxiliary Learning as a step towards Artificial General Intelligence
Authors:
Christeen T. Jose
Abstract:
Auxiliary Learning is a machine learning approach in which the model acknowledges the existence of objects that do not come under any of its learned categories. The name Auxiliary Learning was chosen due to the introduction of an auxiliary class. The paper focuses on increasing the generality of existing narrow-purpose neural networks and also highlights the need to handle unknown objects. The Cat & Dog binary classifier is taken as an example throughout the paper.
Submitted 30 November, 2022;
originally announced December 2022.
-
Deepfake Detection using ImageNet models and Temporal Images of 468 Facial Landmarks
Authors:
Christeen T Jose
Abstract:
This paper presents our results and findings on the use of temporal images for deepfake detection. We modelled temporal relations that exist in the movement of 468 facial landmarks across frames of a given video as spatial relations by constructing an image (referred to as temporal image) using the pixel values at these facial landmarks. CNNs are capable of recognizing spatial relationships that exist between the pixels of a given image. 10 different ImageNet models were considered for the study.
Submitted 14 August, 2022;
originally announced August 2022.
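The temporal-image construction described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the layout (one row per frame, one column per landmark) and the function name `build_temporal_image` are hypothetical, and the landmark coordinates are assumed to be already available as integer pixel positions; the abstract does not specify these details.

```python
import numpy as np

def build_temporal_image(frames, landmarks):
    """Sample pixel values at facial landmarks and stack them over time.

    frames:    array of shape (T, H, W, 3), the video frames.
    landmarks: array of shape (T, 468, 2), integer (x, y) pixel coordinates.
    Returns an array of shape (T, 468, 3). Row t holds the pixel values
    at the 468 landmarks of frame t (assumed layout, not from the paper).
    """
    frames = np.asarray(frames)
    landmarks = np.asarray(landmarks).astype(int)
    t, h, w, _ = frames.shape
    # Clip coordinates so landmarks slightly outside the frame stay valid.
    xs = np.clip(landmarks[..., 0], 0, w - 1)
    ys = np.clip(landmarks[..., 1], 0, h - 1)
    # Column vector of frame indices, broadcast against the (T, 468) coordinates.
    frame_idx = np.arange(t)[:, None]
    return frames[frame_idx, ys, xs]
```

The resulting array can be treated as an ordinary image, so temporal changes at a landmark appear as vertical patterns that a CNN can pick up as spatial structure.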
-
Wakeword Detection under Distribution Shifts
Authors:
Sree Hari Krishnan Parthasarathi,
Lu Zeng,
Christin Jose,
Joseph Wang
Abstract:
We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from the training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution either. To train effectively with a mix of human- and teacher-labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large-scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN), our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.
Submitted 13 July, 2022;
originally announced July 2022.
-
Latency Control for Keyword Spotting
Authors:
Christin Jose,
Joseph Wang,
Grant P. Strimel,
Mohammad Omar Khursheed,
Yuriy Mishchenko,
Brian Kulis
Abstract:
Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency that generalizes to any loss function without explicit knowledge of the keyword endpoint. Through a single, tunable hyperparameter, our approach enables one to balance detection latency and accuracy for the targeted application. Empirically, we show that our approach gives superior performance under latency constraints when compared to existing methods. Namely, we achieve a substantial 25% relative improvement in false accepts for a fixed latency target when compared to the baseline state-of-the-art. We also show that when our approach is used in conjunction with a max-pooling loss, we are able to improve relative false accepts by 25% at a fixed latency when compared to cross entropy loss.
Submitted 14 June, 2022;
originally announced June 2022.
-
Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting
Authors:
Mohammad Omar Khursheed,
Christin Jose,
Rajath Kumar,
Gengshen Fu,
Brian Kulis,
Santosh Kumar Cheekatmalla
Abstract:
In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architecture, and we can get up to 32% reduction in False Accepts at a 50k parameter budget with 75% reduction in parameter size compared to word-level Dense Neural Network models. We discuss solutions to the challenging problem of performing inference on streaming audio with this architecture, as well as differences in start-end index errors and latency in comparison to CNN, DNN, and DNN-HMM models.
Submitted 29 September, 2021;
originally announced September 2021.
-
Accurate Detection of Wake Word Start and End Using a CNN
Authors:
Christin Jose,
Yuriy Mishchenko,
Thibaud Senechal,
Anish Shah,
Alex Escott,
Shiv Vitaladevuni
Abstract:
Small-footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency to enable voice assistants. Such a keyword is often referred to as a wake word, as it is used to wake up voice-assistant-enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting the endpoints of wake words in neural KWS that use single-stage word-level neural networks. Our results show that the new techniques give superior accuracy for detecting wake words' endpoints of up to 50 msec standard error versus human annotations, on par with the conventional Acoustic Model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoint detection methods for single-stage neural KWS.
Submitted 9 August, 2020;
originally announced August 2020.
-
The WILDTRACK Multi-Camera Person Dataset
Authors:
Tatjana Chavdarova,
Pierre Baqué,
Stéphane Bouquet,
Andrii Maksai,
Cijo Jose,
Louis Lettry,
Pascal Fua,
Luc Van Gool,
François Fleuret
Abstract:
People detection methods are highly sensitive to the perpetual occlusions among the targets. As multi-camera set-ups become more common, joint exploitation of cross-view information would allow for improved detection performance. We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem. The seven-static-camera set-up captures realistic and challenging scenarios of walking people.
Notably, its camera calibration with jointly high-precision projection widens the range of algorithms which may make use of this dataset. To help accelerate research on automatic camera calibration, such annotations also accompany this dataset.
Furthermore, the rich-in-appearance visual context of the pedestrian class makes this dataset attractive for monocular pedestrian detection as well, since the HD cameras are placed relatively close to the people, and the size of the dataset further increases seven-fold.
In summary, we overview existing multi-camera datasets and detection methods, enumerate details of our dataset, and benchmark state-of-the-art multi-camera detectors on this new dataset.
Submitted 28 July, 2017;
originally announced July 2017.
-
Kronecker Recurrent Units
Authors:
Cijo Jose,
Moustpaha Cisse,
Francois Fleuret
Abstract:
Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. These results in particular show that while there are advantages in having a high dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced.
Submitted 31 December, 2017; v1 submitted 29 May, 2017;
originally announced May 2017.
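The parameter saving from a Kronecker-factored recurrent matrix, as described in the KRU abstract above, can be illustrated with a small sketch. The choices here are assumptions made for illustration: eight real-valued 2x2 factors, a Frobenius-norm orthogonality penalty standing in for the soft unitary constraint, and the helper names `kron_matrix` and `soft_orthogonality_penalty`. None of these are taken from the paper itself.

```python
from functools import reduce

import numpy as np

def kron_matrix(factors):
    """Expand a list of small square factors into their Kronecker product."""
    return reduce(np.kron, factors)

def soft_orthogonality_penalty(factors):
    """Sum over factors of ||F^T F - I||_F^2.

    A real-valued stand-in for a soft unitary constraint (assumption).
    """
    return sum(
        np.linalg.norm(f.T @ f - np.eye(f.shape[0])) ** 2 for f in factors
    )

rng = np.random.default_rng(0)
# Eight 2x2 orthogonal factors (from QR) give a 256x256 recurrent matrix.
factors = [np.linalg.qr(rng.standard_normal((2, 2)))[0] for _ in range(8)]
W = kron_matrix(factors)

dense_params = W.size                            # 256 * 256 = 65536
factored_params = sum(f.size for f in factors)   # 8 * 4 = 32
print(dense_params, factored_params)
```

Because a Kronecker product of orthogonal matrices is itself orthogonal, constraining the small factors is enough to keep the full 256x256 matrix well conditioned, and the penalty only ever touches 2x2 matrices.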
-
Scalable Metric Learning via Weighted Approximate Rank Component Analysis
Authors:
Cijo Jose,
Francois Fleuret
Abstract:
We are interested in the large-scale learning of Mahalanobis distances, with a particular focus on person re-identification.
We propose a metric learning formulation called Weighted Approximate Rank Component Analysis (WARCA). WARCA optimizes the precision at top ranks by combining the WARP loss with a regularizer that favors orthonormal linear mappings, and avoids rank-deficient embeddings. Using this new regularizer allows us to adapt the large-scale WSABIE procedure and to leverage the Adam stochastic optimization algorithm, which results in an algorithm that scales gracefully to very large data-sets. Also, we derive a kernelized version which allows us to take advantage of state-of-the-art features for re-identification when data-set size permits kernel computation.
Benchmarks on recent and standard re-identification data-sets show that our method beats existing state-of-the-art techniques in terms of both accuracy and speed. We also provide experimental analysis to shed light on the properties of the regularizer we use, and how it improves performance.
Submitted 23 March, 2016; v1 submitted 1 March, 2016;
originally announced March 2016.
-
Text Classification For Authorship Attribution Analysis
Authors:
M. Sudheep Elayidom,
Chinchu Jose,
Anitta Puthussery,
Neenu K Sasi
Abstract:
Authorship attribution mainly deals with undecided authorship of literary texts. It is useful for resolving uncertain authorship, recognizing the authorship of unknown texts, spotting plagiarism, and so on. Statistical methods can be used to set apart the approach of an author numerically. The basic measures used in computational stylometry are word length, sentence length, vocabulary richness, frequencies, etc. Each author has an inborn style of writing that is particular to that author. Statistical quantitative techniques can be used to differentiate the approach of an author in a numerical way. The problem can be broken down into three subproblems: author identification, author characterization, and similarity detection. The steps involved are pre-processing, feature extraction, classification, and author identification. Different classifiers can be used for this; here a fuzzy learning classifier and an SVM are used. After author identification, the SVM was found to be more accurate than the fuzzy classifier. The classifiers were later combined to obtain better accuracy than either the SVM or the fuzzy classifier alone.
Submitted 18 October, 2013;
originally announced October 2013.
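The stylometric measures named in the abstract above (word length, sentence length, vocabulary richness) can be computed with a short sketch like the one below. It is a minimal illustration, assuming a simple regex tokenizer and using the type-token ratio as a stand-in for vocabulary richness; the function name `stylometric_features` is hypothetical, and the classification step (fuzzy classifier, SVM) is not shown.

```python
import re

def stylometric_features(text):
    """Return simple per-text style measures as a dict of floats."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Distinct words over total words: a crude vocabulary-richness proxy.
        "type_token_ratio": len(set(words)) / len(words),
    }

features = stylometric_features("The cat sat. The dog ran fast!")
```

Feature dictionaries like this, computed per document, would form the numeric vectors that a classifier is then trained on.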