Search | arXiv e-print repository

Auxiliary Learning as a step towards Artificial General Intelligence

Abstract: Auxiliary Learning is a machine learning approach in which the model acknowledges the existence of objects that do not come under any of its learned categories.The name Auxiliary learning was chosen due to the introduction of an auxiliary class. The paper focuses on increasing the generality of existing narrow purpose neural networks and also highlights the need to handle unknown objects. The Cat… ▽ More Auxiliary Learning is a machine learning approach in which the model acknowledges the existence of objects that do not come under any of its learned categories.The name Auxiliary learning was chosen due to the introduction of an auxiliary class. The paper focuses on increasing the generality of existing narrow purpose neural networks and also highlights the need to handle unknown objects. The Cat & Dog binary classifier is taken as an example throughout the paper. △ Less

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2208.06990 [pdf, other]

Deepfake Detection using ImageNet models and Temporal Images of 468 Facial Landmarks

Authors: Christeen T Jose

Abstract: This paper presents our results and findings on the use of temporal images for deepfake detection. We modelled temporal relations that exist in the movement of 468 facial landmarks across frames of a given video as spatial relations by constructing an image (referred to as temporal image) using the pixel values at these facial landmarks. CNNs are capable of recognizing spatial relationships that e… ▽ More This paper presents our results and findings on the use of temporal images for deepfake detection. We modelled temporal relations that exist in the movement of 468 facial landmarks across frames of a given video as spatial relations by constructing an image (referred to as temporal image) using the pixel values at these facial landmarks. CNNs are capable of recognizing spatial relationships that exist between the pixels of a given image. 10 different ImageNet models were considered for the study. △ Less

Submitted 14 August, 2022; originally announced August 2022.

arXiv:2207.06423 [pdf, ps, other]

Wakeword Detection under Distribution Shifts

Authors: Sree Hari Krishnan Parthasarathi, Lu Zeng, Christin Jose, Joseph Wang

Abstract: We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of t… ▽ More We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution as well. To train effectively with a mix of human and teacher labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN) our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution. △ Less

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2206.07261 [pdf, other]

doi 10.21437/Interspeech.2022-10608

Latency Control for Keyword Spotting

Authors: Christin Jose, Joseph Wang, Grant P. Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis

Abstract: Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency and which generalizes to any loss function withou… ▽ More Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency and which generalizes to any loss function without explicit knowledge of the keyword endpoint. Through a single, tunable hyperparameter, our approach enables one to balance detection latency and accuracy for the targeted application. Empirically, we show that our approach gives superior performance under latency constraints when compared to existing methods. Namely, we make a substantial 25\% relative false accepts improvement for a fixed latency target when compared to the baseline state-of-the-art. We also show that when our approach is used in conjunction with a max-pooling loss, we are able to improve relative false accepts by 25 % at a fixed latency when compared to cross entropy loss. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: Proceedings of INTERSPEECH

arXiv:2109.14725 [pdf, other]

Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting

Authors: Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu, Brian Kulis, Santosh Kumar Cheekatmalla

Abstract: In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architectu… ▽ More In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architecture, and we can get up to 32% reduction in False Accepts at a 50k parameter budget with 75% reduction in parameter size compared to word-level Dense Neural Network models. We discuss solutions to the challenging problem of performing inference on streaming audio with this architecture, as well as differences in start-end index errors and latency in comparison to CNN, DNN, and DNN-HMM models. △ Less

Submitted 29 September, 2021; originally announced September 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.12941

ACM Class: I.2.0

arXiv:2008.03790 [pdf, other]

doi 10.21437/Interspeech.2020-1491

Accurate Detection of Wake Word Start and End Using a CNN

Authors: Christin Jose, Yuriy Mishchenko, Thibaud Senechal, Anish Shah, Alex Escott, Shiv Vitaladevuni

Abstract: Small footprint embedded devices require keyword spotters (KWS) with small model size and detection latency for enabling voice assistants. Such a keyword is often referred to as \textit{wake word} as it is used to wake up voice assistant enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we prop… ▽ More Small footprint embedded devices require keyword spotters (KWS) with small model size and detection latency for enabling voice assistants. Such a keyword is often referred to as \textit{wake word} as it is used to wake up voice assistant enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting the endpoints of wake words in neural KWS that use single-stage word-level neural networks. Our results show that the new techniques give superior accuracy for detecting wake words' endpoints of up to 50 msec standard error versus human annotations, on par with the conventional Acoustic Model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoints detection methods for single-stage neural KWS. △ Less

Submitted 9 August, 2020; originally announced August 2020.

Comments: Proceedings of INTERSPEECH

Journal ref: Interspeech 2020

arXiv:1707.09299 [pdf, other]

The WILDTRACK Multi-Camera Person Dataset

Authors: Tatjana Chavdarova, Pierre Baqué, Stéphane Bouquet, Andrii Maksai, Cijo Jose, Louis Lettry, Pascal Fua, Luc Van Gool, François Fleuret

Abstract: People detection methods are highly sensitive to the perpetual occlusions among the targets. As multi-camera set-ups become more frequently encountered, joint exploitation of the across views information would allow for improved detection performances. We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem. The seven-static… ▽ More People detection methods are highly sensitive to the perpetual occlusions among the targets. As multi-camera set-ups become more frequently encountered, joint exploitation of the across views information would allow for improved detection performances. We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem. The seven-static-camera set-up captures realistic and challenging scenarios of walking people. Notably, its camera calibration with jointly high-precision projection widens the range of algorithms which may make use of this dataset. In aim to help accelerate the research on automatic camera calibration, such annotations also accompany this dataset. Furthermore, the rich-in-appearance visual context of the pedestrian class makes this dataset attractive for monocular pedestrian detection as well, since: the HD cameras are placed relatively close to the people, and the size of the dataset further increases seven-fold. In summary, we overview existing multi-camera datasets and detection methods, enumerate details of our dataset, and we benchmark multi-camera state of the art detectors on this new dataset. △ Less

Submitted 28 July, 2017; originally announced July 2017.

arXiv:1705.10142 [pdf, other]

Kronecker Recurrent Units

Authors: Cijo Jose, Moustpaha Cisse, Francois Fleuret

Abstract: Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achiev… ▽ More Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. These results in particular show that while there are advantages in having a high dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced. △ Less

Submitted 31 December, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

arXiv:1603.00370 [pdf, other]

Scalable Metric Learning via Weighted Approximate Rank Component Analysis

Authors: Cijo Jose, Francois Fleuret

Abstract: We are interested in the large-scale learning of Mahalanobis distances, with a particular focus on person re-identification. We propose a metric learning formulation called Weighted Approximate Rank Component Analysis (WARCA). WARCA optimizes the precision at top ranks by combining the WARP loss with a regularizer that favors orthonormal linear map**s, and avoids rank-deficient embeddings. Usi… ▽ More We are interested in the large-scale learning of Mahalanobis distances, with a particular focus on person re-identification. We propose a metric learning formulation called Weighted Approximate Rank Component Analysis (WARCA). WARCA optimizes the precision at top ranks by combining the WARP loss with a regularizer that favors orthonormal linear map**s, and avoids rank-deficient embeddings. Using this new regularizer allows us to adapt the large-scale WSABIE procedure and to leverage the Adam stochastic optimization algorithm, which results in an algorithm that scales gracefully to very large data-sets. Also, we derive a kernelized version which allows to take advantage of state-of-the-art features for re-identification when data-set size permits kernel computation. Benchmarks on recent and standard re-identification data-sets show that our method beats existing state-of-the-art techniques both in term of accuracy and speed. We also provide experimental analysis to shade lights on the properties of the regularizer we use, and how it improves performance. △ Less

Submitted 23 March, 2016; v1 submitted 1 March, 2016; originally announced March 2016.

arXiv:1310.4909 [pdf]

doi 10.5121/acij.2013.4501

Text Classification For Authorship Attribution Analysis

Authors: M. Sudheep Elayidom, Chinchu Jose, Anitta Puthussery, Neenu K Sasi

Abstract: Authorship attribution mainly deals with undecided authorship of literary texts. Authorship attribution is useful in resolving issues like uncertain authorship, recognize authorship of unknown texts, spot plagiarism so on. Statistical methods can be used to set apart the approach of an author numerically. The basic methodologies that are made use in computational stylometry are word length, senten… ▽ More Authorship attribution mainly deals with undecided authorship of literary texts. Authorship attribution is useful in resolving issues like uncertain authorship, recognize authorship of unknown texts, spot plagiarism so on. Statistical methods can be used to set apart the approach of an author numerically. The basic methodologies that are made use in computational stylometry are word length, sentence length, vocabulary affluence, frequencies etc. Each author has an inborn style of writing, which is particular to himself. Statistical quantitative techniques can be used to differentiate the approach of an author in a numerical way. The problem can be broken down into three sub problems as author identification, author characterization and similarity detection. The steps involved are pre-processing, extracting features, classification and author identification. For this different classifiers can be used. Here fuzzy learning classifier and SVM are used. After author identification the SVM was found to have more accuracy than Fuzzy classifier. Later combined the classifiers to obtain a better accuracy when compared to individual SVM and fuzzy classifier. △ Less

Submitted 18 October, 2013; originally announced October 2013.

Comments: 10 pages

Journal ref: Advanced Computing: An International Journal (ACIJ), Vol.4, No.5, September 2013

Showing 1–10 of 10 results for author: Jose, C