Search | arXiv e-print repository

Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection to preserve audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. Executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performance relative to the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.

Submitted 18 March, 2024; originally announced March 2024.
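The abstract does not say how the static models are applied to a live stream. A common approach, assumed here, is to buffer incoming audio and score fixed-length, overlapping windows as soon as they fill. The sketch below shows only that buffering logic; `score_fn` is a hypothetical stand-in for a trained ResNet or LCNN classifier, and the window, hop, and threshold values are illustrative.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_windows(
    chunks: Iterable[List[float]], window: int, hop: int
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_sample, samples) for overlapping windows over a stream.

    `chunks` models audio arriving in arbitrary-sized blocks (e.g. from a
    capture callback); a window is emitted as soon as enough samples exist.
    """
    if not 0 < hop <= window:
        raise ValueError("require 0 < hop <= window")
    buf: deque = deque()
    start = 0  # absolute stream index of buf[0]
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= window:
            yield start, list(buf)[:window]
            # Drop `hop` samples so consecutive windows overlap by window - hop.
            for _ in range(hop):
                buf.popleft()
            start += hop


def monitor(
    chunks: Iterable[List[float]],
    score_fn: Callable[[List[float]], float],  # hypothetical detector
    window: int = 4,
    hop: int = 2,
    threshold: float = 0.5,
) -> List[int]:
    """Return start indices of windows whose spoof score exceeds threshold."""
    return [s for s, w in stream_windows(chunks, window, hop) if score_fn(w) > threshold]
```

In practice the window would span on the order of a second of audio and `score_fn` would wrap feature extraction plus model inference; the overlap trades extra computation for lower detection latency.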

arXiv:2211.09536 [pdf, other]

Towards Building Text-To-Speech Systems for the Next Billion Users

Authors: Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

Abstract: Deep learning-based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers, as the best performers. With this setup, we train and evaluate TTS models for 13 languages and find that our models significantly improve upon existing models in all languages, as measured by mean opinion scores. We open-source all models on the Bhashini platform.

Submitted 17 February, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted at ICASSP 2023. Gokul and Praveen contributed equally
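Mean opinion score (MOS) is the average of listener ratings on a 1 to 5 scale. The sketch below computes it together with a normal-approximation 95% confidence interval, a common reporting convention; the abstract does not state how the authors computed intervals, and the ratings in the usage line are made up.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mos_with_ci(ratings: List[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of a normal-approximation confidence interval).

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # z * standard error
    return m, half_width


m, hw = mos_with_ci([4, 5, 3, 4, 4])  # MOS 4, half-width about 0.62
```

With many raters per utterance, the interval shrinks roughly as one over the square root of the number of ratings, which is why MOS comparisons usually report it alongside the mean.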

arXiv:2210.05916 [pdf, other]

Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features

Authors: Gokul Karthik Kumar, Karthik Nandakumar

Abstract: Hateful memes are a growing menace on social media. While the image and its corresponding text in a meme are related, they do not necessarily convey the same meaning when viewed individually. Hence, detecting hateful memes requires careful consideration of both visual and textual information. Multimodal pre-training can be beneficial for this task because it effectively captures the relationship between the image and the text by representing them in a similar feature space. Furthermore, it is essential to model the interactions between the image and text features through intermediate fusion. Most existing methods either employ multimodal pre-training or intermediate fusion, but not both. In this work, we propose the Hate-CLIPper architecture, which explicitly models the cross-modal interactions between the image and text representations obtained using Contrastive Language-Image Pre-training (CLIP) encoders via a feature interaction matrix (FIM). A simple classifier based on the FIM representation is able to achieve state-of-the-art performance on the Hateful Memes Challenge (HMC) dataset with an AUROC of 85.8, which even surpasses the human performance of 82.65. Experiments on other meme datasets such as Propaganda Memes and TamilMemes also demonstrate the generalizability of the proposed approach. Finally, we analyze the interpretability of the FIM representation and show that cross-modal interactions can indeed facilitate the learning of meaningful concepts. The code for this work is available at https://github.com/gokulkarthik/hateclipper.

Submitted 17 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022 Workshop on NLP for Positive Impact
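One way to picture a feature interaction matrix is as every pairwise product between an image feature and a text feature. A minimal sketch, assuming the FIM is the outer product of (already projected) image and text embedding vectors, flattened before a simple classifier; the projection layers and any normalization are omitted, the function names are hypothetical, and this is not the code in the linked repository.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def feature_interaction_matrix(image_feat: Vector, text_feat: Vector) -> Matrix:
    """Outer product: FIM[i][j] = image_feat[i] * text_feat[j].

    Every image dimension interacts with every text dimension, unlike
    concatenation or element-wise fusion.
    """
    return [[p * t for t in text_feat] for p in image_feat]


def flatten(m: Matrix) -> Vector:
    """Row-major flattening into the input vector for a downstream classifier."""
    return [x for row in m for x in row]


fim = feature_interaction_matrix([1.0, 2.0], [3.0, 4.0, 5.0])
features = flatten(fim)  # length 2 * 3 = 6
```

The flattened vector grows as the product of the two embedding sizes, so real implementations typically project CLIP features to a smaller dimension before forming the matrix.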

arXiv:2205.05543 [pdf, other]

An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers

Authors: Gokul Karthik Kumar, Sahal Shaji Mullappilly, Abhishek Singh Gehlot

Abstract: Self-supervised learning (SSL) methods such as masked language modeling have shown massive performance gains by pretraining transformer models for a variety of natural language processing tasks. Follow-up research adapted similar methods, such as masked image modeling, to vision transformers and demonstrated improvements in image classification. Such simple self-supervised methods have not been exhaustively studied for object detection transformers (DETR, Deformable DETR), as their transformer encoder modules take input in the convolutional neural network (CNN) extracted feature space rather than the image space as in general vision transformers. However, the CNN feature maps still maintain the spatial relationship, and we utilize this property to design self-supervised learning approaches to train the encoder of object detection transformers in pretraining and multi-task learning settings. We explore common self-supervised methods based on image reconstruction, masked image modeling and jigsaw. Preliminary experiments on the iSAID dataset demonstrate faster convergence of DETR in the initial epochs in both pretraining and multi-task learning settings; nonetheless, similar improvement is not observed in the case of multi-task learning with Deformable DETR. The code for our experiments with DETR and Deformable DETR is available at https://github.com/gokulkarthik/detr and https://github.com/gokulkarthik/Deformable-DETR, respectively.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Final Project for the course "Visual Object Detection And Recognition" (CV703) at MBZUAI
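Because CNN feature maps keep their spatial layout, masked modeling can mask spatial positions of the feature map rather than image pixels. The sketch assumes a masked-reconstruction objective on an H×W grid of C-dimensional feature vectors: masked positions are zero-filled, and the loss is computed only where the input was hidden. The masking ratio, zero-filling, and function names are illustrative assumptions, not the project's settings.

```python
import random
from typing import List, Set, Tuple

FeatureMap = List[List[List[float]]]  # indexed [h][w][c]
Mask = Set[Tuple[int, int]]


def mask_positions(fmap: FeatureMap, ratio: float, seed: int = 0) -> Tuple[FeatureMap, Mask]:
    """Zero out a random subset of spatial positions; return (masked map, mask)."""
    h, w = len(fmap), len(fmap[0])
    positions = [(i, j) for i in range(h) for j in range(w)]
    k = int(round(ratio * len(positions)))
    masked = set(random.Random(seed).sample(positions, k))
    out = [
        [[0.0] * len(fmap[i][j]) if (i, j) in masked else list(fmap[i][j]) for j in range(w)]
        for i in range(h)
    ]
    return out, masked


def masked_mse(pred: FeatureMap, target: FeatureMap, masked: Mask) -> float:
    """Mean squared error over masked positions only (visible ones are ignored)."""
    errs = [(p - t) ** 2 for (i, j) in masked for p, t in zip(pred[i][j], target[i][j])]
    return sum(errs) / len(errs) if errs else 0.0
```

In a training loop, the masked map would be fed to the transformer encoder and `masked_mse` applied between its reconstruction and the original feature map; restricting the loss to masked positions keeps the task from being solved by copying the input.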

arXiv:2204.05814 [pdf, other]

MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages

Authors: Gokul Karthik Kumar, Abhishek Singh Gehlot, Sahal Shaji Mullappilly, Karthik Nandakumar

Abstract: The accuracy of English-language Question Answering (QA) systems has improved significantly in recent years with the advent of Transformer-based models (e.g., BERT). These models are pre-trained in a self-supervised fashion with a large English text corpus and further fine-tuned with a massive English QA dataset (e.g., SQuAD). However, QA datasets on such a scale are not available for most other languages. Multilingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource languages to low-resource languages. Since these models are pre-trained with huge text corpora containing multiple languages, they typically learn language-agnostic embeddings for tokens from different languages. However, directly training an mBERT-based QA system for low-resource languages is challenging due to the paucity of training data. In this work, we augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model, which is already pre-trained in English. Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts question-answering performance, whereas performance degrades with translations from other language families. We further show that introducing a contrastive loss between the translated question-context feature pairs during fine-tuning prevents such degradation with cross-language-family translations and leads to marginal improvement.
The code for this work is available at https://github.com/gokulkarthik/mucot.

Submitted 12 April, 2022; originally announced April 2022.

Comments: Accepted for oral presentation at ACL 2022 Workshop on Speech and Language Technologies for Dravidian Languages
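The abstract does not give the exact form of the contrastive loss. A common choice, assumed here, is an InfoNCE-style loss in which each original question-context feature is pulled toward the feature of its own translation and pushed away from the other translations in the batch. Cosine similarity, the temperature value, and the function names are illustrative, not taken from the MuCoT code.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    positives[i] is the translated counterpart of anchors[i]; every other
    positives[j] in the batch serves as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)
```

The loss is near zero when each feature is closest to its own translation and grows when translations are matched to the wrong originals, which is the pressure that keeps cross-lingual representations aligned during fine-tuning.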

arXiv:2105.03833 [pdf, other]

Euclidean Distance-Optimal Post-Processing of Grid-Based Paths

Authors: Guru Koushik Senthil Kumar, Sandip Aine, Maxim Likhachev

Abstract: Paths planned over grids can often be suboptimal in Euclidean space and contain a large number of unnecessary turns. Consequently, researchers have looked into post-processing techniques to improve paths after they are planned. In this paper, we propose a novel post-processing technique called Homotopic Visibility Graph Planning (HVG), which differs from existing post-processing methods in that it is guaranteed to shorten the path such that it is at least as short as the provably shortest path within the same topological class as the initially computed path. We present the algorithm, provide proofs, and compare it experimentally against other post-processing methods and any-angle planning algorithms.

Submitted 9 May, 2021; originally announced May 2021.
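HVG itself builds a visibility graph restricted to the path's homotopy class, which is beyond a short example. For contrast, the sketch below implements a much simpler and widely used post-processing step, greedy line-of-sight shortcutting ("string pulling"), which removes unnecessary turns but, unlike HVG, carries no optimality guarantee. The grid encoding (0 free, 1 blocked), the conservative visibility test that treats touching a blocked cell's corner as a collision, and all function names are assumptions for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]   # (x, y)
Grid = List[List[int]]   # grid[y][x]: 0 = free, 1 = blocked


def _segment_hits_box(p0, p1, xmin, ymin, xmax, ymax) -> bool:
    """Liang-Barsky clipping: does segment p0-p1 touch the closed box?"""
    t0, t1 = 0.0, 1.0
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    for p, q in ((-dx, p0[0] - xmin), (dx, xmax - p0[0]), (-dy, p0[1] - ymin), (dy, ymax - p0[1])):
        if p == 0:
            if q < 0:  # parallel to this edge and outside it
                return False
        else:
            r = q / p
            if p < 0:
                if r > t1:
                    return False
                t0 = max(t0, r)
            else:
                if r < t0:
                    return False
                t1 = min(t1, r)
    return True


def visible(grid: Grid, a: Cell, b: Cell) -> bool:
    """True if the segment between the cell centres touches no blocked cell."""
    p0, p1 = (a[0] + 0.5, a[1] + 0.5), (b[0] + 0.5, b[1] + 0.5)
    for y in range(min(a[1], b[1]), max(a[1], b[1]) + 1):
        for x in range(min(a[0], b[0]), max(a[0], b[0]) + 1):
            if grid[y][x] and _segment_hits_box(p0, p1, x, y, x + 1, y + 1):
                return False
    return True


def shortcut(grid: Grid, path: List[Cell]) -> List[Cell]:
    """From each kept waypoint, jump to the farthest later waypoint in sight."""
    if len(path) < 3:
        return list(path)
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not visible(grid, path[i], path[j]):
            j -= 1
        out.append(path[j])  # falls back to the next waypoint if nothing farther is visible
        i = j
    return out
```

Because it only ever keeps waypoints of the original path, greedy shortcutting can get stuck with a path that is visibly longer than the shortest one in its topological class; that gap is what a method with HVG's guarantee closes.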

arXiv:2006.15722 [pdf, other]

Deep Probabilistic Accelerated Evaluation: A Robust Certifiable Rare-Event Simulation Methodology for Black-Box Safety-Critical Systems

Authors: Mansur Arief, Zhiyuan Huang, Guru Koushik Senthil Kumar, Yuanlu Bai, Shengyi He, Wenhao Ding, Henry Lam, Ding Zhao

Abstract: Evaluating the reliability of intelligent physical systems against rare safety-critical events poses a huge testing burden for real-world applications. Simulation provides a useful platform to evaluate the extremal risks of these systems before their deployment. Importance Sampling (IS), while proven to be powerful for rare-event simulation, faces challenges in handling these learning-based systems because their black-box nature fundamentally undermines its efficiency guarantee, which can lead to underestimation that goes diagnostically undetected. We propose a framework called Deep Probabilistic Accelerated Evaluation (Deep-PrAE) to design statistically guaranteed IS by converting black-box samplers, which are versatile but could lack guarantees, into ones with what we call a relaxed efficiency certificate, which allows accurate estimation of bounds on the safety-critical event probability. We present the theory of Deep-PrAE, which combines the dominating point concept with rare-event set learning via deep neural network classifiers, and demonstrate its effectiveness in numerical examples including the safety testing of an intelligent driving algorithm.

Submitted 8 March, 2021; v1 submitted 28 June, 2020; originally announced June 2020.
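Deep-PrAE learns the rare-event set with a neural classifier and shifts the sampling distribution toward its dominating points; that machinery is out of scope here. The sketch shows only the underlying importance-sampling idea on a toy problem where the dominating point is known analytically: estimating p = P(X > t) for X ~ N(0, 1) by sampling from N(t, 1), the mean-shifted proposal centred on the dominating point t, and reweighting each sample by the likelihood ratio. The threshold, sample size, and seed are illustrative.

```python
import math
import random


def naive_mc(t: float, n: int, seed: int = 0) -> float:
    """Crude Monte Carlo: fraction of N(0, 1) samples exceeding t."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) > t for _ in range(n)) / n


def importance_sampling(t: float, n: int, seed: int = 0) -> float:
    """Estimate P(X > t), X ~ N(0, 1), using the proposal N(t, 1).

    Likelihood ratio: phi(x) / phi(x - t) = exp(-t * x + t * t / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


def exact_tail(t: float) -> float:
    """Reference value: P(X > t) = erfc(t / sqrt(2)) / 2."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```

With t = 4 the exact probability is about 3.2e-5, so 20 000 naive samples see fewer than one exceedance on average, while the mean-shifted estimator lands within a few percent. The catch that motivates Deep-PrAE is that this efficiency relies on knowing the dominating point; with a black-box system a poorly placed proposal can silently underestimate p.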

arXiv:1807.08535 [pdf]

doi 10.1016/j.jastp.2018.01.010

A preliminary comparison of Na lidar and meteor radar zonal winds during geomagnetic quiet and disturbed conditions

Authors: G. Kishore Kumar, H. Nesse Tyssøy, Bifford P. Williams

Abstract: We investigate the possibility that sufficiently large electric fields and/or ionization during geomagnetically disturbed conditions may invalidate the assumptions applied in the retrieval of neutral horizontal winds from meteor and/or lidar measurements. To our knowledge, the possible errors in the wind estimation have never been reported. In the present case study, we use co-located meteor radar and sodium resonance lidar zonal wind measurements over Andenes (69.27°N, 16.04°E) during intense substorms in the declining phase of the January 2005 solar proton event (21-22 January 2005). In total, 14 h of measurements are available for the comparison, covering both quiet and disturbed conditions. For comparison, the lidar zonal wind measurements are averaged over the same time and altitude as the meteor radar wind measurements. High cross correlations (~0.8) are found in all height regions. The discrepancies can be explained in light of differences in the observational volumes of the two instruments. Further, we extended the comparison to address the electric field and/or ionization impact on the neutral wind estimation. For the periods of low ionization, the neutral winds estimated with both instruments are quite consistent with each other. During periods of elevated ionization, comparatively large differences are noticed at the highest altitude, which might be due to the electric field and/or ionization impact on the wind estimation. At present, one event is not sufficient to draw any firm conclusion.
Further study with more co-located measurements is needed to test the statistical significance of the result.

Submitted 23 July, 2018; originally announced July 2018.
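The comparison step described above, averaging the lidar winds over the meteor radar's time and altitude bins before correlating the two series, can be sketched as follows. The bin edges, the (time, altitude, wind) tuple layout, and the use of Pearson correlation for the reported cross correlation are assumptions for illustration; no real measurements are used.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time_h, altitude_km, zonal_wind_m_s)


def bin_average(
    samples: Sequence[Sample], t_edges: Sequence[float], z_edges: Sequence[float]
) -> List[Optional[float]]:
    """Average lidar winds inside each (time, altitude) bin of the radar grid.

    Bins are half-open [lo, hi); empty bins yield None. Output order is
    time bin outer, altitude bin inner, to match a radar series laid out the same way.
    """
    out: List[Optional[float]] = []
    for t0, t1 in zip(t_edges, t_edges[1:]):
        for z0, z1 in zip(z_edges, z_edges[1:]):
            vals = [u for t, z, u in samples if t0 <= t < t1 and z0 <= z < z1]
            out.append(sum(vals) / len(vals) if vals else None)
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Before calling `pearson`, bins where either instrument has no data (None) would be dropped pairwise so that the two series stay aligned.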

arXiv:1612.00842 [pdf, other]

doi 10.1016/j.nima.2017.06.052

Design and Performance of a Hybrid Fast and Thermal Neutron Detector

Authors: M. K. Singh, A. Sonay, M. Deniz, M. Agartioglu, G. Asryan, G. Kiran Kumar, H. B. Li, J. Li, F. K. Lin, S. T. Lin, V. Sharma, L. Singh, V. Singh, V. S. Subrahmanyam, A. K. Soma, H. T. Wong, S. W. Yang, I. O. Yildirim, Q. Yue

Abstract: We report the performance and characterization of a custom-built hybrid detector consisting of BC501A liquid scintillator for fast neutrons and BC702 scintillator for thermal neutrons. The calibration and the resolution of the BC501A liquid scintillator detector are measured. Event identification via the Pulse Shape Discrimination (PSD) technique is developed in order to distinguish gamma rays, fast neutrons and thermal neutrons. Monte Carlo simulation packages are developed in GEANT4 to obtain the actual neutron energy spectrum from the measured recoil spectrum. The developed methods are tested by reconstructing the 241AmBe(α, n) neutron spectrum.

Submitted 2 December, 2016; originally announced December 2016.

Comments: 12 pages, 32 figures
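The abstract names Pulse Shape Discrimination without detailing the algorithm. A widely used variant for liquid scintillators such as BC501A is charge comparison: neutron (proton-recoil) pulses carry a larger fraction of their light in the slow tail than gamma pulses. The sketch computes a tail-to-total ratio on a sampled pulse; the pulse shape, decay constants, gate start, and cut value are hypothetical, not the paper's parameters.

```python
import math
from typing import List, Sequence


def two_component_pulse(
    slow_fraction: float, n: int = 100, fast_tau: float = 3.0, slow_tau: float = 30.0
) -> List[float]:
    """Synthetic pulse: fast plus slow exponential decay (hypothetical shape, in samples)."""
    return [
        (1 - slow_fraction) * math.exp(-i / fast_tau) + slow_fraction * math.exp(-i / slow_tau)
        for i in range(n)
    ]


def tail_to_total(pulse: Sequence[float], tail_start: int = 8) -> float:
    """Charge-comparison PSD parameter: tail integral / total integral.

    Both integrals start at the pulse peak; the tail gate opens `tail_start`
    samples later. Both run to the end of the record.
    """
    peak = max(range(len(pulse)), key=lambda i: pulse[i])
    total = sum(pulse[peak:])
    tail = sum(pulse[peak + tail_start:])
    return tail / total if total > 0 else 0.0


def classify(pulse: Sequence[float], cut: float = 0.3) -> str:
    """Label a pulse by comparing its PSD parameter with a fixed cut."""
    return "neutron" if tail_to_total(pulse) > cut else "gamma"
```

With these toy pulses, a 1% slow component gives a ratio near 0.12 and a 15% slow component a ratio near 0.48. In a real detector the cut is tuned from a two-dimensional plot of the PSD parameter against total charge, since separation degrades at low light output.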

arXiv:1609.06518 [pdf, ps, other]

doi 10.1080/03081087.2017.1356801

A note on discrete Borg-type theorems

Authors: V. B. Kiran Kumar, G. Krishna Kumar

Abstract: We consider discrete versions of the well-known Borg theorem and use simple linear algebraic techniques to obtain new versions of discrete Borg-type theorems. To be precise, we prove that the periodic potential of a discrete Schrödinger operator is almost constant if and only if the possible spectral gaps of the operator are of small width. This result is further extended to more general settings, and the connection to the well-known Ten Martini problem is also discussed.

Submitted 17 September, 2019; v1 submitted 21 September, 2016; originally announced September 2016.

Journal ref: LINEAR AND MULTILINEAR ALGEBRA, 2018
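For reference, the discrete Schrödinger operator on ℓ²(ℤ) with an N-periodic potential v is conventionally written as below (the paper's normalization, for example the sign of the hopping term, may differ). Floquet theory gives its spectrum as a union of N closed bands, and the possible spectral gaps are the open intervals between consecutive bands; the labels α_k and β_k are ours.

```latex
(Hu)(n) = u(n+1) + u(n-1) + v(n)\,u(n), \qquad n \in \mathbb{Z}, \qquad v(n+N) = v(n)

\sigma(H) = \bigcup_{k=1}^{N} [\alpha_k, \beta_k], \qquad
\alpha_1 < \beta_1 \le \alpha_2 < \beta_2 \le \cdots \le \alpha_N < \beta_N

\text{gaps: } (\beta_k, \alpha_{k+1}) \ \text{whenever } \beta_k < \alpha_{k+1}, \qquad k = 1, \dots, N-1
```

In this notation, "spectral gaps of small width" means that every difference α_{k+1} − β_k is small.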

arXiv:1601.05471 [pdf, other]

Long-Baseline Neutrino Facility (LBNF) and Deep Underground Neutrino Experiment (DUNE) Conceptual Design Report Volume 1: The LBNF and DUNE Projects

Authors: R. Acciarri, M. A. Acero, M. Adamowski, C. Adams, P. Adamson, S. Adhikari, Z. Ahmad, C. H. Albright, T. Alion, E. Amador, J. Anderson, K. Anderson, C. Andreopoulos, M. Andrews, R. Andrews, I. Anghel, J. d. Anjos, A. Ankowski, M. Antonello, A. ArandaFernandez, A. Ariga, T. Ariga, D. Aristizabal, E. Arrieta-Diaz, K. Aryal, et al. (780 additional authors not shown)

Abstract: This document presents the Conceptual Design Report (CDR) put forward by an international neutrino community to pursue the Deep Underground Neutrino Experiment at the Long-Baseline Neutrino Facility (LBNF/DUNE), a groundbreaking science experiment for long-baseline neutrino oscillation studies and for neutrino astrophysics and nucleon decay searches. The DUNE far detector will be a very large modular liquid argon time-projection chamber (LArTPC) located deep underground, coupled to the LBNF multi-megawatt wide-band neutrino beam. DUNE will also have a high-resolution and high-precision near detector.

Submitted 20 January, 2016; originally announced January 2016.

arXiv:1411.4802 [pdf, other]

doi 10.1016/j.nima.2016.08.044

Characterization and Performance of Germanium Detectors with sub-keV Sensitivities for Neutrino and Dark Matter Experiments

Authors: The TEXONO Collaboration, A. K. Soma, M. K. Singh, L. Singh, G. Kiran Kumar, F. K. Lin, Q. Du, H. Jiang, S. K. Liu, J. L. Ma, V. Sharma, L. Wang, Y. C. Wu, L. T. Yang, W. Zhao, M. Agartioglu, G. Asryan, Y. Y. Chang, J. H. Chen, Y. C. Chuang, M. Deniz, C. L. Hsu, Y. H. Hsu, T. R. Huang, L. P. Jia, et al. (24 additional authors not shown)

Abstract: Germanium ionization detectors with sensitivities as low as 100 eVee (electron-equivalent energy) open new windows for studies of neutrino and dark matter physics. The relevant physics subjects are summarized. The detectors have to measure physics signals whose amplitude is comparable to that of pedestal electronic noise. To fully exploit this new detector technique, various experimental issues, including quenching factors, energy reconstruction and calibration, signal triggering and selection, and evaluation of their associated efficiencies, have to be addressed. The efforts and results of a research program to address these challenges are presented.

Submitted 1 September, 2016; v1 submitted 18 November, 2014; originally announced November 2014.

Comments: 18 pages, 18 figures, 3 tables; v3 -- Published Version

Journal ref: Nuclear Instruments and Methods A 836, 67-82 (2016)
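The unit eVee refers to electron-equivalent energy: the energy an electron recoil would need to produce the observed ionization signal. For nuclear recoils, as expected from dark matter or coherent neutrino scattering, the two energy scales are linked through the quenching factor, conventionally defined as below. The symbols are ours, and the value Q_nr = 0.2 is an energy-independent number chosen only to illustrate the arithmetic; real germanium quenching factors depend on recoil energy.

```latex
Q_{\mathrm{nr}}(E_{\mathrm{nr}}) = \frac{E_{\mathrm{ee}}}{E_{\mathrm{nr}}}
\qquad\Longrightarrow\qquad
E_{\mathrm{nr}} = \frac{E_{\mathrm{ee}}}{Q_{\mathrm{nr}}}
= \frac{100\ \mathrm{eV}}{0.2} = 500\ \mathrm{eV}
\quad (\text{illustrative } Q_{\mathrm{nr}} = 0.2)
```

This is why a 100 eVee threshold reaches nuclear recoil energies several times higher, and why the quenching factor must be measured before a threshold in eVee can be translated into physics sensitivity.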

arXiv:1307.1372 [pdf]

Clustering of Complex Networks and Community Detection Using Group Search Optimization

Authors: G. Kishore Kumar, V. K. Jayaraman

Abstract: Group Search Optimizer (GSO), inspired by animal searching behaviour, is a relatively new algorithm in the field of evolutionary computing and is robust and efficient. This paper describes an application of GSO to the clustering of networks. We tested GSO against five standard benchmark datasets, and it proved very competitive in terms of accuracy and convergence speed.

Submitted 19 August, 2013; v1 submitted 4 July, 2013; originally announced July 2013.

Comments: 7 pages, 2 figures
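The abstract does not state which fitness function GSO optimizes. Network clustering methods commonly score a candidate partition with Newman's modularity Q, so the sketch below shows that objective, as an assumption, for an undirected, unweighted graph; an optimizer such as GSO would search over the `labels` assignment to maximize it. The function name and the toy graph are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]], labels: Dict[Hashable, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / (2m))^2].

    L_c: number of edges with both endpoints in community c.
    d_c: total degree of the nodes in community c.
    Assumes an undirected, unweighted graph without self-loops or duplicate edges.
    """
    edges = list(edges)
    m = len(edges)
    internal: Dict[int, int] = defaultdict(int)
    degree_sum: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})  # 5/14
q_single = modularity(edges, {n: 0 for n in range(6)})            # 0
```

Splitting the graph at the bridge scores 5/14 (about 0.357), while lumping everything into one community scores 0, which is the kind of contrast a population-based optimizer exploits when it compares candidate partitions.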

Showing 1–13 of 13 results for author: Kumar, G K