Search | arXiv e-print repository

Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Authors: Mohammed Yousif, Jonat John Mathew, Huzaifa Pallan, Agamjeet Singh Padda, Syed Daniyal Shah, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using the ASVspoof 2019 dataset as a proof-of-concept, we implement pre-trained models with ResNet and ConvNeXt architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-wild dataset.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.11778 [pdf, other]

Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. An executable software application is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performance relative to the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:1502.04272 [pdf, ps, other]

doi 10.1109/LSP.2015.2404827

Spatial Stimuli Gradient Sketch Model

Authors: Joshin John Mathew, Alex Pappachen James

Abstract: The inability of automated edge detection methods inspired by primal sketch models to accurately calculate object edges under the influence of pixel noise is an open problem. Extending principles of image perception, i.e., the Weber-Fechner law and the Shepard similarity law, we propose a new edge detection method and formulation that use perceived brightness and neighbourhood similarity calculations to determine robust object edges. The robustness of the detected edges is benchmarked against the Sobel, SIS, Kirsch, and Prewitt edge detection methods in an example face recognition problem, showing statistically significant improvement in recognition accuracy and pixel noise tolerance.

Submitted 14 February, 2015; originally announced February 2015.

Comments: accepted for publication in IEEE Signal Processing Letters, 2015

Journal ref: Volume 22, Issue 9, pp. 1336-1339, 2015

arXiv:1303.2439 [pdf]

Voxel-wise Weighted MR Image Enhancement using an Extended Neighborhood Filter

Authors: Joseph Suresh Paul, Joshin John Mathew, Souparnika Kandoth Naroth, Chandrasekar Kesavadas

Abstract: We present an edge-preserving and denoising filter for enhancing features in images that contain an ROI with a narrow spatial extent. Typical examples include angiograms, or ROIs spatially distributed in multiple locations and contained within an outlying region, as in multiple sclerosis. The filtering involves determining multiplicative weights in the spatial domain using an extended set of neighborhood directions. Equivalently, the filtering operation may be interpreted as a combination of directional filters in the frequency domain, with selective weighting of the spatial frequencies contained within each direction. The advantages of the proposed filter over specialized nonlinear filters that operate on the diffusion principle are illustrated using numerical phantom data. Performance is evaluated on simulated multiple sclerosis images from the BrainWeb database, on clinically acquired FLAIR images of acute ischemic stroke, and on MR angiograms.

Submitted 11 March, 2013; originally announced March 2013.

Showing 1–4 of 4 results for author: Mathew, J J