Showing 1–9 of 9 results for author: Sarkar, A k

  1. arXiv:2308.05178  [pdf]

    eess.IV cs.CV

    An Improved Model for Diabetic Retinopathy Detection by using Transfer Learning and Ensemble Learning

    Authors: Md. Simul Hasan Talukder, Ajay Kirshno Sarkar, Sharmin Akter, Md. Nuhi-Alamin

    Abstract: Diabetic Retinopathy (DR) is an ocular condition caused by a sustained high level of sugar in the blood, which causes the retinal capillaries to block and bleed, causing retinal tissue damage. It usually results in blindness. Early detection can help in lowering the risk of DR and its severity. The robust and accurate prediction and detection of diabetic retinopathy is a challenging task. This pap…

    Submitted 3 June, 2023; originally announced August 2023.

    Comments: 22 pages, 7 Tables and 7 Figures

  2. arXiv:2201.06426  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification

    Authors: Achintya kr. Sarkar, Zheng-Hua Tan

    Abstract: Deep representation learning has gained significant momentum in advancing text-dependent speaker verification (TD-SV) systems. When designing deep neural networks (DNN) for extracting bottleneck features, key considerations include training targets, activation functions, and loss functions. In this paper, we systematically study the impact of these choices on the performance of TD-SV. For training…

    Submitted 17 January, 2022; originally announced January 2022.

  3. arXiv:2102.02074  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

    Authors: Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

    Abstract: In this paper, we propose a novel method that trains pass-phrase specific deep neural network (PP-DNN) based auto-encoders for creating augmented data for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is trained using the utterances of a particular pass-phrase available in the target enrollment set with two methods: (i) transfer learning and (ii) training from scratch. Next…

    Submitted 3 February, 2021; originally announced February 2021.

  4. arXiv:2011.12536  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

    Authors: Achintya kr. Sarkar, Zheng-Hua Tan

    Abstract: In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised objective, autoregressive predictive…

    Submitted 25 March, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: Copyright (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: IEEE Signal Processing Letters, vol. 28, pp. 364-368, 2021

  5. arXiv:2007.13118  [pdf, other]

    eess.AS cs.CV cs.SD

    UIAI System for Short-Duration Speaker Verification Challenge 2020

    Authors: Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

    Abstract: In this work, we present the system description of the UIAI entry for the short-duration speaker verification (SdSV) challenge 2020. Our focus is on Task 1 dedicated to text-dependent speaker verification. We investigate different feature extraction and modeling approaches for automatic speaker verification (ASV) and utterance verification (UV). We have also studied different fusion strategies for…

    Submitted 26 July, 2020; originally announced July 2020.

  6. arXiv:2007.08004  [pdf, ps, other]

    eess.AS cs.SD

    Data augmentation enhanced speaker enrollment for text-dependent speaker verification

    Authors: Achintya Kumar Sarkar, Himangshu Sarma, Priyanka Dwivedi, Zheng-Hua Tan

    Abstract: Data augmentation is commonly used for generating additional data from the available training data to achieve a robust estimation of the parameters of complex models like the one for speaker verification (SV), especially for under-resourced applications. SV involves training speaker-independent (SI) models and speaker-dependent models where speakers are represented by models derived from an SI mod…

    Submitted 12 July, 2020; originally announced July 2020.

    Journal ref: Proc. of ICEPE 2020

  7. arXiv:2005.07383  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors

    Authors: Achintya Kumar Sarkar, Zheng-Hua Tan

    Abstract: Applying x-vectors for speaker verification has recently attracted great interest, with the focus being on text-independent speaker verification. In this paper, we study x-vectors for text-dependent speaker verification (TD-SV), which remains unexplored. We further investigate the impact of the different bottleneck (BN) features on the performance of x-vectors, including the recently-introduced ti…

    Submitted 1 September, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

  8. arXiv:1906.03588  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

    Authors: Zheng-Hua Tan, Achintya kr. Sarkar, Najim Dehak

    Abstract: This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected by using a posteriori signal-to-noise ratio (SNR) weighted energy difference and if no pitch is detected within a segment, the s…

    Submitted 11 January, 2022; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: Computer Speech & Language, volume 59, January 2020, Pages 1-21

  9. arXiv:1905.04554  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

    Authors: Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

    Abstract: There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV). However, only moderate success has been achieved. A recent study [1] presented a time contrastive learning (TCL) concept to explore the non-stationarit…

    Submitted 11 May, 2019; originally announced May 2019.

    Comments: Copyright (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019