
Showing 1–7 of 7 results for author: Sang, M

Searching in archive eess.
  1. arXiv:2403.00293  [pdf, other]

    eess.AS cs.LG cs.SD

    Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

    Authors: Mufan Sang, John H. L. Hansen

    Abstract: With excellent generalization ability, self-supervised speech models have shown impressive performance on various downstream speech tasks in the pre-training and fine-tuning paradigm. However, as pre-trained models grow in size, fine-tuning becomes practically infeasible due to heavy computation and storage overhead, as well as the risk of overfitting. Adapters are lightweight modules inser…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024

  2. arXiv:2302.08639  [pdf, other]

    eess.AS cs.LG cs.SD

    Improving Transformer-based Networks With Locality For Automatic Speaker Verification

    Authors: Mufan Sang, Yong Zhao, Gang Liu, John H. L. Hansen, Jian Wu

    Abstract: Recently, Transformer-based architectures have been explored for speaker embedding extraction. Although the Transformer employs the self-attention mechanism to efficiently model the global interaction between token embeddings, it is inadequate for capturing short-range local context, which is essential for the accurate extraction of speaker information. In this study, we enhance the Transformer wi…

    Submitted 28 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  3. arXiv:2207.04540  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

    Authors: Mufan Sang, John H. L. Hansen

    Abstract: Recently, attention mechanisms have been applied successfully in neural network-based speaker verification systems. Incorporating the Squeeze-and-Excitation block into convolutional neural networks has achieved remarkable performance. However, it uses global average pooling (GAP) to simply average the features along time and frequency dimensions, which is incapable of preserving sufficient speaker…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: Accepted to Interspeech 2022

  4. arXiv:2112.04459  [pdf, other]

    eess.AS cs.LG cs.SD

    Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

    Authors: Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

    Abstract: Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. Unlike contrastive-learning-based self-supervised learning methods, the prop…

    Submitted 1 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted to ICASSP 2022

  5. arXiv:2012.06896  [pdf, other]

    eess.AS cs.LG cs.SD

    DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

    Authors: Mufan Sang, Wei Xia, John H. L. Hansen

    Abstract: Although speaker verification has achieved significant performance improvements with the development of deep neural networks, domain mismatch is still a challenging problem in this field. In this study, we propose a novel framework to disentangle speaker-related and domain-specific features and apply domain adaptation solely to the speaker-related feature space. Instead of performing domain adaptati…

    Submitted 22 February, 2021; v1 submitted 12 December, 2020; originally announced December 2020.

    Comments: Accepted to ICASSP 2021

  6. arXiv:2009.13477  [pdf]

    physics.med-ph eess.IV

    Super-Resolution Ultrasound Localization Microscopy Based on a High Frame-rate Clinical Ultrasound Scanner: An In-human Feasibility Study

    Authors: Chengwu Huang, Wei Zhang, ** Gong, U-Wai Lok, Shanshan Tang, Tinghui Yin, Xirui Zhang, Lei Zhu, Maodong Sang, Pengfei Song, Rongqin Zheng, Shigao Chen

    Abstract: Non-invasive detection of microvascular alterations in deep tissues in vivo provides critical information for clinical diagnosis and evaluation of a broad spectrum of pathologies. Recently, the emergence of super-resolution ultrasound localization microscopy (ULM) offers new possibilities for clinical imaging of microvasculature at capillary level. Currently, the clinical utility of ULM on clinica…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 41 pages, 5 figures, 4 supplemental figures

  7. arXiv:2009.09556  [pdf, other]

    eess.AS cs.LG cs.SD

    Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

    Authors: Mufan Sang, Wei Xia, John H. L. Hansen

    Abstract: In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowledge-distillation-based objective function is propo…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: Accepted to Interspeech 2020