Showing 1–17 of 17 results for author: Jalal, A

  1. arXiv:2406.17159  [pdf, other]

    eess.AS cs.MM cs.SD

    Exploring compressibility of transformer based text-to-music (TTM) models

    Authors: Vasileios Moschopoulos, Thanasis Kotsiopoulos, Pablo Peso Parada, Konstantinos Nikiforidis, Alexandros Stergiadis, Gerasimos Papakostas, Md Asif Jalal, Jisi Zhang, Anastasios Drosou, Karthikeyan Saravanan

    Abstract: State-of-the-art Text-To-Music (TTM) generative AI models are large and require desktop or server class compute, making them infeasible for deployment on mobile phones. This paper presents an analysis of trade-offs between model compression and generation performance of TTM models. We study compression through knowledge distillation and specific modifications that enable applicability over the var…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Proceedings of INTERSPEECH 2024

  2. arXiv:2401.13146  [pdf, other]

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  3. arXiv:2401.12085  [pdf, other]

    eess.AS cs.SD

    Consistency Based Unsupervised Self-training For ASR Personalisation

    Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, differed by user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  4. arXiv:2312.08494  [pdf, other]

    cs.SD cs.LG eess.AS

    PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models

    Authors: Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli

    Abstract: Perceptual modification of voice is an elusive goal. While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to convert one voice to another, but these modifications are handled by black box models, and the specifics of what perceptual qualities to modify and ho…

    Submitted 13 December, 2023; originally announced December 2023.

  5. arXiv:2307.13343  [pdf, other]

    eess.AS cs.CR cs.SD

    On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

    Authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

    Abstract: Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task: Automatic Speech Recognition…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Proceedings of INTERSPEECH 2023

  6. arXiv:2306.17500  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition

    Authors: Anna Ollerenshaw, Md Asif Jalal, Rosanna Milner, Thomas Hain

    Abstract: Speech emotion recognition (SER) is vital for obtaining emotional intelligence and understanding the contextual meaning of speech. Variations of consonant-vowel (CV) phonemic boundaries can enrich acoustic context with linguistic cues, which impacts SER. In practice, speech emotions are treated as single labels over an acoustic segment for a given time duration. However, phone boundaries within sp…

    Submitted 30 June, 2023; originally announced June 2023.

  7. arXiv:2306.03284  [pdf, other]

    cs.LG eess.IV

    Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models

    Authors: Sriram Ravula, Brett Levac, Ajil Jalal, Jonathan I. Tamir, Alexandros G. Dimakis

    Abstract: Diffusion-based generative models have been used as powerful priors for magnetic resonance imaging (MRI) reconstruction. We present a learning method to optimize sub-sampling patterns for compressed sensing multi-coil MRI that leverages pre-trained diffusion generative models. Crucially, during training we use a single-step reconstruction based on the posterior mean estimate given by the diffusion…

    Submitted 5 June, 2023; originally announced June 2023.

  8. arXiv:2303.14795  [pdf, other]

    eess.IV eess.SP

    MRI Reconstruction with Side Information using Diffusion Models

    Authors: Brett Levac, Ajil Jalal, Kannan Ramchandran, Jonathan I. Tamir

    Abstract: Magnetic resonance imaging (MRI) exam protocols consist of multiple contrast-weighted images of the same anatomy to emphasize different tissue properties. Due to the long acquisition times required to collect fully sampled k-space measurements, it is common to only collect a fraction of k-space for each scan and subsequently solve independent inverse problems for each image contrast. Recently, the…

    Submitted 6 June, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

  9. arXiv:2303.00550  [pdf, other]

    eess.AS cs.SD

    Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation

    Authors: Rehan Ahmad, Md Asif Jalal, Muhammad Umar Farooq, Anna Ollerenshaw, Thomas Hain

    Abstract: Knowledge distillation has widely been used for model compression and domain adaptation for speech applications. In the presence of multiple teachers, knowledge can easily be transferred to the student by averaging the models' output. However, previous research shows that the student does not adapt well with such combination. This paper proposes to use an elitist sampling strategy at the output of ens…

    Submitted 1 March, 2023; originally announced March 2023.

  10. arXiv:2211.02000  [pdf, other]

    cs.SD cs.CL eess.AS

    Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

    Authors: Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

    Abstract: State-of-the-art speaker verification frameworks have typically focused on developing models with increasingly deeper (more layers) and wider (number of channels) models to improve their verification performance. Instead, this paper proposes an approach to increase the model resolution capability using attention-based dynamic kernels in a convolutional neural network to adapt the model parameters…

    Submitted 27 February, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

  11. arXiv:2211.01993  [pdf, other]

    cs.CL cs.SD eess.AS

    Probing Statistical Representations For End-To-End ASR

    Authors: Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

    Abstract: End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition. In this domain there is little research to analyse internal representation dependencies and their relationship to modelling approaches. This paper investigates cross-domain language model dependencies within transformer architectures using SVCCA and uses these insights to e…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  12. arXiv:2211.00199  [pdf, other]

    eess.IV eess.SP

    Accelerated Motion Correction with Deep Generative Diffusion Models

    Authors: Brett Levac, Sidharth Kumar, Ajil Jalal, Jonathan I. Tamir

    Abstract: Magnetic Resonance Imaging (MRI) is a powerful medical imaging modality, but unfortunately suffers from long scan times which, aside from increasing operational costs, can lead to image artifacts due to patient motion. Motion during the acquisition leads to inconsistencies in measured data that manifest as blurring and ghosting if unaccounted for in the image reconstruction process. Various deep l…

    Submitted 28 September, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

  13. A cross-corpus study on speech emotion recognition

    Authors: Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

    Abstract: For speech emotion datasets, it has been difficult to acquire large quantities of reliable data and acted emotions may be over the top compared to less expressive emotions displayed in everyday life. Lately, larger datasets with natural emotions have been created. Instead of ignoring smaller, acted datasets, this study investigates whether information learnt from acted emotions is useful for detec…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: ASRU 2019

    Journal ref: IEEE Workshop on Automatic Speech Recognition and Understanding 2019

  14. Insights on Neural Representations for End-to-End Speech Recognition

    Authors: Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

    Abstract: End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation. However, there are limited tools available to understand the internal functions and the effect of hierarchical dependencies within the model architecture. It is crucial to understand the correlations between the layer-wise representations, to derive insights on the relationship between neural rep…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Submitted to Interspeech 2021

    Journal ref: Proc. Interspeech 2021, 4079-4083

  15. arXiv:2102.11420  [pdf, other]

    cs.SD eess.AS

    Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

    Authors: Samuel J. Broughton, Md Asif Jalal, Roger K. Moore

    Abstract: Generative Adversarial Networks (GANs) are machine learning networks based around creating synthetic data. Voice Conversion (VC) is a subset of voice translation that involves translating the paralinguistic features of a source speaker to a target speaker while preserving the linguistic information. The aim of non-parallel conditional GANs for VC is to translate an acoustic speech feature sequence…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: For demo, see https://samuelbroughton.github.io/interpretability-demo-2020/

  16. arXiv:2006.13494  [pdf, other]

    eess.SP

    High Dimensional Channel Estimation Using Deep Generative Networks

    Authors: Eren Balevi, Akash Doshi, Ajil Jalal, Alexandros Dimakis, Jeffrey G. Andrews

    Abstract: This paper presents a novel compressed sensing (CS) approach to high dimensional wireless channel estimation by optimizing the input to a deep generative network. Channel estimation using generative networks relies on the assumption that the reconstructed channel lies in the range of a generative model. Channel reconstruction using generative priors outperforms conventional CS techniques and requi…

    Submitted 24 June, 2020; originally announced June 2020.

  17. arXiv:2005.06001  [pdf, other]

    eess.IV cs.LG stat.ML

    Deep Learning Techniques for Inverse Problems in Imaging

    Authors: Gregory Ongie, Ajil Jalal, Christopher A. Metzler, Richard G. Baraniuk, Alexandros G. Dimakis, Rebecca Willett

    Abstract: Recent work in machine learning shows that deep neural networks can be used to solve a wide variety of inverse problems arising in computational imaging. We explore the central prevailing themes of this emerging area and present a taxonomy that can be used to categorize different problems and reconstruction methods. Our taxonomy is organized along two central axes: (1) whether or not a forward mod…

    Submitted 12 May, 2020; originally announced May 2020.