Showing 1–26 of 26 results for author: Latif, S

Searching in archive eess.
  1. arXiv:2406.05784  [pdf, other]

    cs.SD cs.LG eess.AS

    Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment

    Authors: Huma Ameer, Seemab Latif, Rabia Latif

    Abstract: The automated classification of stuttered speech has significant implications for timely assessments providing assistance to speech language pathologists. Despite notable advancements in the field, the cases in which multiple disfluencies occur in speech require attention. We have taken a progressive approach to fill this gap by classifying multi-stuttered speech more efficiently. The problem has…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2311.05203  [pdf, other]

    cs.SD cs.LG eess.AS

    Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

    Authors: Huma Ameer, Seemab Latif, Rabia Latif, Sana Mukhtar

    Abstract: In recent years, advancements in the field of speech processing have led to cutting-edge deep learning algorithms with immense potential for real-world applications. The automated identification of stuttered speech is one of such applications that the researchers are addressing by employing deep learning techniques. Recently, researchers have utilized Wav2vec2.0, a speech recognition model to clas…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 12 pages, 6 figures, 6 tables, journal paper

  3. arXiv:2308.12792  [pdf, other]

    cs.SD eess.AS

    Sparks of Large Audio Models: A Survey and Outlook

    Authors: Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller

    Abstract: This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources--from human voices to musical instruments and environmental sounds--poses challenges distinct from those found in traditional Natural Language Pr…

    Submitted 21 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Under review, Repo URL: https://github.com/EmulationAI/awesome-large-audio-models

  4. arXiv:2307.06090  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

    Authors: Siddique Latif, Muhammad Usama, Mohammad Ibrahim Malik, Björn W. Schuller

    Abstract: Despite recent advancements in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches face the challenge of the limited availability of annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential o…

    Submitted 19 June, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted in IEEE Computational Intelligence Magazine

  5. arXiv:2306.13804  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Language Speech Emotion Recognition Using Multimodal Dual Attention Transformers

    Authors: Syed Aun Muhammad Zaidi, Siddique Latif, Junaid Qadir

    Abstract: Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this paper, we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language SER. Our model utilises pre-trained models for multimodal feature extraction and is equipped with a dual attention mechanism including…

    Submitted 14 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Under Review IEEE TAC

  6. arXiv:2305.11413  [pdf, other]

    cs.SD eess.AS

    A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

    Authors: Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller

    Abstract: In this paper, we propose to utilise diffusion models for data augmentation in speech emotion recognition (SER). In particular, we present an effective approach to utilise improved denoising diffusion probabilistic models (IDDPM) to generate synthetic emotional data. We condition the IDDPM with the textual embedding from bidirectional encoder representations from transformers (BERT) to generate hi…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted Interspeech 2023

  7. arXiv:2305.07429  [pdf, other]

    eess.IV cs.CV cs.LG

    Unlocking the Potential of Medical Imaging with ChatGPT's Intelligent Diagnostics

    Authors: Ayyub Alzahem, Shahid Latif, Wadii Boulila, Anis Koubaa

    Abstract: Medical imaging is an essential tool for diagnosing various healthcare diseases and conditions. However, analyzing medical images is a complex and time-consuming task that requires expertise and experience. This article aims to design a decision support system to assist healthcare providers and patients in making decisions about diagnosing, treating, and managing health conditions. The proposed ar…

    Submitted 12 May, 2023; originally announced May 2023.

  8. arXiv:2305.00725  [pdf, other]

    cs.SD eess.AS

    Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

    Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak

    Abstract: Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential, however, recent studies are focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: Under review

  9. arXiv:2304.11408  [pdf, other]

    cs.SD eess.AS

    Lightweight Toxicity Detection in Spoken Language: A Transformer-based Approach for Edge Devices

    Authors: Ahlam Husni Abu Nada, Siddique Latif, Junaid Qadir

    Abstract: Toxicity is a prevalent social behavior that involves the use of hate speech, offensive language, bullying, and abusive speech. While text-based approaches for toxicity detection are common, there is limited research on processing speech signals in the physical world. Detecting toxicity in the physical world is challenging due to the difficulty of integrating AI-capable computers into the environm…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: Under Review

  10. arXiv:2304.01576  [pdf, other]

    eess.IV cs.CV cs.LG

    MESAHA-Net: Multi-Encoders based Self-Adaptive Hard Attention Network with Maximum Intensity Projections for Lung Nodule Segmentation in CT Scan

    Authors: Muhammad Usman, Azka Rehman, Abdullah Shahid, Siddique Latif, Shi Sub Byon, Sung Hyun Kim, Tariq Mahmood Khan, Yeong Gil Shin

    Abstract: Accurate lung nodule segmentation is crucial for early-stage lung cancer diagnosis, as it can substantially enhance patient survival rates. Computed tomography (CT) images are widely employed for early diagnosis in lung nodule analysis. However, the heterogeneity of lung nodules, size diversity, and the complexity of the surrounding environment pose challenges for developing robust nodule segmenta…

    Submitted 4 April, 2023; originally announced April 2023.

  11. arXiv:2303.11607  [pdf, other]

    cs.CL cs.SD eess.AS

    Transformers in Speech Processing: A Survey

    Authors: Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, Fahad Shamshad, Moazzam Shoukat, Junaid Qadir

    Abstract: The remarkable success of transformers in the field of natural language processing has sparked the interest of the speech-processing community, leading to an exploration of their potential for modeling long-range dependencies within speech sequences. Recently, transformers have gained prominence across various speech-related domains, including automatic speech recognition, speech synthesis, speech…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: under-review

  12. arXiv:2301.03751  [pdf, other]

    cs.SD eess.AS

    Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation

    Authors: Abdullah Shahid, Siddique Latif, Junaid Qadir

    Abstract: Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still have poor performance due to a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron architecture. The proposed TTS system includes encoders for speaker a…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Under review

  13. arXiv:2211.00003  [pdf, other]

    eess.IV cs.CV

    MEDS-Net: Self-Distilled Multi-Encoders Network with Bi-Direction Maximum Intensity projections for Lung Nodule Detection

    Authors: Muhammad Usman, Azka Rehman, Abdullah Shahid, Siddique Latif, Shi Sub Byon, Byoung Dai Lee, Sung Hyun Kim, Byung il Lee, Yeong Gil Shin

    Abstract: In this study, we propose a lung nodule detection scheme which fully incorporates the clinic workflow of radiologists. Particularly, we exploit Bi-Directional Maximum intensity projection (MIP) images of various thicknesses (i.e., 3, 5 and 10mm) along with a 3D patch of CT scan, consisting of 10 adjacent slices to feed into self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed ar…

    Submitted 26 December, 2022; v1 submitted 30 October, 2022; originally announced November 2022.

  14. arXiv:2207.05298  [pdf, other]

    cs.SD eess.AS

    Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller

    Abstract: Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisa…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Under review IEEE Transactions on Affective Computing

  15. arXiv:2204.08625  [pdf, other]

    cs.SD eess.AS

    Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn Schuller

    Abstract: Despite the recent advancement in speech emotion recognition (SER) within a single corpus setting, the performance of these SER systems degrades significantly for cross-corpus and cross-language scenarios. The key reason is the lack of generalisation in SER systems towards unseen conditions, which causes them to perform poorly in cross-corpus and cross-language settings. Recent studies focus on ut…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted in IEEE Transactions on Affective Computing

  16. arXiv:2101.00240  [pdf, other]

    cs.SD cs.LG eess.AS

    A Survey on Deep Reinforcement Learning for Audio-Based Applications

    Authors: Siddique Latif, Heriberto Cuayáhuitl, Farrukh Pervez, Fahad Shamshad, Hafiz Shehbaz Ali, Erik Cambria

    Abstract: Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to effectively solve various intractable problems in various fields. Most importantly, DRL algorithms are also being employed in audio signal processing to learn direc…

    Submitted 1 January, 2021; originally announced January 2021.

    Comments: Under Review

  17. arXiv:2005.11172  [pdf, other]

    eess.AS cs.SD

    Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition

    Authors: Thejan Rajapakshe, Siddique Latif, Rajib Rana, Sara Khalifa, Björn W. Schuller

    Abstract: Deep reinforcement learning (deep RL) is a combination of deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with its environment. This has led to breakthroughs in many complex tasks, such as playing the game "Go", that were previously difficult to solve. However, deep RL requires significant training time making it difficult to use in va…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1910.11256

  18. arXiv:2005.08453  [pdf, other]

    cs.SD eess.AS

    Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller

    Abstract: Speech emotion recognition systems (SER) can achieve high accuracy when the training and test data are identically distributed, but this assumption is frequently violated in practice and the performance of SER systems plummet against unforeseen data shifts. The design of robust models for accurate SER is challenging, which limits its use in practical applications. In this paper we propose a deeper…

    Submitted 25 July, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted in INTERSPEECH 2020

  19. arXiv:2005.08447  [pdf, other]

    cs.SD eess.AS

    Augmenting Generative Adversarial Networks for Speech Emotion Recognition

    Authors: Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller

    Abstract: Generative adversarial networks (GANs) have shown potential in learning emotional attributes and generating new data samples. However, their performance is usually hindered by the unavailability of larger speech emotion recognition (SER) data. In this work, we propose a framework that utilises the mixup data augmentation scheme to augment the GAN in feature learning and generation. To show the eff…

    Submitted 25 July, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted in INTERSPEECH 2020

  20. arXiv:2004.07937  [pdf]

    q-bio.NC cs.LG eess.SP stat.ML

    Principle components analysis for seizures prediction using wavelet transform

    Authors: Syed Muhammad Usman, Shahzad Latif, Arshad Beg

    Abstract: Epilepsy is a disease in which frequent seizures occur due to abnormal activity of neurons. Patients affected by this disease can be treated with the help of medicines or surgical procedures. However, both of these methods are not quite useful. The only method to treat epilepsy patients effectively is to predict the seizure before its onset. It has been observed that abnormal activity in the brain…

    Submitted 9 March, 2020; originally announced April 2020.

  21. arXiv:2001.00378  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller

    Abstract: Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate distinct problem from the task of designing efficient machine learning (ML) models to make prediction and classification decisions. There are two main drawbacks to this approach: firstly, the feature engineering being manual is cumbersome and requir…

    Submitted 24 September, 2021; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: Part of this work is accepted in IEEE Transactions on Affective Computing 2021. https://ieeexplore.ieee.org/document/9543566

  22. arXiv:1907.06083  [pdf, other]

    cs.SD eess.AS

    Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition

    Authors: Siddique Latif, Junaid Qadir, Muhammad Bilal

    Abstract: Cross-lingual speech emotion recognition (SER) is a crucial task for many real-world applications. The performance of SER systems is often degraded by the differences in the distributions of training and test data. These differences become more apparent when training and test data belong to different languages, which cause a significant performance gap between the validation and test scores. It is…

    Submitted 27 July, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: Accepted in Affective Computing & Intelligent Interaction (ACII 2019)

  23. arXiv:1907.06078  [pdf, other]

    cs.SD eess.AS

    Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, Björn W. Schuller

    Abstract: In spite of the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In this paper, we propose a solution to this problem: a…

    Submitted 22 March, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: Accepted in IEEE Transactions on Affective Computing

  24. arXiv:1904.03833  [pdf, other]

    cs.SD eess.AS

    Direct Modelling of Speech Emotion from Raw Speech

    Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps

    Abstract: Speech emotion recognition is a challenging task and heavily depends on hand-engineered acoustic features, which are typically crafted to echo human perception of speech signals. However, a filter bank that is designed from perceptual evidence is not always guaranteed to be the best in a statistical modelling framework where the end goal is for example emotion classification. This has fuelled the…

    Submitted 27 July, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: INTERSPEECH 2019

  25. arXiv:1811.11402  [pdf, other]

    cs.LG cs.CR eess.SP stat.ML

    Adversarial Machine Learning And Speech Emotion Recognition: Utilizing Generative Adversarial Networks For Robustness

    Authors: Siddique Latif, Rajib Rana, Junaid Qadir

    Abstract: Deep learning has undoubtedly offered tremendous improvements in the performance of state-of-the-art speech emotion recognition (SER) systems. However, recent research on adversarial examples poses enormous challenges on the robustness of SER systems by showing the susceptibility of deep neural networks to adversarial examples as they rely only on small and imperceptible perturbations. In this stu…

    Submitted 30 December, 2018; v1 submitted 28 November, 2018; originally announced November 2018.

  26. arXiv:1712.08708  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

    Authors: Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps

    Abstract: Learning the latent representation of data in unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition, however, features learned automatically using deep learning have…

    Submitted 27 July, 2020; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: Proc. Interspeech 2018