Showing 1–50 of 68 results for author: Mohamed, A

Searching in archive eess.
  1. arXiv:2404.13918  [pdf]

    eess.SP

    Emerging Advancements in 6G NTN Radio Access Technologies: An Overview

    Authors: Husnain Shahid, Carla Amatetti, Riccardo Campana, Sorya Tong, Dorin Panaitopol, Alessandro Vanelli Coralli, Abdelhamed Mohamed, Chao Zhang, Ebraam Khalifa, Eduardo Medeiros, Estefania Recayte, Fatemeh Ghasemifard, Ji Lianghai, Juan Bucheli, Karthik Anantha Swamy, Marius Caus, Mehmet Gurelli, Miguel A. Vazquez, Musbah Shaat, Nathan Borios, Per-Erik Eriksson, Sebastian Euler, Zheng Li, Xiaotian Fu

    Abstract: The efforts on the development, standardization and improvements to communication systems towards 5G Advanced and 6G are on track to provide benefits such as an unprecedented level of connectivity and performance, enabling a diverse range of vertical services. The full integration of non-terrestrial components into 6G plays a pivotal role in realizing this paradigm shift towards ubiquitous communi…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: accepted at 2024 EuCNC and 6G Summit, Antwerp, Belgium, 3–6 June 2024

  2. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred.

  3. arXiv:2403.16973  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

    Authors: Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath

    Abstract: We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an…

    Submitted 13 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024. Data, code, and model weights are available at https://github.com/jasonppy/VoiceCraft

  4. arXiv:2402.01969  [pdf, other]

    cs.LG eess.SP

    Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction

    Authors: Ahmed P. Mohamed, Byunghyun Lee, Yaguang Zhang, Max Hollingsworth, C. Robert Anderson, James V. Krogmeier, David J. Love

    Abstract: Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate these challenges, this paper introduces a novel simulation-enhanced data augmentation method for ML pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected re…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 6 pages, 5 figures, Accepted at ICC 2024

  5. arXiv:2401.13463  [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

    Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the ans…

    Submitted 18 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  6. arXiv:2401.09471  [pdf]

    eess.IV cs.CV cs.LG

    Brain Tumor Radiogenomic Classification

    Authors: Amr Mohamed, Mahmoud Rabea, Aya Sameh, Ehab Kamal

    Abstract: The RSNA-MICCAI brain tumor radiogenomic classification challenge aimed to predict MGMT biomarker status in glioblastoma through binary classification on multiparametric MRI (mpMRI) scans: T1w, T1wCE, T2w, and FLAIR. The dataset is split into three main cohorts: a training set and a validation set, which were used during training, and a testing set, which was used only during final evaluation. Images were either in a…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 6 Pages with 4 Tables, 4 Figures and 4 Images

  7. arXiv:2401.03488  [pdf, other]

    cs.LG cs.CR eess.SP

    Data-Driven Subsampling in the Presence of an Adversarial Actor

    Authors: Abu Shafin Mohammad Mahdee Jameel, Ahmed P. Mohamed, **ho Yi, Aly El Gamal, Akshay Malhotra

    Abstract: Deep learning based automatic modulation classification (AMC) has received significant attention owing to its potential applications in both military and civilian use cases. Recently, data-driven subsampling techniques have been utilized to overcome the challenges associated with computational complexity and training time for AMC. Beyond these direct advantages of data-driven subsampling, these me…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICMLCN 2024

  8. arXiv:2310.10803  [pdf, other]

    cs.CL eess.AS

    SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space and the units beyond phonemes are largely underexplored. Here, we demonstrate that a syllabic organization emerges in learning sentence-level representation of speech. In particular, we adopt "self-distillation" obj…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  9. arXiv:2310.10788  [pdf, other]

    eess.AS cs.CL

    Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained black boxes, but many recent studies have begun "probing" models like HuBERT to correlate their internal representations to different aspects of speech. In this paper, we show "inference of articulatory kinematics" as fundamental proper…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  10. arXiv:2310.05513  [pdf, other]

    cs.SD cs.CL eess.AS

    Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

    Authors: Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-** Huang, En-Pei Hu, Ho-Lam Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

    Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track w…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU

  11. arXiv:2309.17020  [pdf, other]

    eess.AS cs.SD

    Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

    Authors: Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed

    Abstract: Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TT…

    Submitted 4 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ASRU 2023 SPARKS Workshop

  12. arXiv:2309.10787  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

    Authors: Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

    Abstract: Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual a…

    Submitted 19 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024; Evaluation Code: https://github.com/roger-tseng/av-superb; Submission Platform: https://av.superbbenchmark.org

  13. arXiv:2309.05814  [pdf, ps, other]

    eess.SP eess.SY

    Reinforcement Learning for Supply Chain Attacks Against Frequency and Voltage Control

    Authors: Amr S. Mohamed, Sumin Lee, Deepa Kundur

    Abstract: The ongoing modernization of the power system, involving new equipment installations and upgrades, exposes the power system to the introduction of malware into its operation through supply chain attacks. Supply chain attacks present a significant threat to power systems, allowing cybercriminals to bypass network defenses and execute deliberate attacks at the physical layer. Given the exponential a…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 7 pages, conference, IEEE International Conference on Machine Learning and Applications (ICMLA) 2023

  14. Reconfigurable Intelligent Surface Enabled Joint Backscattering and Communication

    Authors: **qiu Zhao, Jia Ye, Shuaishuai Guo, Zhiquan Bai, Di Zhou, Abeer Mohamed

    Abstract: Reconfigurable intelligent surface (RIS), an essential topic in sixth-generation (6G) communications, aims to enhance communication performance or mitigate undesired transmission. However, the controllability of each reflecting element on RIS also enables it to act as a passive backscatter device (BD) and transmit its information to reader devices. In this paper, we propose a RIS-enabled join…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 11 pages, 8 figures, published in IEEE TVT

    Journal ref: IEEE Transactions on Vehicular Technology, 2023

  15. arXiv:2305.11435  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

    Authors: Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

    Abstract: In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective. We demonstrate that a nearly identical model architecture (HuBERT) trained with a masked language modeling loss does not exhibit this same ability, suggesting that the visual grounding objective is responsible for the emergence of thi…

    Submitted 23 July, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023. Code & Model: https://github.com/jasonppy/syllable-discovery

  16. arXiv:2305.10615  [pdf, other]

    cs.SD cs.CL eess.AS

    ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

    Authors: Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei ** Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

    Abstract: Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic…

    Submitted 11 August, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech

  17. Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer

    Authors: Rami Hamdi, Ahmed Ben Said, Emna Baccour, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani

    Abstract: Remote monitoring systems analyze the environment dynamics in different smart industrial applications, such as occupational health and safety, and environmental monitoring. Specifically, in industrial Internet of Things (IoT) systems, the huge number of devices and the expected performance put pressure on resources, such as computational, network, and device energy. Distributed training of Machine…

    Submitted 3 May, 2023; originally announced May 2023.

    Journal ref: IEEE Internet of Things Journal, 2023

  18. On the Use of Reinforcement Learning for Attacking and Defending Load Frequency Control

    Authors: Amr S. Mohamed, Deepa Kundur

    Abstract: The electric grid is an attractive target for cyberattackers given its critical nature in society. With the increasing sophistication of cyberattacks, effective grid defense will benefit from proactively identifying vulnerabilities and attack strategies. We develop a deep reinforcement learning-based method that recognizes vulnerabilities in load frequency control, an essential process that mainta…

    Submitted 28 March, 2023; originally announced March 2023.

  19. arXiv:2303.02197  [pdf, ps, other]

    eess.SY

    On the Use of Safety Critical Control for Cyber-Physical Security in the Smart Grid

    Authors: Amr S. Mohamed, Mohsen Khalaf, Deepa Kundur

    Abstract: The tight coupling between communication and control in cyber-physical systems is necessary to enable the complex regulation required to operate these systems. Unfortunately, cyberattackers can exploit network vulnerabilities to compromise communication and force unsafe decision-making and dynamics. If a cyberattack is not detected and isolated in a timely manner, the control process must balance…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 9 pages, 7 figures, conference. Accepted for publishing at the 2023 IEEE Power & Energy Society General Meeting (GM)

  20. arXiv:2302.14126  [pdf, other]

    eess.SY

    A Probabilistic Approach to Adaptive Protection in the Smart Grid

    Authors: Amr S. Mohamed, Deepa Kundur, Mohsen Khalaf

    Abstract: Smart grids are critical cyber-physical systems that are vital to our energy future. Smart grids' fault resilience is dependent on the use of advanced protection systems that can reliably adapt to changing conditions within the grid. The vast amount of operational data generated and collected in smart grids can be used to develop these protection systems. However, given the safety-criticality of p…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: journal, 21 pages

  21. arXiv:2302.07157  [pdf, other]

    eess.IV

    Classification of Lung Pathologies in Neonates using Dual Tree Complex Wavelet Transform

    Authors: Sagarjit Aujla, Adel Mohamed, Ryan Tan, Randy Tan, Lei Gao, Naimul Khan, Karthikeyan Umapathy

    Abstract: Annually, 8,500 neonatal deaths are reported in the US due to respiratory failure. Recently, Lung Ultrasound (LUS) has been gaining wide acceptance as a diagnostic tool for lung conditions because it is radiation-free, portable, and inexpensive. However, a lack of highly trained medical professionals has limited its use, especially in remote areas. To address this, an automated screening system…

    Submitted 17 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Under review

  22. arXiv:2301.00652  [pdf, other]

    eess.AS cs.CL

    Efficient Speech Representation Learning with Low-Bit Quantization

    Authors: Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed

    Abstract: With the development of hardware for machine learning, newer models often come at the cost of both increased sizes and computational complexity. In an effort to improve the efficiency of these models, we apply and investigate recent quantization techniques on speech representation learning models. The quantization techniques were evaluated on the SUPERB benchmark. On the ASR task, with aggressive qu…

    Submitted 14 December, 2022; originally announced January 2023.

    Comments: 7 pages

  23. arXiv:2212.01393  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Continual Learning for On-Device Speech Recognition using Disentangled Conformers

    Authors: Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

    Abstract: Automatic speech recognition research focuses on training and evaluating on static datasets. Yet, as speech models are increasingly deployed on personal devices, such models encounter user-specific distributional shifts. To simulate this real-world scenario, we introduce LibriContinual, a continual learning benchmark for speaker-specific domain adaptation derived from LibriVox audiobooks, with dat…

    Submitted 13 December, 2022; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 8 pages, 2 figures. Submitted to ICASSP 2023

  24. arXiv:2211.05756  [pdf, other]

    cs.CL cs.SD eess.AS

    Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

    Authors: Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

    Abstract: End-to-end multilingual ASR has become more appealing for several reasons, such as simplifying the training and deployment process and positive performance transfer from high-resource to low-resource languages. However, scaling up the number of languages, total hours, and number of unique tokens is not a trivial task. This paper explores large-scale multilingual ASR models on 70 languages. W…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  25. arXiv:2211.02536  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Biased Self-supervised learning for ASR

    Authors: Florian L. Kreyssig, Yangyang Shi, **xi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

    Abstract: Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slightly finetune the model that is used to obtain the target sequence. This leads to better performance and a substantial increase in training speed. Fur…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  26. Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

    Authors: Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, Gopala K. Anumanchipalli

    Abstract: Recent self-supervised learning (SSL) models have proven to learn rich representations of speech, which can readily be utilized by diverse downstream tasks. To understand such utilities, various analyses have been done for speech SSL models to reveal which and how information is encoded in the learned representations. Although the scope of previous analyses is extensive in acoustic, phonetic, and…

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

  27. arXiv:2210.08634  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

    Authors: Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee

    Abstract: We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the computation requirements of self-supervised learning (SSL) representation and to evaluate its generalizability and performance across the diverse SUPERB…

    Submitted 29 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: Accepted by 2022 SLT Workshop

  28. arXiv:2207.10643  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    STOP: A dataset for Spoken Task Oriented Semantic Parsing

    Authors: Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

    Abstract: End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assi…

    Submitted 18 October, 2022; v1 submitted 28 June, 2022; originally announced July 2022.

  29. arXiv:2206.13127  [pdf, other]

    eess.SP cs.IT

    Intelligent Omni-Surfaces (IOSs) for the MIMO Broadcast Channel

    Authors: Abdelhamed Mohamed, Nemanja Stefan Perović, Marco Di Renzo

    Abstract: In this paper, we consider intelligent omni-surfaces (IOSs), which are capable of simultaneously reflecting and refracting electromagnetic waves. We focus our attention on the multiple-input multiple-output (MIMO) broadcast channel, and we introduce an algorithm for jointly optimizing the covariance matrix at the base station, the matrix of reflection and transmission coefficients at the IOS, and…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted to be published in the 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2022)

  30. A Proposed Sub-optimal Power Allocation using Simulated Annealing in Cognitive Radio Networks

    Authors: Abdelhamed Mohamed, Mona Shokair, Mohamed Elkordy, Said ElHalafawy

    Abstract: Due to the rapidly growing demand for wireless services and the increase in the wireless device count, there is a lack of available spectrum bands, which constrains the further development of wireless communication. Therefore, Cognitive Radio (CR) has been adopted as a promising solution because of its ability to exploit the inefficiently used spectrum of licensed bands. Orthogonal Frequency Division Multiple…

    Submitted 23 June, 2022; originally announced June 2022.

    Journal ref: Int. J. Com. Dig. Sys. 5, No.3 (May-2016)

  31. arXiv:2206.03318  [pdf, other]

    cs.CL cs.SD eess.AS

    LegoNN: Building Modular Encoder-Decoder Models

    Authors: Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed

    Abstract: State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g. a high resourced decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures in a way…

    Submitted 11 July, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  32. arXiv:2205.10643  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-Supervised Speech Representation Learning: A Review

    Authors: Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

    Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a…

    Submitted 27 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

  33. arXiv:2205.08330  [pdf, other]

    cs.RO eess.SY

    Nonlinear Model Identification and Observer Design for Thrust Estimation of Small-scale Turbojet Engines

    Authors: Affaf Junaid Ahamad Momin, Gabriele Nava, Giuseppe L'Erario, Hosameldin Awadalla Omer Mohamed, Fabio Bergonti, Punith Reddy Vanteddu, Francesco Braghin, Daniele Pucci

    Abstract: Jet-powered vertical takeoff and landing (VTOL) drones require precise thrust estimation to ensure adequate stability margins and robust maneuvering. Small-scale turbojets have become good candidates for powering heavy aerial drones. However, due to limited instrumentation available in these turbojets, estimating the precise thrust using classical techniques is not straightforward. In this paper,…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 6+1 pages

  34. arXiv:2205.07180  [pdf, other]

    eess.AS cs.CV cs.SD

    Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

    Authors: Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu

    Abstract: This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs. Our study focuses on the Audio-Visual Hidden Unit BERT (AV-HuBERT) approach, a recently developed general-purpose audio-visual speech pre-training framework. We conducted extensive experiments probing the ef…

    Submitted 14 July, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: Interspeech 2022

  35. arXiv:2204.11229  [pdf, other]

    cs.IT eess.SP

    Bi-objective Optimization of Information Rate and Harvested Power in RIS-aided SWIPT Systems

    Authors: Abdelhamed Mohamed, A. Zappone, Marco Di Renzo

    Abstract: The problem of simultaneously optimizing the information rate and the harvested power in a reconfigurable intelligent surface (RIS)-aided multiple-input single-output downlink wireless network with simultaneous wireless information and power transfer (SWIPT) is addressed. The beamforming vectors, RIS reflection coefficients, and power split ratios are jointly optimized subject to maximum power con…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Submitted for publication

  36. arXiv:2203.16502  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Generative Spoken Dialogue Language Modeling

    Authors: Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux

    Abstract: We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech,…

    Submitted 22 November, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

  37. arXiv:2203.06849  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology hinders a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  38. arXiv:2203.04911  [pdf, other]

    cs.CL cs.SD eess.AS

    DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

    Authors: Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) aims to find the answer from a spoken document given a question, a capability that is crucial for personal assistants when replying to users' queries. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time and cost-prohibitive to collect for low-resourced languages…

    Submitted 21 June, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  39. arXiv:2202.07359  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    textless-lib: a Library for Textless Spoken Language Processing

    Authors: Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

    Abstract: Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed at facilitating research in this area. We describe the building blocks that the library provides and demonstrate its usability by discussing three differ…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: The library is available at https://github.com/facebookresearch/textlesslib/

  40. arXiv:2201.02184  [pdf, other]

    eess.AS cs.CV cs.SD

    Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

    Authors: Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed

    Abstract: Video recordings of speech contain correlated audio and visual information, providing a strong signal for speech representation learning from the speaker's lip movements and the produced sound. We introduce Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised representation learning framework for audio-visual speech, which masks multi-stream video input and predicts automatically discovere…

    Submitted 12 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  41. arXiv:2201.01763  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Robust Self-Supervised Audio-Visual Speech Recognition

    Authors: Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

    Abstract: Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe. Audio-visual speech recognition (AVSR) systems improve robustness by complementing the audio stream with the visual information that is invariant to noise and helps the model focus on the desired…

    Submitted 14 July, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: Interspeech 2022

  42. arXiv:2111.07402  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Textless Speech Emotion Conversion using Discrete and Decomposed Representations

    Authors: Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

    Abstract: Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language translation task. We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, a…

    Submitted 13 December, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

    Comments: Published at EMNLP 2022

  43. arXiv:2111.05948  [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling ASR Improves Zero and Few Shot Learning

    Authors: Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

    Abstract: With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to efficiently scale training data to find the most valuable samples in massive datasets. To efficiently scale model sizes, we leverage various optimizations such a…

    Submitted 29 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

  44. arXiv:2109.03264  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Text-Free Prosody-Aware Generative Spoken Language Modeling

    Authors: Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

    Abstract: Speech pre-training has primarily demonstrated efficacy on classification tasks, while its capability of generating novel speech, similar to how GPT-2 can generate coherent paragraphs, has barely been explored. Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-lik…

    Submitted 10 May, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: ACL 2022

  45. arXiv:2106.07759  [pdf, ps, other]

    eess.AS cs.CL

    Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

    Authors: Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

    Abstract: In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR). The proposed approach uses a teacher model which is updated as the exponential moving average (EMA) of the student model parameters. We demonstrate that it is critical for EMA to be accumulated with full-precision floating point. The Ka…

    Submitted 27 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Updated with camera-ready version

  46. arXiv:2106.07447  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

    Authors: Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

    Abstract: Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for…

    Submitted 14 June, 2021; originally announced June 2021.

  47. arXiv:2105.11013  [pdf, other]

    cs.DC cs.LG eess.SY

    Distributed CNN Inference on Resource-Constrained UAVs for Surveillance Systems: Design and Optimization

    Authors: Mohammed Jouhari, Abdulla Al-Ali, Emna Baccour, Amr Mohamed, Aiman Erbad, Mohsen Guizani, Mounir Hamdi

    Abstract: Unmanned Aerial Vehicles (UAVs) have attracted great interest in the last few years owing to their ability to cover large areas and access difficult and hazardous target zones, which is not the case for traditional systems relying on direct observations obtained from fixed cameras and sensors. Furthermore, thanks to the advancements in computer vision and machine learning, UAVs are being adopted fo…

    Submitted 23 May, 2021; originally announced May 2021.

    Comments: Accepted in IEEE Internet of Things Journal

  48. arXiv:2105.01051  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge…

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  49. arXiv:2104.00355  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

    Authors: Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

    Abstract: We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representations, we separately extract low-bitrate representations for speech content, prosodic information, and speaker identity. This allows speech to be synthesized in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and she…

    Submitted 27 July, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: In Proceedings of Interspeech 2021

  50. Compress or Interfere?

    Authors: Alaa Awad Abdellatif, Lutfi Samara, Amr Mohamed, Mohsen Guizani, Aiman Erbad, Abdulla Al-Ali

    Abstract: The rapid evolution of wireless medical devices and network technologies has fostered the growth of remote monitoring systems. Such new technologies enable monitoring patients' medical records anytime and anywhere without limiting patients' activities. However, critical challenges have emerged with remote monitoring systems due to the enormous amount of generated data that need to be efficiently proce…

    Submitted 27 June, 2020; originally announced June 2020.