Search | arXiv e-print repository

Synthetic Data Aided Federated Learning Using Foundation Models

Authors: Fatima Abacha, Sin G. Teo, Lucas C. Cordeiro, Mustafa A. Mustafa

Abstract: In heterogeneous scenarios where the data distribution amongst the Federated Learning (FL) participants is Non-Independent and Identically distributed (Non-IID), FL suffers from the well-known problem of data heterogeneity. This leads the performance of FL to be significantly degraded, as the global model tends to struggle to converge. To solve this problem, we propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL), a novel data augmentation strategy that aids in homogenizing the local data present on the clients' side. DPSDA-FL improves the training of the local models by leveraging differentially private synthetic data generated from foundation models. We demonstrate the effectiveness of our approach by evaluating it on the benchmark image dataset: CIFAR-10. Our experimental results have shown that DPSDA-FL can improve class recall and classification accuracy of the global model by up to 26% and 9%, respectively, in FL with Non-IID issues.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2406.06499 [pdf, other]

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

Authors: Asmar Nadeem, Faegheh Sardari, Robert Dawes, Syed Sameed Husain, Adrian Hilton, Armin Mustafa

Abstract: Existing video captioning benchmarks and models lack coherent representations of causal-temporal narrative, that is, sequences of events linked through cause and effect, unfolding over time and driven by characters or agents. This lack of narrative restricts models' ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, we propose NarrativeBridge, an approach comprising: (1) a novel Causal-Temporal Narrative (CTN) captions benchmark generated using a large language model and few-shot prompting, explicitly encoding cause-effect temporal relationships in video descriptions, evaluated automatically to ensure caption quality and relevance; and (2) a dedicated Cause-Effect Network (CEN) architecture with separate encoders for capturing cause and effect dynamics independently, enabling effective learning and generation of captions with causal-temporal narrative. Extensive experiments demonstrate that CEN is more accurate in articulating the causal and temporal aspects of video content than the second-best model (GIT): 17.88 and 17.44 CIDEr on the MSVD and MSR-VTT datasets, respectively. The proposed framework understands and generates nuanced text descriptions with intricate causal-temporal narrative structures present in videos, addressing a critical limitation in video captioning. For project details, visit https://narrativebridge.github.io/.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06187 [pdf, other]

An Effective-Efficient Approach for Dense Multi-Label Action Detection

Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

Abstract: Unlike the sparse label action detection task, where a single action occurs in each timestamp of a video, in a dense multi-label scenario, actions can overlap. To address this challenging task, it is necessary to simultaneously learn (i) temporal dependencies and (ii) co-occurrence action relationships. Recent approaches model temporal information by extracting multi-scale features through hierarchical transformer-based networks. However, the self-attention mechanism in transformers inherently loses temporal positional information. We argue that combining this with multiple sub-sampling processes in hierarchical designs can lead to further loss of positional information. Preserving this information is essential for accurate action detection. In this paper, we address this issue by proposing a novel transformer-based network that (a) employs a non-hierarchical structure when modelling different ranges of temporal dependencies and (b) embeds relative positional encoding in its transformer layers. Furthermore, to model co-occurrence action relationships, current methods explicitly embed class relations into the transformer network. However, these approaches are not computationally efficient, as the network needs to compute all possible pair action class relations. We also overcome this challenge by introducing a novel learning paradigm that allows the network to benefit from explicitly modelling temporal co-occurrence action dependencies without imposing their additional computational costs during inference. We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets and show that our method improves the current state-of-the-art results.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 14 pages. arXiv admin note: substantial text overlap with arXiv:2308.05051

arXiv:2405.21069 [pdf, other]

Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction

Authors: Jean-Marc Valin, Ahmed Mustafa, Jan Büthe

Abstract: Neural vocoders are now being used in a wide range of speech processing applications. In many of those applications, the vocoder can be the most complex component, so finding lower complexity algorithms can lead to significant practical benefits. In this work, we propose FARGAN, an autoregressive vocoder that takes advantage of long-term pitch prediction to synthesize high-quality speech in small subframes, without the need for teacher-forcing. Experimental results show that the proposed 600 MFLOPS FARGAN vocoder can achieve both higher quality and lower complexity than existing low-complexity vocoders. The quality even matches that of existing higher-complexity vocoders.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 5 pages

arXiv:2405.18888 [pdf, other]

Proactive Load-Shaping Strategies with Privacy-Cost Trade-offs in Residential Households based on Deep Reinforcement Learning

Authors: Ruichang Zhang, Youcheng Sun, Mustafa A. Mustafa

Abstract: Smart meters play a crucial role in enhancing energy management and efficiency, but they raise significant privacy concerns by potentially revealing detailed user behaviors through energy consumption patterns. Recent scholarly efforts have focused on developing battery-aided load-shaping techniques to protect user privacy while balancing costs. This paper proposes a novel deep reinforcement learning-based load-shaping algorithm (PLS-DQN) designed to protect user privacy by proactively creating artificial load signatures that mislead potential attackers. We evaluate our proposed algorithm against a non-intrusive load monitoring (NILM) adversary. The results demonstrate that our approach not only effectively conceals real energy usage patterns but also outperforms state-of-the-art methods in enhancing user privacy while maintaining cost efficiency.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 7 pages

arXiv:2405.10690 [pdf, other]

CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

Abstract: Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario, it negatively impacts unaligned audible or visible events by introducing irrelevant modality information. In this paper, we propose CoLeaF, a novel learning framework that optimizes the integration of cross-modal context in the embedding space such that the network explicitly learns to combine cross-modal information for audible-visible events while filtering them out for unaligned events. Additionally, as videos often involve complex class relationships, modelling them improves performance. However, this introduces extra computational costs into the network. Our framework is designed to leverage cross-class relationships during training without incurring additional computations at inference. Furthermore, we propose new metrics to better evaluate a method's capabilities in performing AVVP. Our extensive experiments demonstrate that CoLeaF significantly improves the state-of-the-art results by an average of 1.9% and 2.4% F-score on the LLP and UnAV-100 datasets, respectively.

Submitted 7 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted at ECCV 2024

arXiv:2404.15886 [pdf, other]

Privacy-Preserving Billing for Local Energy Markets (Long Version)

Authors: Eman Alqahtani, Mustafa A. Mustafa

Abstract: We propose a privacy-preserving billing protocol for local energy markets (PBP-LEMs) that takes into account market participants' energy volume deviations from their bids. PBP-LEMs enables a group of market entities to jointly compute participants' bills in a decentralized and privacy-preserving manner without sacrificing correctness. It also mitigates risks to individuals' privacy arising from any potential internal collusion. We first propose a novel, efficient, and privacy-preserving individual billing scheme, achieving information-theoretic security, which serves as a building block. PBP-LEMs utilizes this scheme, along with other techniques such as multiparty computation, Pedersen commitments and inner product functional encryption, to ensure data confidentiality and accuracy. Additionally, we present three approaches, resulting in different levels of privacy and performance. We prove that the protocol meets its security and privacy requirements and is feasible for deployment in real LEMs. Our analysis also shows variations in overall performance and identifies areas where overhead is concentrated based on the applied approach.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.12103 [pdf, other]

S3R-Net: A Single-Stage Approach to Self-Supervised Shadow Removal

Authors: Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield

Abstract: In this paper we present S3R-Net, the Self-Supervised Shadow Removal Network. The two-branch WGAN model achieves self-supervision relying on the unify-and-adapt phenomenon - it unifies the style of the output data and infers its characteristics from a database of unaligned shadow-free reference images. This approach stands in contrast to the large body of supervised frameworks. S3R-Net also differentiates itself from the few existing self-supervised models operating in a cycle-consistent manner, as it is a non-cyclic, unidirectional solution. The proposed framework achieves comparable numerical scores to recent self-supervised shadow removal models while exhibiting superior qualitative performance and keeping the computational cost low.

Submitted 18 April, 2024; originally announced April 2024.

Comments: NTIRE workshop @ CVPR 2024. Code & models available at https://github.com/n-kubiak/S3R-Net

arXiv:2402.17387 [pdf, other]

doi 10.1109/TIV.2024.3411530

RACP: Risk-Aware Contingency Planning with Multi-Modal Predictions

Authors: Khaled A. Mustafa, Daniel Jarne Ornia, Jens Kober, Javier Alonso-Mora

Abstract: For an autonomous vehicle to operate reliably within real-world traffic scenarios, it is imperative to assess the repercussions of its prospective actions by anticipating the uncertain intentions exhibited by other participants in the traffic environment. Driven by the pronounced multi-modal nature of human driving behavior, this paper presents an approach that leverages Bayesian beliefs over the distribution of potential policies of other road users to construct a novel risk-aware probabilistic motion planning framework. In particular, we propose a novel contingency planner that outputs long-term contingent plans conditioned on multiple possible intents for other actors in the traffic scene. The Bayesian belief is incorporated into the optimization cost function to influence the behavior of the short-term plan based on the likelihood of other agents' policies. Furthermore, a probabilistic risk metric is employed to fine-tune the balance between efficiency and robustness. Through a series of closed-loop safety-critical simulated traffic scenarios shared with human-driven vehicles, we demonstrate the practical efficacy of our proposed approach that can handle multi-vehicle scenarios.

Submitted 19 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at IEEE Transactions on Intelligent Vehicles (T-IV)

arXiv:2402.02558 [pdf, other]

Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials

Authors: Ata Mustafa

Abstract: Large Language Models have revolutionized various fields and industries, such as Conversational AI, Content Generation, Information Retrieval, Business Intelligence, and Medicine, to name a few. One major application in the medical field is to analyze and investigate clinical trials for entailment tasks. However, it has been observed that Large Language Models are susceptible to shortcut learning, factual inconsistency, and performance degradation with little variation in context. Adversarial and robust testing is performed to ensure the integrity of the models' output, but ambiguity still persists. To ensure the integrity of the reasoning performed and to investigate whether the model has correct syntactic and semantic understanding, probing is used. Here, I used mnestic probing to investigate the Sci-five model, trained on clinical trial. I investigated the model for features learnt with respect to natural logic. To achieve the target, I trained task-specific probes, used these probes to investigate the final layers of the trained model, and then fine-tuned the trained model using iterative null projection. The results show that model accuracy improved. During experimentation, I observed that the size of the probe has an effect on the fine-tuning process.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01546 [pdf, other]

doi 10.1109/JIOT.2024.3362587

Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting

Authors: Yi Dong, Yingjie Wang, Mariana Gama, Mustafa A. Mustafa, Geert Deconinck, Xiaowei Huang

Abstract: In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training without the exchange of raw data, these FL models have shown vulnerabilities to emerging attack techniques, such as Deep Leakage from Gradients and poisoning attacks. To counteract these, we initially employ a Secure-Aggregation (SecAgg) algorithm that leverages multiparty computation cryptographic techniques to mitigate the risk of gradient leakage. However, the introduction of SecAgg necessitates the deployment of additional sub-center servers for executing the multiparty computation protocol, thereby escalating computational complexity and reducing system robustness, especially in scenarios where one or more sub-centers are unavailable. To address these challenges, we introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis. The Distributed Markovian Switching (DMS) topology shows strong robustness towards the poisoning attacks as well. Case studies employing real-world power system load data validate the efficacy of our proposed algorithm. It not only significantly minimizes communication complexity but also maintains accuracy levels comparable to traditional FL methods, thereby enhancing the scalability of our load forecasting algorithm.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2312.03154 [pdf, other]

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet

Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Abstract: This paper introduces ViscoNet, a novel method that enhances text-to-image human generation models with visual prompting. Unlike existing methods that rely on lengthy text descriptions to control the image structure, ViscoNet allows users to specify the visual appearance of the target object with a reference image. ViscoNet disentangles the object's appearance from the image background and injects it into a pre-trained latent diffusion model (LDM) via a ControlNet branch. This way, ViscoNet mitigates the style mode collapse problem and enables precise and flexible visual control. We demonstrate the effectiveness of ViscoNet on human image generation, where it can manipulate visual attributes and artistic styles with text and image prompts. We also show that ViscoNet can learn visual conditioning from small and specific object domains while preserving the generative power of the LDM backbone.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.12032 [pdf]

Effect of some plant extracts on hardwood cuttings of Bottlebrush (Callistemon viminalis)

Authors: Hemn Abdalla Mustafa, Tariq Abubakr Ahmad, Aram Akram Mohammed, Zainab Sabah Lazim, Chopi Omer Ibrahim, Roshna Faeq Kak bra, Shvan Ramzi Salih

Abstract: The study was conducted at the College of Agricultural Engineering Sciences, University of Sulaimani, Kurdistan Region-Iraq, to investigate the response of hardwood cuttings of Callistemon viminalis to some plant extracts. The hardwood cuttings were taken on 11 March 2021 and soaked separately in 3 and 6 g/L aqueous extracts of moringa leaf, licorice root, willow shoot, fenugreek seed and cinnamon bark for 1 hour. They were compared to cuttings dipped in 3000 ppm IBA for 10 s and control cuttings which were soaked in distilled water for 1 hour. The experiment was laid out in a CRD with three replications in a greenhouse, and each replication included six cuttings which were planted in a mixture of sand and rice husk medium. The results showed that the highest (86.66%) rooting was achieved in the cuttings treated with 6 g/L licorice extract, which was significantly different from control cuttings (53.33%) but not significantly different from 3000 ppm IBA (66.66%). Cinnamon 3 g/L and fenugreek 3 g/L extracts gave the lowest (6.66% and 33.33%, respectively) rooting and other studied parameters. The cuttings dipped in 3000 ppm IBA gave the highest (18.91) root number and the highest (66.66%) survival of cuttings after transplanting. The longest root (15.54 cm) was found in cuttings treated with 6 g/L moringa extract. The longest (5.83 cm) shoot was observed in cuttings treated with 3 g/L willow extract. The highest chlorophyll a and b (10.08 and 4.62 mg/L, respectively) were observed in cuttings treated with 6 g/L willow extract. Moreover, 3000 ppm IBA gave the highest (20.23%) total carbohydrate and (1.77 mg/g) IAA content along with 6 g/L licorice, moringa and fenugreek extracts, after 30 days from planting of the cuttings. Licorice root extract at 6 g/L fairly improved the measurements similar to 3000 ppm IBA throughout the study.

Submitted 9 September, 2023; originally announced November 2023.

arXiv:2310.16754 [pdf, other]

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings; the audio-visual (AV) information passing through the network isn't aligned on Spatial and Temporal levels; and, inter-modal (audio and visual) Semantic information is often not balanced within a context; this results in poor performance. In this paper, we propose a novel end-to-end Contextual Multi-modal Alignment (CAD) network that addresses the challenges in AVQA methods by i) introducing a parameter-free stochastic Contextual block that ensures robust audio and visual alignment on the Spatial level; ii) proposing a pre-training technique for dynamic audio and visual alignment on Temporal level in a self-supervised setting, and iii) introducing a cross-attention mechanism to balance audio and visual information on Semantic level. The proposed novel CAD network improves the overall performance over the state-of-the-art methods on average by 9.4% on the MUSIC-AVQA dataset. We also demonstrate that our proposed contributions to AVQA can be added to the existing methods to improve their performance without additional complexity requirements.

Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2309.14521 [pdf, other]

NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping

Authors: Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin

Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering resulting in a causal low-complexity model. A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9 and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.

Submitted 12 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: final version, accepted at ICASSP 2024

arXiv:2308.05051 [pdf, other]

PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

Abstract: We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses the temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to the recent transformer-based approaches that use a hierarchical structure. We argue that joining the self-attention mechanism with multiple sub-sampling processes in the hierarchical approaches results in increased loss of positional information. We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets, and show that PAT improves the current state-of-the-art result by 1.1% and 0.6% mAP on the Charades and MultiTHUMOS datasets, respectively, thereby achieving the new state-of-the-art mAP at 26.5% and 44.6%, respectively. We also perform extensive ablation studies to examine the impact of the different components of our proposed network.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.11109 [pdf]

Influence of phytohormones on seed germination of Solanum linnaeanum

Authors: Aram Akram Mohammed, Haidar Anwar Arkwazee, Ayub Karim Mahmood, Hemn Abdalla Mustafa, Hawar Sleman Halshoy, Salam Mahmud Sulaiman, Jalal Hamasalih Ismael, Nawroz Abdul-razzak Tahir

Abstract: The aim of this study was to determine the germination ability and seedling growth of the apple of Sodom by soaking in water, gibberellin (GA3), naphthylacetic acid (NAA), and salicylic acid (SA), separately. The findings showed that NAA at 50 mg/L produced superior germination (77.78%), germination speed (1.43 seeds/time interval), hypocotyl length (1.01 cm), hypocotyl diameter (1.13 mm), leaf number (2.66), and root number (17.25), followed by 50 and 100 mg/L GA3, particularly in germination percentage. The best root length (5.33 cm) was detected at 100 mg/L SA. In contrast, control seeds and water-soaked seeds showed inferior results. The seeds of the apple of Sodom can be germinated successfully as a result of treatment with NAA at 50 mg/L, followed by GA3 at 50 and 100 mg/L.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09618 [pdf, other]

Privacy Preserving Billing in Local Energy Markets with Imperfect Bid-Offer Fulfillment (Long Version)

Authors: Andrei Hutu, Mustafa A. Mustafa

Abstract: Smart grids are being increasingly deployed worldwide, as they constitute the electricity grid of the future, providing bidirectional communication between households. One of their main potential applications is the peer-to-peer (P2P) energy trading market, which promises users better electricity prices and higher incentives to produce renewable energy. However, most P2P markets require users to submit energy bids/offers in advance, which cannot account for unexpected surpluses of energy consumption/production. Moreover, the fine-grained metering information used in calculating and settling bills/rewards is inherently sensitive and must be protected in conformity with existing privacy regulations. To address these issues, this report proposes a novel privacy-preserving billing and settlements protocol, PPBSP, for use in local energy markets with imperfect bid-offer fulfillment, which only uses homomorphically encrypted versions of the half-hourly user consumption data. PPBSP also supports various cost-sharing mechanisms among market participants, including two new and improved methods of proportionally redistributing the cost of maintaining the balance of the grid in a fair manner. An informal privacy analysis is performed, highlighting the privacy-enhancing characteristics of the protocol, which include metering data and bill confidentiality. PPBSP is also evaluated in terms of computation cost and communication overhead, demonstrating its efficiency and feasibility for markets with varying sizes.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 60 pages, 18 figures, 2 tables. This is an extended version of a paper submitted to SmartGridComm 2023

arXiv:2307.08778 [pdf, other]

Zone-Based Privacy-Preserving Billing for Local Energy Market Based on Multiparty Computation

Authors: Eman Alqahtani, Mustafa A. Mustafa

Abstract: This paper proposes a zone-based privacy-preserving billing protocol for local energy markets that takes into account energy volume deviations of market participants from their bids. Our protocol incorporates participants' locations on the grid for splitting the deviations cost. The proposed billing model employs multiparty computation so that the accurate calculation of individual bills is performed in a decentralised and privacy-preserving manner. We also present a security analysis as well as performance evaluations for different security settings. The results show superiority of the honest-majority model to the dishonest majority in terms of computational efficiency. They also show that the billing can be executed for 5000 users in less than nine seconds in the online phase for all security settings, demonstrating its feasibility to be deployed in real local energy markets.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.06610 [pdf, other]

LACE: A light-weight, causal model for enhancing coded speech through adaptive convolutions

Authors: Jan Büthe, Jean-Marc Valin, Ahmed Mustafa

Abstract: Classical speech coding uses low-complexity postfilters with zero lookahead to enhance the quality of coded speech, but their effectiveness is limited by their simplicity. Deep Neural Networks (DNNs) can be much more effective, but require high complexity and model size, or added delay. We propose a DNN model that generates classical filter kernels on a per-frame basis with a model of just 300 K parameters and 100 MFLOPS complexity, which is a practical complexity for desktop or mobile device CPUs. The lack of added delay allows it to be integrated into the Opus codec, and we demonstrate that it enables effective wideband encoding for bitrates down to 6 kb/s.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 5 pages, accepted at WASPAA 2023

arXiv:2307.04501 [pdf, other]

A Privacy-Preserving and Accountable Billing Protocol for Peer-to-Peer Energy Trading Markets

Authors: Kamil Erdayandi, Lucas C. Cordeiro, Mustafa A. Mustafa

Abstract: This paper proposes a privacy-preserving and accountable billing (PA-Bill) protocol for trading in peer-to-peer energy markets, addressing situations where there may be discrepancies between the volume of energy committed and delivered. Such discrepancies can lead to challenges in providing both privacy and accountability while maintaining accurate billing. To overcome these challenges, a universal cost splitting mechanism is proposed that prioritises privacy and accountability. It leverages a homomorphic encryption cryptosystem to provide privacy and employs blockchain technology to establish accountability. A dispute resolution mechanism is also introduced to minimise the occurrence of erroneous bill calculations while ensuring accountability and non-repudiation throughout the billing process. Our evaluation demonstrates that PA-Bill offers an effective billing mechanism that maintains privacy and accountability in peer-to-peer energy markets utilising a semi-decentralised approach.

Submitted 11 September, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 6-pages, 1 Figure, Accepted for International Conference on Smart Energy Systems and Technologies (SEST2023)

arXiv:2306.13793 [pdf, other]

QNNRepair: Quantized Neural Network Repair

Authors: Xidan Song, Youcheng Sun, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: We present QNNRepair, the first method in the literature for repairing quantized neural networks (QNNs). QNNRepair aims to improve the accuracy of a neural network model after quantization. It accepts the full-precision and weight-quantized neural networks and a repair dataset of passing and failing tests. At first, QNNRepair applies a software fault localization method to identify the neurons that cause performance degradation during neural network quantization. Then, it formulates the repair problem into a linear programming problem of solving neuron weights parameters, which corrects the QNN's performance on failing tests while not compromising its performance on passing tests. We evaluate QNNRepair with widely used neural network architectures such as MobileNetV2, ResNet, and VGGNet on popular datasets, including high-resolution images. We also compare QNNRepair with the state-of-the-art data-free quantization method SQuant. According to the experiment results, we conclude that QNNRepair is effective in improving the quantized model's performance in most cases. Its repaired models have 24% higher accuracy than SQuant's in the independent validation set, especially for the ImageNet dataset.

Submitted 10 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.12826 [pdf]

A Simulation Package in VBA to Support Finance Students for Constructing Optimal Portfolios

Authors: Abdulnasser Hatemi-J, Alan Mustafa

Abstract: This paper introduces a software component created in Visual Basic for Applications (VBA) that can be applied for creating an optimal portfolio using two different methods. The first method is the seminal approach of Markowitz that is based on finding budget shares via the minimization of the variance of the underlying portfolio. The second method is developed by El-Khatib and Hatemi-J, which combines risk and return directly in the optimization problem and yields budget shares that lead to maximizing the risk adjusted return of the portfolio. This approach is consistent with the expectation of rational investors since these investors consider both risk and return as the fundamental basis for selection of the investment assets. Our package offers another advantage that is usually neglected in the literature, which is the number of assets that should be included in the portfolio. The common practice is to assume that the number of assets is given exogenously when the portfolio is constructed. However, the current software component constructs all possible combinations and thus the investor can figure out empirically which portfolio is the best one among all portfolios considered. The software is consumer friendly via a graphical user interface. An application is also provided to demonstrate how the software can be used using real-time series data for several assets.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11391 [pdf, other]

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Authors: Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie **, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

Abstract: Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

Submitted 27 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2304.08870 [pdf, other]

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Abstract: Text-to-image models (T2I) such as StableDiffusion have been used to generate high quality images of people. However, due to the random nature of the generation process, the person has a different appearance e.g. pose, face, and clothing, despite using the same text prompt. The appearance inconsistency makes T2I unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks - generation, pose transfer, and mask-less edit. We also pioneer using small dimensional 3D body model parameters directly to demonstrate new capability - simultaneous pose and camera view interpolation while maintaining the person's appearance.

Submitted 26 July, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops 2023

arXiv:2303.14829 [pdf, other]

SEM-POS: Grammatically and Semantically Correct Video Captioning

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: Generating grammatically and semantically correct captions in video captioning is a challenging task. The captions generated from the existing methods are either word-by-word that do not align with grammatical structure or miss key information from the input videos. To address these issues, we introduce a novel global-local fusion network, with a Global-Local Fusion Block (GLFB) that encodes and fuses features from different parts of speech (POS) components with visual-spatial features. We use novel combinations of different POS components - 'determinant + subject', 'auxiliary verb', 'verb', and 'determinant + object' for supervision of the POS blocks - Det + Subject, Aux Verb, Verb, and Det + Object respectively. The novel global-local fusion network together with POS blocks helps align the visual features with language description to generate grammatically and semantically correct captions. Extensive qualitative and quantitative experiments on benchmark MSVD and MSRVTT datasets demonstrate that the proposed approach generates more grammatically and semantically correct captions compared to the existing methods, achieving the new state-of-the-art. Ablations on the POS blocks and the GLFB demonstrate the impact of the contributions on the proposed method.

Submitted 4 April, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.13158 [pdf]

Improvement of Color Image Analysis Using a New Hybrid Face Recognition Algorithm based on Discrete Wavelets and Chebyshev Polynomials

Authors: Hassan Mohamed Muhi-Aldeen, Maha Ammar Mustafa, Asma A. Abdulrahman, Jabbar Abed Eleiwy, Fouad S. Tahir, Yurii Khlaponin

Abstract: This work is unique in the use of discrete wavelets that were built from or derived from Chebyshev polynomials of the second and third kind, filter the Discrete Second Chebyshev Wavelets Transform (DSCWT), and derive two effective filters. The Filter Discrete Third Chebyshev Wavelets Transform (FDTCWT) is used in the process of analyzing color images and removing noise and impurities that accompany the image, as well as because of the large amount of data that makes up the image as it is taken. These data are massive, making it difficult to deal with each other during transmission. However, to address this issue, the image compression technique is used, with the image not losing information due to the readings that were obtained, and the results were satisfactory. Mean Square Error (MSE), Peak Signal Noise Ratio (PSNR), Bit Per Pixel (BPP), and Compression Ratio (CR) Coronavirus is the initial treatment, while the processing stage is done with network training for Convolutional Neural Networks (CNN) with Discrete Second Chebyshev Wavelets Convolutional Neural Network (DSCWCNN) and Discrete Third Chebyshev Wavelets Convolutional Neural Network (DTCWCNN) to create an efficient algorithm for face recognition, and the best results were achieved in accuracy and in the least amount of time. Two samples of color images that were made or implemented were used. The proposed theory was obtained with fast and good results; the results are shown in the tables below.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2302.10846 [pdf, other]

Probabilistic Risk Assessment for Chance-Constrained Collision Avoidance in Uncertain Dynamic Environments

Authors: Khaled A. Mustafa, Oscar de Groot, Xinwei Wang, Jens Kober, Javier Alonso-Mora

Abstract: Balancing safety and efficiency when planning in crowded scenarios with uncertain dynamics is challenging where it is imperative to accomplish the robot's mission without incurring any safety violations. Typically, chance constraints are incorporated into the planning problem to provide probabilistic safety guarantees by imposing an upper bound on the collision probability of the planned trajectory. Yet, this results in overly conservative behavior on the grounds that the gap between the obtained risk and the specified upper limit is not explicitly restricted. To address this issue, we propose a real-time capable approach to quantify the risk associated with planned trajectories obtained from multiple probabilistic planners, running in parallel, with different upper bounds of the acceptable risk level. Based on the evaluated risk, the least conservative plan is selected provided that its associated risk is below a specified threshold. In such a way, the proposed approach provides probabilistic safety guarantees by attaining a closer bound to the specified risk, while being applicable to generic uncertainties of moving obstacles. We demonstrate the efficiency of our proposed approach, by improving the performance of a state-of-the-art probabilistic planner, in simulations and experiments using a mobile robot in an environment shared with humans.

Submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted for presentation at the 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2212.07780 [pdf, ps, other]

General inequalities and new shape operator inequality for contact CR-warped product submanifolds in cosymplectic space form

Authors: Abdulqader Mustafa, Ata Assad, Cenap Ozel, Alexander Pigazzini

Abstract: We establish two main inequalities; one for the norm of the second fundamental form and the other for the matrix of the shape operator. The results obtained are for cosymplectic manifolds and, for these, we show that the contact warped product submanifolds naturally possess a geometric property; namely $\mathcal{D}_1$-minimality which, by means of the Gauss equation, allows us to obtain an optimal general inequality. For the sake of generalization, we state our hypotheses for nearly cosymplectic manifolds, then we obtain them as particular cases for cosymplectic manifolds. For the other part of the paper, we derived some inequalities and applied them to construct and introduce a shape operator inequality for cosymplectic manifolds involving the harmonic series. As further research directions, we have addressed a couple of open problems that arose naturally during this work and which depend on its results.

Submitted 15 December, 2022; originally announced December 2022.

MSC Class: 53C15; 53C40; 53C42; 53B25

arXiv:2212.07568 [pdf, other]

doi 10.1109/ICIP42928.2021.9506657

Man-recon: manifold learning for reconstruction with deep autoencoder for smart seismic interpretation

Authors: Ahmad Mustafa, Ghassan AlRegib

Abstract: Deep learning can extract rich data representations if provided sufficient quantities of labeled training data. For many tasks however, annotating data has significant costs in terms of time and money, owing to the high standards of subject matter expertise required, for example in medical and geophysical image interpretation tasks. Active Learning can identify the most informative training examples for the interpreter to train, leading to higher efficiency. We propose an Active learning method based on jointly learning representations for supervised and unsupervised tasks. The learned manifold structure is later utilized to identify informative training samples most dissimilar from the learned manifold from the error profiles on the unsupervised task. We verify the efficiency of the proposed method on a seismic facies segmentation dataset from the Netherlands F3 block survey, significantly outperforming contemporary methods to achieve the highest mean Intersection-Over-Union value of 0.773.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2212.07563 [pdf, other]

doi 10.1190/image2022-3752055.1

Explainable Machine Learning for Hydrocarbon Prospect Risking

Authors: Ahmad Mustafa, Ghassan AlRegib

Abstract: Hydrocarbon prospect risking is a critical application in geophysics predicting well outcomes from a variety of data including geological, geophysical, and other information modalities. Traditional routines require interpreters to go through a long process to arrive at the probability of success of specific outcomes. AI has the capability to automate the process but its adoption has been limited thus far owing to a lack of transparency in the way complicated, black box models generate decisions. We demonstrate how LIME -- a model-agnostic explanation technique -- can be used to inject trust in model decisions by uncovering the model's reasoning process for individual predictions. It generates these explanations by fitting interpretable models in the local neighborhood of specific datapoints being queried. On a dataset of well outcomes and corresponding geophysical attribute data, we show how LIME can induce trust in model's decisions by revealing the decision-making process to be aligned to domain knowledge. Further, it has the potential to debug mispredictions made due to anomalous patterns in the data or faulty training datasets.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2212.04532 [pdf, other]

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

Authors: Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin

Abstract: GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models. However, most of their architectures require dozens of billion floating-point operations per second (GFLOPS) to generate speech waveforms in samplewise manner. This makes GAN vocoders still challenging to run on normal CPUs without accelerators or parallel computers. In this work, we propose a new architecture for GAN vocoders that mainly depends on recurrent and fully-connected networks to directly generate the time domain signal in framewise manner. This results in considerable reduction of the computational cost and enables very fast generation on both GPUs and low-complexity CPUs. Experimental results show that our Framewise WaveGAN vocoder achieves significantly higher quality than auto-regressive maximum-likelihood vocoders such as LPCNet at a very low complexity of 1.2 GFLOPS. This makes GAN vocoders more practical on edge and low-power devices.

Submitted 1 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: Accepted to ICASSP 2023, demo: https://ahmed-fau.github.io/fwgan_demo/

arXiv:2212.04453 [pdf, other]

Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder

Authors: Jean-Marc Valin, Jan Büthe, Ahmed Mustafa

Abstract: Robustness to packet loss is one of the main ongoing challenges in real-time speech communication. Deep packet loss concealment (PLC) techniques have recently demonstrated improved quality compared to traditional PLC. Despite that, all PLC techniques hit fundamental limitations when too much acoustic information is lost. To reduce losses in the first place, data is commonly sent multiple times using various redundancy mechanisms. We propose a neural speech coder specifically optimized to transmit a large amount of overlapping redundancy at a very low bitrate, up to 50x redundancy using less than 32 kb/s. Results show that the proposed redundancy is more effective than the existing Opus codec redundancy, and that the two can be combined for even greater robustness.

Submitted 24 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: Proc. ICASSP 2023, 5 pages

arXiv:2211.15387 [pdf, other]

doi 10.48550/arXiv.2211.15387

AIREPAIR: A Repair Platform for Neural Networks

Authors: Xidan Song, Youcheng Sun, Mustafa A. Mustafa, Lucas Cordeiro

Abstract: We present AIREPAIR, a platform for repairing neural networks. It features the integration of existing network repair tools. Based on AIREPAIR, one can run different repair methods on the same model, thus enabling the fair comparison of different repair techniques. We evaluate AIREPAIR with three state-of-the-art repair tools on popular deep-learning datasets and models. Our evaluation confirms the utility of AIREPAIR, by comparing and analyzing the results from different repair techniques. A demonstration is available at https://youtu.be/UkKw5neeWhw.

Submitted 21 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

arXiv:2209.15165 [pdf, other]

doi 10.1145/3565516.3565520

Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping

Authors: Aamir Mustafa, Param Hanji, Rafal K. Mantiuk

Abstract: Many image enhancement or editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution, but instead a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding this style. In this work, we show that information about the style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing the image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or VAE at encoding image style in low-dimensional space and lets us obtain an accuracy close to 40 dB, which is about 7-10 dB improvement over the state-of-the-art methods.

Submitted 4 October, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Published in European Conference on Visual Media Production (CVMP '22)

arXiv:2209.06493 [pdf, ps, other]

Geometric inequalities on bi-warped product submanifolds of locally conformal almost cosymplectic manifolds

Authors: Ramandeep Kaur, Gauree Shanker, Alexander Pigazzini, Saeid Jafari, Cenap Ozel, Abdulqader Mustafa

Abstract: In this paper we present not only some properties related to bi-warped product submanifolds of locally conformal almost cosymplectic manifolds, but also we show how the squared norm of the second fundamental form and the bi-warped product's warping functions are related when the bi-warped product submanifold has a proper slant submanifold as a base or fiber.

Submitted 20 September, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 21 pages

MSC Class: 53C15; 53C18; 53C25; 53D15

arXiv:2207.10275 [pdf, other]

Adversary Detection and Resilient Control for Multi-Agent Systems

Authors: Aquib Mustafa, Dimitra Panagou

Abstract: This paper presents an adversary detection mechanism and a resilient control framework for multi-agent systems under spatiotemporal constraints. Safety in multi-agent systems is typically addressed under the assumption that all agents collaborate to ensure the forward invariance of a desired safe set. This work analyzes agent behaviors based on certain behavior metrics, and designs a proactive adversary detection mechanism based on the notion of the critical region for the system operation. In particular, the presented detection mechanism not only identifies adversarial agents, but also ensures all-time safety for intact agents. Then, based on the analysis and detection results, a resilient QP-based controller is presented to ensure safety and liveness constraints for intact agents. Simulation results validate the efficacy of the presented theoretical contributions.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2206.11485 [pdf, other]

Patient Aware Active Learning for Fine-Grained OCT Classification

Authors: Yash-yee Logan, Ryan Benkert, Ahmad Mustafa, Gukyeong Kwon, Ghassan AlRegib

Abstract: This paper considers making active learning more sensible from a medical perspective. In practice, a disease manifests itself in different forms across patient cohorts. Existing frameworks have primarily used mathematical constructs to engineer uncertainty or diversity-based methods for selecting the most informative samples. However, such algorithms do not present themselves naturally as usable by the medical community and healthcare providers. Thus, their deployment in clinical settings is very limited, if any. For this purpose, we propose a framework that incorporates clinical insights into the sample selection process of active learning that can be incorporated with existing algorithms. Our medically interpretable active learning framework captures diverse disease manifestations from patients to improve generalization performance of OCT classification. After comprehensive experiments, we report that incorporating patient insights within the active learning framework yields performance that matches or surpasses five commonly used paradigms on two architectures with a dataset having imbalanced patient distributions. Also, the framework integrates within existing medical practices and thus can be used by healthcare providers.

Submitted 27 June, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: IEEE International Conference on Image Processing (ICIP)

arXiv:2206.06043 [pdf, other]

Combining BMC and Fuzzing Techniques for Finding Software Vulnerabilities in Concurrent Programs

Authors: Fatimah K. Aljaafari, Rafael Menezes, Edoardo Manino, Fedor Shmarov, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: Finding software vulnerabilities in concurrent programs is a challenging task due to the size of the state-space exploration, as the number of interleavings grows exponentially with the number of program threads and statements. We propose and evaluate EBF (Ensembles of Bounded Model Checking with Fuzzing) -- a technique that combines Bounded Model Checking (BMC) and Gray-Box Fuzzing (GBF) to find software vulnerabilities in concurrent programs. Since there are no publicly-available GBF tools for concurrent code, we first propose OpenGBF -- a new open-source concurrency-aware gray-box fuzzer that explores different thread schedules by instrumenting the code under test with random delays. Then, we build an ensemble of a BMC tool and OpenGBF in the following way. On the one hand, when the BMC tool in the ensemble returns a counterexample, we use it as a seed for OpenGBF, thus increasing the likelihood of executing paths guarded by complex mathematical expressions. On the other hand, we aggregate the outcomes of the BMC and GBF tools in the ensemble using a decision matrix, thus improving the accuracy of EBF. We evaluate EBF against state-of-the-art pure BMC tools and show that it can generate up to 14.9% more correct verification witnesses than the corresponding BMC tools alone. Furthermore, we demonstrate the efficacy of OpenGBF, by showing that it can find 24.2% of the vulnerabilities in our evaluation suite, while non-concurrency-aware GBF tools can only find 0.55%. Finally, thanks to our concurrency-aware OpenGBF, EBF detects a data race in the open-source wolfMqtt library and reproduces known bugs in several other real-world programs, which demonstrates its effectiveness in finding vulnerabilities in real-world software.

Submitted 20 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.08914 [pdf, ps, other]

Existence and Nonexistence of Warped Product Submanifolds of Almost Contact Manifolds

Authors: Abdulqader Mustafa, Cenap Ozel, Alexander Pigazzini, Richard Pincak

Abstract: This paper has two goals; the first is to generalize results for the existence and nonexistence of warped product submanifolds of almost contact manifolds, accordingly a self-contained reference of such submanifolds is offered to save efforts of potential research. Most of the results of this paper are general and decisive enough to generalize both discovered and not discovered results. Moreover, a discrete example of contact CR-warped product submanifold in Kenmotsu manifold is constructed. For further research direction, we address a couple of open problems that arose from the results of this paper.

Submitted 18 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2109.08911

MSC Class: 53C15; 53C40; 53C42; 53B25

arXiv:2205.05785 [pdf, other]

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

Authors: Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy

Abstract: As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC). PLC is a challenging task because it not only involves real-time speech synthesis, but also frequent transitions between the received audio and the synthesized concealment. We propose a hybrid neural PLC architecture where the missing speech is synthesized using a generative model conditioned using a predictive model. The resulting algorithm achieves natural concealment that surpasses the quality of existing conventional PLC algorithms and ranked second in the Interspeech 2022 PLC Challenge. We show that our solution not only works for uncompressed audio, but is also applicable to a modern speech codec.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2203.04907 [pdf, other]

KPE: Keypoint Pose Encoding for Transformer-based Image Generation

Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Abstract: Transformers have recently been shown to generate high quality images from text input. However, the existing method of pose conditioning using skeleton image tokens is computationally inefficient and generates low quality images. Therefore we propose a new method, Keypoint Pose Encoding (KPE). KPE is 10 times more memory efficient and over 73% faster at generating high quality images from text input conditioned on the pose. The pose constraint improves the image quality and reduces errors on body extremities such as arms and legs. The additional benefits include invariance to changes in the target image domain and image resolution, making it easily scalable to higher resolution images. We demonstrate the versatility of KPE by generating photorealistic multiperson images derived from the DeepFashion dataset. We also introduce an evaluation method, People Count Error (PCE), that is effective in detecting errors in generated human images.

Submitted 6 October, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Journal ref: British Machine Vision Conference (BMVC) 2022

arXiv:2203.00465 [pdf, ps, other]

doi 10.1109/TCC.2024.3375801

Efficient User-Centric Privacy-Friendly and Flexible Wearable Data Aggregation and Sharing

Authors: Khlood Jastaniah, Ning Zhang, Mustafa A. Mustafa

Abstract: Wearable devices can offer services to individuals and the public. However, wearable data collected by cloud providers may pose privacy risks. To reduce these risks while maintaining full functionality, healthcare systems require solutions for privacy-friendly data processing and sharing that can accommodate three main use cases: (i) data owners requesting processing of their own data, and multiple data requesters requesting data processing of (ii) a single or (iii) multiple data owners. Existing work lacks data owner access control and does not efficiently support these cases, making it unsuitable for wearable devices. To address these limitations, we propose a novel, efficient, user-centric, privacy-friendly, and flexible data aggregation and sharing scheme, named SAMA. SAMA uses a multi-key partial homomorphic encryption scheme to allow flexibility in accommodating the aggregation of data originating from a single or multiple data owners while preserving privacy during the processing. It also uses a ciphertext-policy attribute-based encryption scheme to support fine-grain sharing with multiple data requesters based on user-centric access control. Formal security analysis shows that SAMA supports data confidentiality and authorisation. SAMA has also been analysed in terms of computational and communication overheads. Our experimental results demonstrate that SAMA supports privacy-preserving flexible data aggregation more efficiently than the relevant state-of-the-art solutions.

Submitted 3 March, 2024; v1 submitted 1 March, 2022; originally announced March 2022.

ACM Class: E.3; J.3

arXiv:2201.02997 [pdf, other]

Performance Analysis of Event-Triggered Consensus Control for Multi-agent Systems under Cyber-Physical Attacks

Authors: Farzaneh Tatari, Aquib Mustafa, Majid Mazouchi, Hamidreza Modares, Christos G. Panayiotou, Marios M. Polycarpou

Abstract: This work presents a rigorous analysis of the adverse effects of cyber-physical attacks on the performance of multi-agent consensus with event-triggered control protocols. It is shown how a strategic malicious attack on sensors and actuators can deceive the triggering condition of both state-based event-triggered mechanism and combinational state-based event-triggered mechanism, which are commonplace and widely used in the literature. More precisely, it is first shown that a deception attack in the case of combinational state-based event-triggered mechanism can result in a non-triggering misbehavior, in the sense that the compromised agent does not trigger any event and consequently results in partial feedback disconnectivity by preventing information from reaching the local neighbors of the compromised agent. This indicates that the combinational state-based event-triggered mechanism can be leveraged by the attacker to harm the network connectivity by rendering the recent data unavailable to agents. It is then shown that the deception attack in the case of state-based event-triggered mechanism can result in a continuous-triggering misbehavior in the sense that the event-triggered mechanism continuously generates triggering events resulting in undesirable phenomenon of Zeno behavior. Finally, numerical simulations are presented to illustrate the theoretical findings.

Submitted 2 March, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

arXiv:2201.01810 [pdf, other]

Privacy-Friendly Peer-to-Peer Energy Trading: A Game Theoretical Approach

Authors: Kamil Erdayandi, Amrit Paudel, Lucas Cordeiro, Mustafa A. Mustafa

Abstract: In this paper, we propose a decentralized, privacy-friendly energy trading platform (PFET) based on a game theoretical approach - specifically Stackelberg competition. Unlike existing trading schemes, PFET provides a competitive market in which prices and demands are determined based on competition, and computations are performed in a decentralized manner which does not rely on trusted third parties. It uses a homomorphic encryption cryptosystem to encrypt sensitive information of buyers and sellers such as sellers' prices and buyers' demands. Buyers calculate total demand on a particular seller using encrypted data, and sensitive buyer profile data is hidden from sellers. Hence, privacy of both sellers and buyers is preserved. Through privacy analysis and performance evaluation, we show that PFET preserves users' privacy in an efficient manner.

Submitted 28 May, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: To be published in IEEE Power & Energy Society General Meeting (GM), 2022

ACM Class: E.3; I.2.11

arXiv:2110.12914 [pdf, other]

SILT: Self-supervised Lighting Transfer Using Implicit Image Decomposition

Authors: Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield

Abstract: We present SILT, a Self-supervised Implicit Lighting Transfer method. Unlike previous research on scene relighting, we do not seek to apply arbitrary new lighting configurations to a given scene. Instead, we wish to transfer the lighting style from a database of other scenes, to provide a uniform lighting style regardless of the input. The solution operates as a two-branch network that first aims to map input images of any arbitrary lighting style to a unified domain, with extra guidance achieved through implicit image decomposition. We then remap this unified input domain using a discriminator that is presented with the generated outputs and the style reference, i.e. images of the desired illumination conditions. Our method is shown to outperform supervised relighting solutions across two different datasets without requiring lighting supervision.

Submitted 15 March, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: Accepted to BMVC 2021. The code and pre-trained models can be found at https://github.com/n-kubiak/SILT

arXiv:2110.06701 [pdf, ps, other]

A General Inequality for Warped Product $CR$-Submanifolds of Kähler Manifolds

Authors: Abdulqader Mustafa, Cenap Ozel, Patrick Linker, Monika Sati, Alexander Pigazzini

Abstract: In this paper, warped product contact $CR$-submanifolds in Sasakian, Kenmotsu and cosymplectic manifolds are shown to possess a geometric property; namely $\mathcal{D}_T$-minimal. Taking benefit from this property, an optimal general inequality for warped product contact $CR$-submanifolds is established in both Sasakian and Kenmotsu manifolds by means of the Gauss equation; we leave cosymplectic because it is an easy structure. Moreover, a rich geometry appears when the necessity and sufficiency are proved and discussed in the equality case. Applying this general inequality, the inequalities obtained by Munteanu are derived as particular cases, whereas the inequality obtained in [1] is corrected. Up to now, the method used by Chen and Munteanu cannot be extended for general ambient manifolds, because of many limitations in using the Codazzi equation. Hence, our method depends on the Gauss equation. The inequality is constructed to involve an intrinsic invariant (scalar curvature) controlled by an extrinsic one (the second fundamental form), which provides an answer for Problem [1]. As further research directions, we have addressed a couple of open problems that arose naturally during this work and depend on its results.

Submitted 13 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.08911

MSC Class: 53C15; 53C40; 53C42; 53B25

arXiv:2109.08911 [pdf, ps, other]

First Chen Inequality for General Warped Product Submanifolds of a Riemannian Space Form and Applications

Authors: Abdulqader Mustafa, Cenap Ozel, Alexander Pigazzini, Ramandeep Kaur, Gauree Shanker

Abstract: In this paper, the first Chen inequality is proved for general warped product submanifolds in Riemannian space forms, this inequality involves intrinsic invariants ($δ$-invariant and sectional curvature) controlled by an extrinsic one (the mean curvature vector), which provides an answer for Problem 1. As a geometric application, this inequality is applied to derive a necessary condition for the immersed submanifold to be minimal in Riemannian space forms, which presents a partial answer for the well-known problem proposed by S.S. Chern, Problem 2. For further research directions, we address a couple of open problems; namely Problem 3 and Problem 4.

Submitted 6 September, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: This is a completely new version

MSC Class: 53C15; 53C40; 53C42; 53B25

arXiv:2108.04051 [pdf, other]

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

Authors: Ahmed Mustafa, Jan Büthe, Srikanth Korse, Kishan Gupta, Guillaume Fuchs, Nicola Pia

Abstract: Recently, GAN vocoders have seen rapid progress in speech synthesis, starting to outperform autoregressive models in perceptual quality with much higher generation speed. However, autoregressive vocoders are still the common choice for neural generation of speech signals coded at very low bit rates. In this paper, we present a GAN vocoder which is able to generate wideband speech waveforms from parameters coded at 1.6 kbit/s. The proposed model is a modified version of the StyleMelGAN vocoder that can run in frame-by-frame manner, making it suitable for streaming applications. The experimental results show that the proposed model significantly outperforms prior autoregressive vocoders like LPCNet for very low bit rate speech coding, with computational complexity of about 5 GMACs, providing a new state of the art in this domain. Moreover, this streamwise adversarial vocoder delivers quality competitive to advanced speech codecs such as EVS at 5.9 kbit/s on clean speech, which motivates further usage of feed-forward fully-convolutional models for low bit rate speech coding.

Submitted 9 August, 2021; originally announced August 2021.

Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021)

arXiv:2106.13385 [pdf, other]

Trends, Politics, Sentiments, and Misinformation: Understanding People's Reactions to COVID-19 During its Early Stages

Authors: Omar Abdel Wahab, Ali Mustafa, André Bertrand Abisseck Bamatakina

Abstract: The sudden outbreak of COVID-19 resulted in large volumes of data shared on different social media platforms. Analyzing and visualizing these data is doubtlessly essential to having a deep understanding of the pandemic's impacts on people's lives and their reactions to them. In this work, we conduct a large-scale spatiotemporal data analytic study to understand people's reactions to the COVID-19 pandemic during its early stages. In particular, we analyze a JSON-based dataset that is collected from news/messages/boards/blogs in English about COVID-19 over a period of 4 months, for a total of 5.2M posts. The data are collected from December 2019 to March 2020 from several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK. Our study aims mainly to understand which implications of COVID-19 have interested social media users the most and how they varied over time, the spatiotemporal distribution of misinformation, and the public opinion toward public figures during the pandemic. Our results can be used by many parties (e.g., governments, psychologists, etc.) to make more informed decisions, taking into account the actual interests and opinions of the people.

Submitted 24 June, 2021; originally announced June 2021.

Comments: 13 pages, 6 figures

Showing 1–50 of 240 results for author: Mustafa, A