Search | arXiv e-print repository

Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game

Authors: Prisha Samadarshi, Mariam Mustafa, Anushka Kulkarni, Raven Rothkopf, Tuhin Chakrabarty, Smaranda Muresan

Abstract: The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games. Compared to GPT-4o, novice and expert players perform better, with expert human players significantly outperforming GPT-4o. To deepen our understanding we create a taxonomy of the knowledge types required to successfully categorize words in the Connections game, revealing that LLMs struggle with associative, encyclopedic, and linguistic knowledge. Our findings establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems.

Submitted 22 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2405.18888 [pdf, other]

Proactive Load-Shaping Strategies with Privacy-Cost Trade-offs in Residential Households based on Deep Reinforcement Learning

Authors: Ruichang Zhang, Youcheng Sun, Mustafa A. Mustafa

Abstract: Smart meters play a crucial role in enhancing energy management and efficiency, but they raise significant privacy concerns by potentially revealing detailed user behaviors through energy consumption patterns. Recent scholarly efforts have focused on developing battery-aided load-shaping techniques to protect user privacy while balancing costs. This paper proposes a novel deep reinforcement learning-based load-shaping algorithm (PLS-DQN) designed to protect user privacy by proactively creating artificial load signatures that mislead potential attackers. We evaluate our proposed algorithm against a non-intrusive load monitoring (NILM) adversary. The results demonstrate that our approach not only effectively conceals real energy usage patterns but also outperforms state-of-the-art methods in enhancing user privacy while maintaining cost efficiency.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 7 pages

arXiv:2404.15886 [pdf, other]

Privacy-Preserving Billing for Local Energy Markets (Long Version)

Authors: Eman Alqahtani, Mustafa A. Mustafa

Abstract: We propose a privacy-preserving billing protocol for local energy markets (PBP-LEMs) that takes into account market participants' energy volume deviations from their bids. PBP-LEMs enables a group of market entities to jointly compute participants' bills in a decentralized and privacy-preserving manner without sacrificing correctness. It also mitigates risks on individuals' privacy arising from any potential internal collusion. We first propose a novel, efficient, and privacy-preserving individual billing scheme, achieving information-theoretic security, which serves as a building block. PBP-LEMs utilizes this scheme, along with other techniques such as multiparty computation, Pedersen commitments and inner product functional encryption, to ensure data confidentiality and accuracy. Additionally, we present three approaches, resulting in different levels of privacy and performance. We prove that the protocol meets its security and privacy requirements and is feasible for deployment in real LEMs. Our analysis also shows variations in overall performance and identifies areas where overhead is concentrated based on the applied approach.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2402.01546 [pdf, other]

doi 10.1109/JIOT.2024.3362587

Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting

Authors: Yi Dong, Yingjie Wang, Mariana Gama, Mustafa A. Mustafa, Geert Deconinck, Xiaowei Huang

Abstract: In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training without the exchange of raw data, these FL models have shown vulnerabilities to emerging attack techniques, such as Deep Leakage from Gradients and poisoning attacks. To counteract these, we initially employ a Secure-Aggregation (SecAgg) algorithm that leverages multiparty computation cryptographic techniques to mitigate the risk of gradient leakage. However, the introduction of SecAgg necessitates the deployment of additional sub-center servers for executing the multiparty computation protocol, thereby escalating computational complexity and reducing system robustness, especially in scenarios where one or more sub-centers are unavailable. To address these challenges, we introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis. The Distributed Markovian Switching (DMS) topology shows strong robustness towards the poisoning attacks as well. Case studies employing real-world power system load data validate the efficacy of our proposed algorithm. It not only significantly minimizes communication complexity but also maintains accuracy levels comparable to traditional FL methods, thereby enhancing the scalability of our load forecasting algorithm.
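The abstract says SecAgg relies on multiparty computation run by sub-center servers but gives no protocol details. The sketch below is a minimal illustration under assumptions, not the paper's algorithm: it uses plain additive secret sharing, where each client splits a fixed-point-encoded gradient entry into one random share per sub-center, each sub-center sums only the shares it receives, and the aggregator adds the sub-center totals. `MODULUS`, `SCALE`, and all function names are hypothetical. Real gradients are vectors, so the same steps would be applied coordinate by coordinate.

```python
import secrets

# Hypothetical parameters: a prime modulus for share arithmetic and a
# fixed-point scale for encoding real-valued gradient entries as integers.
MODULUS = 2**61 - 1
SCALE = 10**6


def encode(x: float) -> int:
    """Fixed-point encode a real number into the integers mod MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map back to a signed value (the upper half of the ring is negative)."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(client_values: list[int], n_subcenters: int) -> int:
    """Each sub-center sums one share per client and never sees a raw value;
    only the per-sub-center totals are combined by the aggregator."""
    subcenter_totals = [0] * n_subcenters
    for v in client_values:
        for i, s in enumerate(share(v, n_subcenters)):
            subcenter_totals[i] = (subcenter_totals[i] + s) % MODULUS
    return sum(subcenter_totals) % MODULUS
```

For example, `decode(secure_sum([encode(g) for g in [0.25, -0.5, 1.125]], n_subcenters=3))` returns `0.875`. The sketch also shows the robustness issue the abstract raises: if any one sub-center's total is missing, the remaining totals reveal nothing useful and the sum cannot be recovered.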

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2307.09618 [pdf, other]

Privacy Preserving Billing in Local Energy Markets with Imperfect Bid-Offer Fulfillment (Long Version)

Authors: Andrei Hutu, Mustafa A. Mustafa

Abstract: Smart grids are being increasingly deployed worldwide, as they constitute the electricity grid of the future, providing bidirectional communication between households. One of their main potential applications is the peer-to-peer (P2P) energy trading market, which promises users better electricity prices and higher incentives to produce renewable energy. However, most P2P markets require users to submit energy bids/offers in advance, which cannot account for unexpected surpluses of energy consumption/production. Moreover, the fine-grained metering information used in calculating and settling bills/rewards is inherently sensitive and must be protected in conformity with existing privacy regulations. To address these issues, this report proposes a novel privacy-preserving billing and settlements protocol, PPBSP, for use in local energy markets with imperfect bid-offer fulfillment, which only uses homomorphically encrypted versions of the half-hourly user consumption data. PPBSP also supports various cost-sharing mechanisms among market participants, including two new and improved methods of proportionally redistributing the cost of maintaining the balance of the grid in a fair manner. An informal privacy analysis is performed, highlighting the privacy-enhancing characteristics of the protocol, which include metering data and bill confidentiality. PPBSP is also evaluated in terms of computation cost and communication overhead, demonstrating its efficiency and feasibility for markets with varying sizes.
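The abstract states that PPBSP works only on homomorphically encrypted half-hourly readings but does not name the scheme. As an assumption for illustration, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic: multiplying ciphertexts adds their plaintexts, and raising a ciphertext to an integer power multiplies its plaintext. That is enough to compute an encrypted bill `sum(price_t * reading_t)` without decrypting any individual reading. The tiny primes make it insecure, and the integer tariff units and function names are hypothetical. Plaintexts (including the final bill) must stay below `N`.

```python
import math
import secrets

# Toy Paillier key from tiny fixed primes: INSECURE, illustration only.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                      # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function lambda(N)


def _l(u: int) -> int:
    return (u - 1) // N


MU = pow(_l(pow(G, LAM, N_SQ)), -1, N)  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    return (_l(pow(c, LAM, N_SQ)) * MU) % N


def encrypted_bill(enc_readings: list[int], prices: list[int]) -> int:
    """Return Enc(sum(price_t * reading_t)) computed only on ciphertexts."""
    acc = 1  # a valid encryption of 0 (g^0 * 1^N)
    for c, price in zip(enc_readings, prices):
        # c^price encrypts price * reading; multiplying ciphertexts adds.
        acc = (acc * pow(c, price, N_SQ)) % N_SQ
    return acc
```

With hypothetical half-hourly readings `[350, 420, 290]` Wh and integer prices `[2, 3, 2]`, decrypting `encrypted_bill([encrypt(r) for r in readings], prices)` yields `2540`. The party computing the bill only ever handles ciphertexts; only the key holder can decrypt the total.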

Submitted 13 July, 2023; originally announced July 2023.

Comments: 60 pages, 18 figures, 2 tables. This is an extended version of a paper submitted to SmartGridComm 2023

arXiv:2307.08778 [pdf, other]

Zone-Based Privacy-Preserving Billing for Local Energy Market Based on Multiparty Computation

Authors: Eman Alqahtani, Mustafa A. Mustafa

Abstract: This paper proposes a zone-based privacy-preserving billing protocol for local energy markets that takes into account energy volume deviations of market participants from their bids. Our protocol incorporates participants' locations on the grid for splitting the deviations cost. The proposed billing model employs multiparty computation so that the accurate calculation of individual bills is performed in a decentralised and privacy-preserving manner. We also present a security analysis as well as performance evaluations for different security settings. The results show superiority of the honest-majority model to the dishonest majority in terms of computational efficiency. They also show that the billing can be executed for 5000 users in less than nine seconds in the online phase for all security settings, demonstrating its feasibility to be deployed in real local energy markets.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.04501 [pdf, other]

A Privacy-Preserving and Accountable Billing Protocol for Peer-to-Peer Energy Trading Markets

Authors: Kamil Erdayandi, Lucas C. Cordeiro, Mustafa A. Mustafa

Abstract: This paper proposes a privacy-preserving and accountable billing (PA-Bill) protocol for trading in peer-to-peer energy markets, addressing situations where there may be discrepancies between the volume of energy committed and delivered. Such discrepancies can lead to challenges in providing both privacy and accountability while maintaining accurate billing. To overcome these challenges, a universal cost splitting mechanism is proposed that prioritises privacy and accountability. It leverages a homomorphic encryption cryptosystem to provide privacy and employs blockchain technology to establish accountability. A dispute resolution mechanism is also introduced to minimise the occurrence of erroneous bill calculations while ensuring accountability and non-repudiation throughout the billing process. Our evaluation demonstrates that PA-Bill offers an effective billing mechanism that maintains privacy and accountability in peer-to-peer energy markets utilising a semi-decentralised approach.

Submitted 11 September, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 6 pages, 1 figure, Accepted for International Conference on Smart Energy Systems and Technologies (SEST2023)

arXiv:2306.13793 [pdf, other]

QNNRepair: Quantized Neural Network Repair

Authors: Xidan Song, Youcheng Sun, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: We present QNNRepair, the first method in the literature for repairing quantized neural networks (QNNs). QNNRepair aims to improve the accuracy of a neural network model after quantization. It accepts the full-precision and weight-quantized neural networks and a repair dataset of passing and failing tests. At first, QNNRepair applies a software fault localization method to identify the neurons that cause performance degradation during neural network quantization. Then, it formulates the repair problem into a linear programming problem of solving neuron weights parameters, which corrects the QNN's performance on failing tests while not compromising its performance on passing tests. We evaluate QNNRepair with widely used neural network architectures such as MobileNetV2, ResNet, and VGGNet on popular datasets, including high-resolution images. We also compare QNNRepair with the state-of-the-art data-free quantization method SQuant. According to the experiment results, we conclude that QNNRepair is effective in improving the quantized model's performance in most cases. Its repaired models have 24% higher accuracy than SQuant's in the independent validation set, especially for the ImageNet dataset.

Submitted 10 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.11391 [pdf, other]

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Authors: Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

Abstract: Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

Submitted 27 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2303.13158 [pdf]

Improvement of Color Image Analysis Using a New Hybrid Face Recognition Algorithm based on Discrete Wavelets and Chebyshev Polynomials

Authors: Hassan Mohamed Muhi-Aldeen, Maha Ammar Mustafa, Asma A. Abdulrahman, Jabbar Abed Eleiwy, Fouad S. Tahir, Yurii Khlaponin

Abstract: This work is unique in the use of discrete wavelets that were built from or derived from Chebyshev polynomials of the second and third kind, filter the Discrete Second Chebyshev Wavelets Transform (DSCWT), and derive two effective filters. The Filter Discrete Third Chebyshev Wavelets Transform (FDTCWT) is used in the process of analyzing color images and removing noise and impurities that accompany the image, as well as because of the large amount of data that makes up the image as it is taken. These data are massive, making it difficult to deal with each other during transmission. However to address this issue, the image compression technique is used, with the image not losing information due to the readings that were obtained, and the results were satisfactory. Mean Square Error (MSE), Peak Signal Noise Ratio (PSNR), Bit Per Pixel (BPP), and Compression Ratio (CR) Coronavirus is the initial treatment, while the processing stage is done with network training for Convolutional Neural Networks (CNN) with Discrete Second Chebyshev Wavelets Convolutional Neural Network (DSCWCNN) and Discrete Third Chebyshev Wavelets Convolutional Neural Network (DTCWCNN) to create an efficient algorithm for face recognition, and the best results were achieved in accuracy and in the least amount of time. Two samples of color images that were made or implemented were used. The proposed theory was obtained with fast and good results; the results are evident shown in the tables below.
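The abstract names MSE, PSNR, BPP, and CR as its compression metrics without stating formulas. The sketch below uses the standard textbook definitions, assuming 8-bit pixel values (peak value 255) and treating a color image as one flat list of channel values; it is not taken from the paper, and the function names are hypothetical.

```python
import math


def mse(original: list[float], reconstructed: list[float]) -> float:
    """Mean squared error over all pixel (or channel) values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original: list[float], reconstructed: list[float], max_value: float = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite for an exact reconstruction."""
    err = mse(original, reconstructed)
    return math.inf if err == 0 else 10 * math.log10(max_value**2 / err)


def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """Size of the uncompressed image divided by the size of the compressed one."""
    return original_bits / compressed_bits


def bits_per_pixel(compressed_bits: int, n_pixels: int) -> float:
    """Average number of compressed bits spent per pixel."""
    return compressed_bits / n_pixels
```

For the hypothetical values `[52, 55, 61, 59]` reconstructed as `[52, 54, 60, 61]`, the MSE is 1.5 and the PSNR is about 46.4 dB. A 512×512 RGB image (6,291,456 bits at 8 bits per channel) compressed to 786,432 bits has a CR of 8 and a BPP of 3.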

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2211.15387 [pdf, other]

doi 10.48550/arXiv.2211.15387

AIREPAIR: A Repair Platform for Neural Networks

Authors: Xidan Song, Youcheng Sun, Mustafa A. Mustafa, Lucas Cordeiro

Abstract: We present AIREPAIR, a platform for repairing neural networks. It features the integration of existing network repair tools. Based on AIREPAIR, one can run different repair methods on the same model, thus enabling the fair comparison of different repair techniques. We evaluate AIREPAIR with three state-of-the-art repair tools on popular deep-learning datasets and models. Our evaluation confirms the utility of AIREPAIR, by comparing and analyzing the results from different repair techniques. A demonstration is available at https://youtu.be/UkKw5neeWhw.

Submitted 21 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

arXiv:2206.06043 [pdf, other]

Combining BMC and Fuzzing Techniques for Finding Software Vulnerabilities in Concurrent Programs

Authors: Fatimah K. Aljaafari, Rafael Menezes, Edoardo Manino, Fedor Shmarov, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: Finding software vulnerabilities in concurrent programs is a challenging task due to the size of the state-space exploration, as the number of interleavings grows exponentially with the number of program threads and statements. We propose and evaluate EBF (Ensembles of Bounded Model Checking with Fuzzing) -- a technique that combines Bounded Model Checking (BMC) and Gray-Box Fuzzing (GBF) to find software vulnerabilities in concurrent programs. Since there are no publicly-available GBF tools for concurrent code, we first propose OpenGBF -- a new open-source concurrency-aware gray-box fuzzer that explores different thread schedules by instrumenting the code under test with random delays. Then, we build an ensemble of a BMC tool and OpenGBF in the following way. On the one hand, when the BMC tool in the ensemble returns a counterexample, we use it as a seed for OpenGBF, thus increasing the likelihood of executing paths guarded by complex mathematical expressions. On the other hand, we aggregate the outcomes of the BMC and GBF tools in the ensemble using a decision matrix, thus improving the accuracy of EBF. We evaluate EBF against state-of-the-art pure BMC tools and show that it can generate up to 14.9% more correct verification witnesses than the corresponding BMC tools alone. Furthermore, we demonstrate the efficacy of OpenGBF, by showing that it can find 24.2% of the vulnerabilities in our evaluation suite, while non-concurrency-aware GBF tools can only find 0.55%. Finally, thanks to our concurrency-aware OpenGBF, EBF detects a data race in the open-source wolfMqtt library and reproduces known bugs in several other real-world programs, which demonstrates its effectiveness in finding vulnerabilities in real-world software.
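The claim that interleavings grow exponentially can be made concrete with a standard combinatorial count, which is not from the paper. Assume n threads, each running k atomic statements with no synchronization. A schedule is then any ordering of all n·k statements that preserves each thread's own order, so the count is the multinomial coefficient (n·k)! / (k!)^n. The function name is hypothetical.

```python
from math import factorial


def interleavings(n_threads: int, stmts_per_thread: int) -> int:
    """Count distinct schedules of n straight-line threads with k atomic
    statements each: the multinomial coefficient (n*k)! / (k!)^n."""
    total = n_threads * stmts_per_thread
    return factorial(total) // factorial(stmts_per_thread) ** n_threads
```

With just three statements per thread, `interleavings(2, 3)` is 20, `interleavings(3, 3)` is 1680, and `interleavings(4, 3)` is 369600. This explosion is why exhaustive exploration is impractical and why approaches such as randomized scheduling delays become attractive.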

Submitted 20 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2203.04347 [pdf]

An accurate IoT Intrusion Detection Framework using Apache Spark

Authors: Mohamed Abushwereb, Mouhammd Alkasassbeh, Mohammad Almseidin, Muhannad Mustafa

Abstract: The internet has caused tremendous changes since its appearance in the 1980s, and now, the Internet of Things (IoT) seems to be doing the same. The potential of IoT has made it the center of attention for many people, but, where some see an opportunity to contribute, others may see IoT networks as a target to be exploited. The high number of IoT devices makes them the perfect setup for staging denial-of-service attacks (DoS) that can have devastating consequences. This renders the need for cybersecurity measures such as intrusion detection systems (IDSs) evident. The aim of this paper is to build an IDS using the big data platform, Apache Spark. Apache Spark was used along with its ML library (MLlib) and the BoT-IoT dataset. The IDS was then tested and evaluated based on F-Measure (f1), as was the standard when evaluating imbalanced data. Two rounds of tests were performed, a partial dataset for minimizing bias, and the full BoT-IoT dataset for exploring big data and ML capabilities in a security setting. For the partial dataset, the Random Forest algorithm had the highest performance for binary classification at an average f1 measure of 99.7%, as well as 99.6% for main category classification, and an 88.5% f1 measure for sub category classification. As for the complete dataset, the Decision Tree algorithm scored the highest f1 measures for all conducted tests; 97.9% for binary classification, 79% for main category classification, and 77% for sub category classification.
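The abstract uses the F-measure because the data are imbalanced. The sketch below computes the standard binary F1 (the harmonic mean of precision and recall) on hypothetical labels. It shows why accuracy alone can mislead; it does not reproduce the paper's Spark MLlib pipeline or its averaging over categories.

```python
def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

On 100 hypothetical flows with only 5 positives, a classifier that predicts "negative" every time is 95% accurate but has an F1 of 0. One that flags 3 of the 5 positives plus 2 false alarms has precision and recall of 0.6, so its F1 is 0.6.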

Submitted 21 February, 2022; originally announced March 2022.

Comments: 15 pages

arXiv:2203.00465 [pdf, ps, other]

doi 10.1109/TCC.2024.3375801

Efficient User-Centric Privacy-Friendly and Flexible Wearable Data Aggregation and Sharing

Authors: Khlood Jastaniah, Ning Zhang, Mustafa A. Mustafa

Abstract: Wearable devices can offer services to individuals and the public. However, wearable data collected by cloud providers may pose privacy risks. To reduce these risks while maintaining full functionality, healthcare systems require solutions for privacy-friendly data processing and sharing that can accommodate three main use cases: (i) data owners requesting processing of their own data, and multiple data requesters requesting data processing of (ii) a single or (iii) multiple data owners. Existing work lacks data owner access control and does not efficiently support these cases, making them unsuitable for wearable devices. To address these limitations, we propose a novel, efficient, user-centric, privacy-friendly, and flexible data aggregation and sharing scheme, named SAMA. SAMA uses a multi-key partial homomorphic encryption scheme to allow flexibility in accommodating the aggregation of data originating from a single or multiple data owners while preserving privacy during the processing. It also uses ciphertext-policy attribute-based encryption scheme to support fine-grain sharing with multiple data requesters based on user-centric access control. Formal security analysis shows that SAMA supports data confidentiality and authorisation. SAMA has also been analysed in terms of computational and communication overheads. Our experimental results demonstrate that SAMA supports privacy-preserving flexible data aggregation more efficiently than the relevant state-of-the-art solutions.

Submitted 3 March, 2024; v1 submitted 1 March, 2022; originally announced March 2022.

ACM Class: E.3; J.3

arXiv:2201.01810 [pdf, other]

Privacy-Friendly Peer-to-Peer Energy Trading: A Game Theoretical Approach

Authors: Kamil Erdayandi, Amrit Paudel, Lucas Cordeiro, Mustafa A. Mustafa

Abstract: In this paper, we propose a decentralized, privacy-friendly energy trading platform (PFET) based on game theoretical approach - specifically Stackelberg competition. Unlike existing trading schemes, PFET provides a competitive market in which prices and demands are determined based on competition, and computations are performed in a decentralized manner which does not rely on trusted third parties. It uses homomorphic encryption cryptosystem to encrypt sensitive information of buyers and sellers such as sellers' prices and buyers' demands. Buyers calculate total demand on particular seller using an encrypted data and sensitive buyer profile data is hidden from sellers. Hence, privacy of both sellers and buyers is preserved. Through privacy analysis and performance evaluation, we show that PFET preserves users' privacy in an efficient manner.

Submitted 28 May, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: To be published in IEEE Power & Energy Society General Meeting (GM), 2022

ACM Class: E.3; I.2.11

arXiv:2109.05688 [pdf]

Thematic analysis of multiple sclerosis research by enhanced strategic diagram

Authors: Nazlahshaniza Shafina, Che Aishah Nazariah Ismaila, Mohd Zulkifli Mustafa, Nurhafizah Ghani, Asma Hayati Ahmad, Zahiruddin Othman, Adi Wijaya, Rahimah Zakaria

Abstract: This bibliometric review summarised the research trends and analysed research areas in multiple sclerosis (MS) over the last decade. The documents containing the term "multiple sclerosis" in the article title were retrieved from the Scopus database. We found a total of 18003 articles published in journals in the English language between 2012 and 2021. The emerging keywords identified utilising the enhanced strategic diagram were "covid-19", "teriflunomide", "clinical trial", "microglia", "b cells", "myelin", "brain", "white matter", "functional connectivity", "pain", "employment", "health-related quality of life", "meta-analysis" and "comorbidity". In conclusion, this study demonstrates the tremendous growth of MS literature worldwide, which is expected to grow more than double during the next decade especially in the identified emerging topics.
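The abstract does not define its "enhanced" strategic diagram. For orientation, the sketch below shows only the classic strategic diagram used in co-word analysis, which places each theme by centrality (links to other themes) and density (internal cohesion) and assigns it to one of four quadrants. Splitting the quadrants at the medians is an assumption, since tools also split at means. The centrality and density values and the theme names are hypothetical.

```python
from statistics import median


def classify_themes(themes: dict[str, tuple[float, float]]) -> dict[str, str]:
    """themes maps a theme name to (centrality, density); quadrants are
    split at the median centrality and median density (an assumption)."""
    c_mid = median(c for c, _ in themes.values())
    d_mid = median(d for _, d in themes.values())
    labels = {}
    for name, (c, d) in themes.items():
        if c >= c_mid and d >= d_mid:
            labels[name] = "motor"                     # well developed and central
        elif c < c_mid and d >= d_mid:
            labels[name] = "niche"                     # well developed but isolated
        elif c < c_mid and d < d_mid:
            labels[name] = "emerging or declining"     # weak on both axes
        else:
            labels[name] = "basic and transversal"     # central but less developed
    return labels
```

For hypothetical themes `{"theme_a": (0.9, 0.8), "theme_b": (0.2, 0.9), "theme_c": (0.1, 0.1), "theme_d": (0.8, 0.2)}`, the function labels them motor, niche, emerging or declining, and basic and transversal, respectively.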

Submitted 12 September, 2021; originally announced September 2021.

Comments: 20 pages, 6 figures

arXiv:2106.12662 [pdf, other]

doi 10.3847/1538-4357/ac5faa

Fast, high-fidelity Lyman α forests with convolutional neural networks

Authors: Peter Harrington, Mustafa Mustafa, Max Dornfest, Benjamin Horowitz, Zarija Lukić

Abstract: Full-physics cosmological simulations are powerful tools for studying the formation and evolution of structure in the universe but require extreme computational resources. Here, we train a convolutional neural network to use a cheaper N-body-only simulation to reconstruct the baryon hydrodynamic variables (density, temperature, and velocity) on scales relevant to the Lyman-α (Lyα) forest, using data from Nyx simulations. We show that our method enables rapid estimation of these fields at a resolution of ~20 kpc, and captures the statistics of the Lyα forest with much greater accuracy than existing approximations. Because our model is fully-convolutional, we can train on smaller simulation boxes and deploy on much larger ones, enabling substantial computational savings. Furthermore, as our method produces an approximation for the hydrodynamic fields instead of Lyα flux directly, it is not limited to a particular choice of ionizing background or mean transmitted flux.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 10 pages, 6 figures

arXiv:2106.09338

Investigating Misinformation Dissemination on Social Media in Pakistan

Authors: Danyal Haroon, Hammad Arif, Ahmed Abdullah Tariq, Fareeda Nawaz, Dr. Ihsan Ayyub Qazi, Dr. Maryam Mustafa

Abstract: Fake news and misinformation are among the most significant challenges brought about by advances in communication technologies. We chose to research the spread of fake news in Pakistan because of several unfortunate incidents that took place during 2020. These included the downplaying of the severity of the COVID-19 pandemic and protests by right-wing political movements. We observed that fake news and misinformation contributed significantly to these events and especially affected low-literacy and low-income populations. We conducted a cross-platform comparison of misinformation on WhatsApp, Twitter and YouTube, with a primary focus on messages shared in public WhatsApp groups, and analysed the characteristics of misinformation, the techniques used to make it believable, and how users respond to it. To the best of our knowledge, this is the first attempt to compare misinformation on all three platforms in Pakistan. Data collected over a span of eight months helped us identify fake news and misinformation related to politics, religion and health, among other categories. Common elements used by fake news creators in Pakistan to make false content seem believable included appeals to emotion, conspiracy theories, political and religious polarization, incorrect facts and impersonation of credible sources.

Submitted 8 August, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: I want to further work on it

arXiv:2103.11363 [pdf, other]

Finding Security Vulnerabilities in IoT Cryptographic Protocol and Concurrent Implementations

Authors: Fatimah Aljaafari, Rafael Menezes, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: The Internet of Things (IoT) consists of a large number of devices connected through a network that exchange a high volume of data, thereby posing new security, privacy, and trust issues. One way to address these issues is to ensure data confidentiality using lightweight encryption algorithms for IoT protocols. However, the design and implementation of such protocols is an error-prone task; flaws in the implementation can lead to devastating security vulnerabilities. Here we propose a new verification approach named Encryption-BMC and Fuzzing (EBF), which combines Bounded Model Checking (BMC) and fuzzing techniques to check for security vulnerabilities that arise from concurrent implementations of cryptographic protocols, including data races, thread leaks, arithmetic overflows, and memory safety violations. EBF models IoT protocols as a client and server using POSIX threads, thereby simulating both entities' communication. It also employs static and dynamic verification to cover the system's state space exhaustively. We evaluate EBF against three benchmarks. First, we use the concurrency benchmark from SV-COMP and show that EBF outperforms other state-of-the-art tools such as ESBMC, AFL, Lazy-CSeq, and TSAN with respect to bug finding. Second, we evaluate WolfMQTT, an open-source MQTT client implementation that uses the WolfSSL library, and show that EBF detects a data race bug that other approaches are unable to find.
Third, to show the effectiveness of EBF, we replicate some known vulnerabilities in the OpenSSL and CyaSSL (now WolfSSL) libraries. EBF detects these bugs in minimal time.
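EBF pairs bounded model checking with fuzzing. The fuzzing half can be illustrated with a minimal random-input loop. The target `parse_packet` below is a hypothetical length-prefixed parser with a deliberate bounds bug. It is not code from WolfMQTT or the EBF tool, and coverage-guided fuzzers such as AFL add feedback that this sketch omits.

```python
import random

def parse_packet(buf: bytes) -> bytes:
    """Hypothetical parser: the first byte declares the payload length."""
    length = buf[0]
    # Bug: trusts the declared length without checking the real buffer size.
    return bytes(buf[1 + i] for i in range(length))

def fuzz(target, iterations: int = 1000, seed: int = 0):
    """Feed random byte strings to `target`; return the first crashing input or None."""
    rng = random.Random(seed)
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(1, 8)))
        try:
            target(data)
        except Exception:  # any uncaught exception is treated as a crash here
            return data
    return None

crash = fuzz(parse_packet)
print(crash is not None)  # True: a declared length larger than the buffer
```

A bounded model checker would instead explore all buffer contents up to a bound symbolically. That approach finds the same out-of-bounds read without relying on random sampling.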

Submitted 27 April, 2021; v1 submitted 21 March, 2021; originally announced March 2021.

arXiv:2103.09360 [pdf, other]

Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers

Authors: Ashesh Chattopadhyay, Mustafa Mustafa, Pedram Hassanzadeh, Eviatar Bach, Karthik Kashinath

Abstract: There is growing interest in data-driven weather prediction (DDWP), for example using convolutional neural networks such as U-NETs that are trained on data from models or reanalysis. Here, we propose 3 components to integrate with commonly used DDWP models in order to improve their physical consistency and forecast accuracy. These components are 1) a deep spatial transformer added to the latent space of the U-NETs to preserve a property called equivariance, which is related to correctly capturing rotations and scalings of features in spatio-temporal data, 2) a data-assimilation (DA) algorithm to ingest noisy observations and improve the initial conditions for next forecasts, and 3) a multi-time-step algorithm, which combines forecasts from DDWP models with different time steps through DA, improving the accuracy of forecasts at short intervals. To show the benefit and feasibility of each component, we use geopotential height at 500 hPa (Z500) from ERA5 reanalysis and examine the short-term forecast accuracy of specific setups of the DDWP framework. Results show that the equivariance-preserving networks (U-STNs) clearly outperform the U-NETs, for example improving the forecast skill by $45\%$. Using a sigma-point ensemble Kalman filter (SPEnKF) algorithm for DA and U-STN as the forward model, we show that stable, accurate DA cycles are achieved even with high observation noise. The DDWP+DA framework substantially benefits from large ($O(1000)$) ensembles that are inexpensively generated with the data-driven forward model in each DA cycle.
The multi-time-step DDWP+DA framework also shows promise; e.g., it reduces the average error by factors of 2-3.
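The DA component updates a forecast ensemble using noisy observations. The paper uses a sigma-point EnKF. As a simpler, swapped-in illustration (not the paper's algorithm), here is a minimal stochastic, perturbed-observation EnKF analysis step in NumPy. It assumes a linear observation operator `H` and Gaussian observation noise with covariance `R`.

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X: forecast ensemble, shape (n_state, n_members)
    y: observation vector, shape (n_obs,)
    H: linear observation operator, shape (n_obs, n_state)
    R: observation-error covariance, shape (n_obs, n_obs)
    """
    n_members = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    Pf = A @ A.T / (n_members - 1)          # sample forecast covariance
    S = H @ Pf @ H.T + R                    # innovation covariance
    K = np.linalg.solve(S, H @ Pf).T        # Kalman gain Pf H^T S^{-1}
    # Perturb observations so the analysis spread stays statistically consistent.
    noise = np.linalg.cholesky(R) @ rng.standard_normal((len(y), n_members))
    Y = y[:, None] + noise
    return X + K @ (Y - H @ X)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2, 500))  # prior ensemble centred on 0
H = np.array([[1.0, 0.0]])               # observe only the first variable
R = np.array([[0.01]])                   # accurate observation
Xa = enkf_analysis(X, np.array([3.0]), H, R, rng)
print(Xa.mean(axis=1))  # first component is pulled close to the observed value 3
```

The cheap data-driven forward model matters because the sample covariance `Pf` improves with ensemble size. This is why large ensembles are affordable and beneficial in the DDWP+DA setting.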

Submitted 16 March, 2021; originally announced March 2021.

Comments: Under review in Geoscientific Model Development

arXiv:2101.04293 [pdf, other]

Estimating Galactic Distances From Images Using Self-supervised Representation Learning

Authors: Md Abul Hayat, Peter Harrington, George Stein, Zarija Lukić, Mustafa Mustafa

Abstract: We use a contrastive self-supervised learning framework to estimate distances to galaxies from their photometric images. We incorporate data augmentations from computer vision as well as an application-specific augmentation accounting for galactic dust. We find that the resulting visual representations of galaxy images are semantically useful and allow for fast similarity searches, and can be successfully fine-tuned for the task of redshift estimation. We show that (1) pretraining on a large corpus of unlabeled data followed by fine-tuning on some labels can attain the accuracy of a fully-supervised model that requires 2-4x more labeled data, and (2) by fine-tuning our self-supervised representations using all available data labels in the Main Galaxy Sample of the Sloan Digital Sky Survey (SDSS), we outperform the state-of-the-art supervised learning method.

Submitted 11 January, 2021; originally announced January 2021.

arXiv:2101.01950 [pdf, other]

HERMES: Scalable, Secure, and Privacy-Enhancing Vehicle Access System

Authors: Iraklis Symeonidis, Dragos Rotaru, Mustafa A. Mustafa, Bart Mennink, Bart Preneel, Panos Papadimitratos

Abstract: We propose HERMES, a scalable, secure, and privacy-enhancing system for users to share and access vehicles. HERMES securely outsources operations of vehicle access token generation to a set of untrusted servers. It builds on an earlier proposal, namely SePCAR [1], and extends the system design for improved efficiency and scalability. To cater to system and user needs for secure and private computations, HERMES efficiently combines several cryptographic primitives with secure multiparty computation. It conceals secret keys of vehicles and transaction details from the servers, including vehicle booking details, access token information, and user and vehicle identities. It also provides user accountability in case of disputes. In addition, we provide a semantic security analysis and prove that HERMES meets its security and privacy requirements. Last but not least, we demonstrate that HERMES is efficient and, in contrast to SePCAR, scales to a large number of users and vehicles, making it practical for real-world deployments. We build our evaluations with two different multiparty computation protocols: HtMAC-MiMC and CBC-MAC-AES. Our results demonstrate that HERMES with HtMAC-MiMC requires only approximately 1.83 ms to generate an access token for a single-vehicle owner and approximately 11.9 ms for a large branch of rental companies with over a thousand vehicles. It handles 546 and 84 access token generations per second, respectively.
This makes HERMES 696 (with HtMAC-MiMC) and 42 (with CBC-MAC-AES) times faster than SePCAR for a single-vehicle owner's access token generation. Furthermore, we show that HERMES is practical on the vehicle side too, as access token operations performed on a prototype vehicle on-board unit take only approximately 62.087 ms.
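The reported throughputs follow from the per-token latencies. If token generations run one after another, throughput is the reciprocal of latency. The quick check below rests on that purely sequential assumption, which is ours and not a statement from the paper.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Throughput of back-to-back operations that each take `latency_ms` milliseconds."""
    return 1000.0 / latency_ms

print(round(tokens_per_second(1.83)))  # 546 (single-vehicle owner)
print(round(tokens_per_second(11.9)))  # 84 (rental branch with over a thousand vehicles)
```

Both rounded values match the 546 and 84 generations per second quoted in the abstract.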

Submitted 19 March, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

arXiv:2012.13083 [pdf, other]

doi 10.3847/2041-8213/abf2c7

Self-Supervised Representation Learning for Astronomical Images

Authors: Md Abul Hayat, George Stein, Peter Harrington, Zarija Lukić, Mustafa Mustafa

Abstract: Sky surveys are the largest data generators in astronomy, making automated tools for extracting meaningful scientific information an absolute necessity. We show that, without the need for labels, self-supervised learning recovers representations of sky survey images that are semantically useful for a variety of scientific tasks. These representations can be directly used as features, or fine-tuned, to outperform supervised methods trained only on labeled data. We apply a contrastive learning framework on multi-band galaxy photometry from the Sloan Digital Sky Survey (SDSS) to learn image representations. We then use them for galaxy morphology classification, and fine-tune them for photometric redshift estimation, using labels from the Galaxy Zoo 2 dataset and SDSS spectroscopy. In both downstream tasks, using the same learned representations, we outperform the supervised state-of-the-art results, and we show that our approach can achieve the accuracy of supervised models while using 2-4 times fewer labels for training.
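A contrastive objective rewards matching embeddings of two augmented views of the same galaxy and pushes apart views of different galaxies. The minimal NumPy sketch below implements an InfoNCE-style loss. It assumes L2-normalised embeddings and a hypothetical temperature `tau`. The paper's exact framework, augmentations, and hyperparameters are not reproduced here.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE loss where row i of z1 and row i of z2 form a positive pair.

    z1, z2: embeddings of two augmented views, shape (batch, dim).
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i, its own second view.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 64))                                 # embeddings of view 1
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))   # view 2 close to view 1
mismatched = info_nce(z, np.roll(z, 1, axis=0))              # every positive pair broken
print(aligned < mismatched)  # True: matched views give a lower loss
```

After pretraining with such a loss, the encoder's representations can be reused as features or fine-tuned with a small labeled set. The abstract describes this for morphology classification and redshift estimation.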

Submitted 8 April, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: The codes, trained models, and data can be found at https://portal.nersc.gov/project/dasrepo/self-supervised-learning-sdss

Journal ref: The Astrophysical Journal Letters, Volume 911 (2021), Number 2, Letter 33

arXiv:2010.00072 [pdf, ps, other]

Using Machine Learning to Augment Coarse-Grid Computational Fluid Dynamics Simulations

Authors: Jaideep Pathak, Mustafa Mustafa, Karthik Kashinath, Emmanuel Motheau, Thorsten Kurth, Marcus Day

Abstract: Simulation of turbulent flows at high Reynolds number is a computationally challenging task relevant to a large number of engineering and scientific applications in diverse fields such as climate science, aerodynamics, and combustion. Turbulent flows are typically modeled by the Navier-Stokes equations. Direct Numerical Simulation (DNS) of the Navier-Stokes equations with sufficient numerical resolution to capture all the relevant scales of the turbulent motions can be prohibitively expensive. Simulation at lower resolution on a coarse grid introduces significant errors. We introduce a machine learning (ML) technique based on a deep neural network architecture that corrects the numerical errors induced by a coarse-grid simulation of turbulent flows at high Reynolds numbers, while simultaneously recovering an estimate of the high-resolution fields. Our proposed simulation strategy is a hybrid ML-PDE solver that is capable of obtaining a meaningful high-resolution solution trajectory while solving the system PDE at a lower resolution. The approach has the potential to dramatically reduce the expense of turbulent flow simulations. As a proof of concept, we demonstrate our ML-PDE strategy on a two-dimensional turbulent (Rayleigh Number $Ra=10^9$) Rayleigh-Bénard Convection (RBC) problem.

Submitted 3 October, 2020; v1 submitted 30 September, 2020; originally announced October 2020.

Comments: Corrected typographical errors in the previous version related to the incorrectly formatted accented character "é" appearing in various places in the manuscript

arXiv:2006.11847 [pdf, other]

An image encryption algorithm based on chaotic Lorenz system and novel primitive polynomial S-boxes

Authors: Temadher Alassiry Al-Maadeed, Iqtadar Hussain, Amir Anees, M. T. Mustafa

Abstract: Chaotic cryptosystems are gaining attention due to their efficiency, robustness, and high sensitivity to initial conditions. In the literature, on the one hand, there are many encryption algorithms that only guarantee security, while on the other hand, there are schemes based on chaotic systems that only promise uncertainty. Because of these limitations, neither approach alone can adequately address current challenges. Here we take a unified approach and propose an image encryption algorithm based on the Lorenz chaotic system and primitive irreducible polynomial S-boxes. First, we propose 16 different S-boxes based on the projective general linear group and 16 primitive irreducible polynomials of the Galois field of order 256, and then use these S-boxes in combination with a chaotic map in an image encryption scheme. The Lorenz chaotic system produces three chaotic sequences corresponding to the variables $x$, $y$ and $z$. We construct a new pseudo-random chaotic sequence $k_i$ based on $x$, $y$ and $z$. The plain image is encrypted using the chaotic sequence $k_i$ and the XOR operation to obtain a ciphered image. To demonstrate the strength of the presented image encryption, several well-known analyses as well as MATLAB simulations are performed.
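The core of this scheme is a keystream derived from the Lorenz trajectories and XORed with the pixels. The minimal sketch below rests on several assumptions: the classical parameters (σ = 10, ρ = 28, β = 8/3), a simple Euler integrator, initial conditions used as the secret key, and a hypothetical combination rule `k_i = floor((|x| + |y| + |z|) * 10^6) mod 256`. The paper's actual rule for building $k_i$ and its S-box stage are not reproduced. This illustrates the XOR step only and is not a vetted cipher.

```python
import numpy as np

def lorenz_keystream(n, x0=0.1, y0=0.2, z0=0.3, sigma=10.0, rho=28.0,
                     beta=8.0 / 3.0, dt=0.001, burn_in=1000):
    """Generate n keystream bytes from an Euler-integrated Lorenz trajectory."""
    x, y, z = x0, y0, z0
    out = np.empty(n, dtype=np.uint8)
    for step in range(burn_in + n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if step >= burn_in:  # discard the transient before emitting bytes
            # Hypothetical rule combining x, y, z into one byte k_i.
            out[step - burn_in] = int((abs(x) + abs(y) + abs(z)) * 1e6) % 256
    return out

def xor_cipher(image: np.ndarray, key: tuple) -> np.ndarray:
    """XOR a uint8 image with the keystream; applying it twice decrypts."""
    flat = image.reshape(-1)
    ks = lorenz_keystream(flat.size, *key)
    return (flat ^ ks).reshape(image.shape)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)  # toy 4x4 "image"
key = (0.1, 0.2, 0.3)                                      # secret initial conditions
cipher = xor_cipher(image, key)
restored = xor_cipher(cipher, key)
print(np.array_equal(restored, image))  # True
```

Sensitivity to initial conditions is what makes the initial state usable as a key. A tiny change in `key` produces a completely different keystream after the burn-in.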

Submitted 21 June, 2020; originally announced June 2020.

arXiv:2006.00719 [pdf, other]

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Authors: Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney

Abstract: We introduce ADAHESSIAN, a second-order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second-order algorithms are among the most powerful optimization algorithms, with superior convergence properties compared to first-order methods such as SGD and Adam. The main disadvantage of traditional second-order methods is their heavier per-iteration computation and poor accuracy as compared to first-order methods. To address these, we incorporate several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of Adam.
In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on ResNet20/32 on CIFAR-10, and 5.55% higher accuracy on ImageNet as compared to Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of ADAHESSIAN is comparable to first-order methods, and that it exhibits robustness towards its hyperparameters.
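Component (i), Hutchinson's method, estimates the Hessian diagonal from Hessian-vector products alone. For a Rademacher vector z (entries ±1 with equal probability), E[z ⊙ (Hz)] = diag(H). The minimal NumPy sketch below uses an explicit matrix `H` as a stand-in for the Hessian-vector product that an optimizer would obtain via automatic differentiation. The moving average of component (ii) and the block averaging of component (iii) are omitted.

```python
import numpy as np

def hutchinson_diag(hvp, dim: int, n_samples: int, rng) -> np.ndarray:
    """Estimate diag(H) using only a Hessian-vector product function `hvp`."""
    estimate = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        estimate += z * hvp(z)                 # unbiased: E[z * Hz] = diag(H)
    return estimate / n_samples

rng = np.random.default_rng(0)
H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
approx = hutchinson_diag(lambda v: H @ v, dim=3, n_samples=5000, rng=rng)
print(np.round(approx, 2))  # close to [4, 3, 2]
```

Each sample costs one Hessian-vector product, which is roughly the cost of an extra backward pass. This is why the per-iteration cost can stay comparable to first-order methods. The estimator's variance grows with the off-diagonal entries, which is what the averaging in (ii) and (iii) helps to reduce.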

Submitted 28 April, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Journal ref: AAAI 2021

arXiv:2005.01463 [pdf, other]

MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework

Authors: Chiyu Max Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, Karthik Kashinath, Mustafa Mustafa, Hamdi A. Tchelepi, Philip Marcus, Prabhat, Anima Anandkumar

Abstract: We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MeshfreeFlowNet on the task of super-resolution of turbulent flows in the Rayleigh-Bénard convection problem. Across a diverse set of evaluation metrics, we show that MeshfreeFlowNet significantly outperforms existing baselines. Furthermore, we provide a large-scale implementation of MeshfreeFlowNet and show that it efficiently scales across large clusters, achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of less than 4 minutes.

Submitted 21 August, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: Supplementary Video: https://youtu.be/mjqwPch9gDo. Accepted to SC20

arXiv:2004.12264 [pdf, other]

A novel encryption algorithm using multiple semifield S-boxes based on permutation of symmetric group

Authors: Iqtadar Hussain, Amir Anees, Temadher Alassiry Al-Maadeed, M. T. Mustafa

Abstract: The tremendous benefits of the internet and advanced communications come with a serious threat from the data security perspective. There is a need for secure and robust encryption algorithms that can be implemented on diverse software and hardware platforms. In block symmetric encryption algorithms, substitution boxes are the most vital part. In this paper, we investigate semifield substitution boxes using permutations from the symmetric group $S_8$ on a set of size 8, and establish an effective procedure for generating $S_8$ semifield substitution boxes having the same algebraic properties. Further, the strength of the generated substitution boxes is analysed using well-known criteria, namely bijectivity, nonlinearity, the strict avalanche criterion, the bit independence criterion, the XOR table and the differential invariant. Based on the analysis results, it is shown that the cryptographic strength of the generated substitution boxes is on par with the best known $8\times 8$ substitution boxes. As an application, an encryption algorithm is proposed that can be employed to strengthen any kind of secure communication. The presented algorithm is mainly based on Shannon's idea of a substitution-permutation (S-P) network, where substitution is performed by the proposed $S_8$ semifield substitution boxes and permutation is performed by a binary cyclic shift of the substitution-box-transformed data. In addition, the proposed encryption algorithm utilizes two different chaotic maps.
In order to ensure the appropriate use of these chaotic maps, we carry out in-depth analyses of their behavior in the context of secure communication and apply the pseudo-random sequences of the chaotic maps in the proposed image encryption algorithm accordingly. The statistical and simulation results imply that our encryption scheme is secure against different attacks and can resist linear and differential cryptanalysis.
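Two of the criteria listed above, bijectivity and nonlinearity, can be computed directly from an S-box lookup table. The minimal sketch below handles an n-bit S-box. It takes the fast Walsh-Hadamard transform of every nonzero linear combination b·S(x) of output bits and reports the minimum nonlinearity. The 4-bit PRESENT S-box is used as a stand-in example, because the paper's $S_8$ semifield S-boxes are not listed here.

```python
def is_bijective(sbox: list[int], n: int) -> bool:
    """True if the S-box is a permutation of {0, ..., 2^n - 1}."""
    return sorted(sbox) == list(range(2 ** n))

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def walsh_spectrum(f: list[int]) -> list[int]:
    """Fast Walsh-Hadamard transform: W(a) = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * v for v in f]  # map 0/1 outputs to +1/-1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def nonlinearity(sbox: list[int], n: int) -> int:
    """Minimum nonlinearity over all component functions b.S(x) with b != 0."""
    best = 2 ** n
    for b in range(1, 2 ** n):
        component = [parity(b & s) for s in sbox]  # b.S(x) as 0/1 values
        max_walsh = max(abs(w) for w in walsh_spectrum(component))
        best = min(best, 2 ** (n - 1) - max_walsh // 2)
    return best

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
print(is_bijective(PRESENT, 4), nonlinearity(PRESENT, 4))  # True 4
```

The same functions work unchanged for $8\times 8$ S-boxes (n = 8). The AES S-box, the usual reference point among the best known $8\times 8$ S-boxes, has nonlinearity 112.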

Submitted 25 April, 2020; originally announced April 2020.

Comments: 23 pages

MSC Class: 94A60; 12K10

arXiv:2001.09837 [pdf, other]

Verifying Software Vulnerabilities in IoT Cryptographic Protocols

Authors: Fatimah Aljaafari, Lucas C. Cordeiro, Mustafa A. Mustafa

Abstract: The Internet of Things (IoT) is a system that consists of a large number of smart devices connected through a network. The number of these devices is increasing rapidly, which creates a massive and complex network with a vast amount of data communicated over that network. One way to protect this data in transit, i.e., to achieve data confidentiality, is to use lightweight encryption algorithms for IoT protocols. However, the design and implementation of such protocols is an error-prone task; flaws in the implementation can lead to devastating security vulnerabilities. These vulnerabilities can be exploited by an attacker and affect users' privacy. There exist various techniques to verify software and detect vulnerabilities. Bounded Model Checking (BMC) and fuzzing are useful techniques to check the correctness of a software system with respect to its specifications. Here we describe a framework called Encryption-BMC and Fuzzing (EBF) that combines BMC and fuzzing techniques. We evaluate the EBF verification framework on a case study, the S-MQTT protocol, to check for security vulnerabilities in cryptographic protocols for IoT.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.05707 [pdf]

Attack based DoS attack detection using multiple classifier

Authors: Mohamed Abushwereb, Muhannad Mustafa, Mouhammd Al-kasassbeh, Malik Qasaimeh

Abstract: One of the most common internet attacks causing significant economic losses in recent years is the Denial of Service (DoS) flooding attack. As a countermeasure, intrusion detection systems equipped with machine learning classification algorithms were developed to detect anomalies in network traffic. These classification algorithms had varying degrees of success, depending on the type of DoS attack used. In this paper, we use an SNMP-MIB dataset from a real testbed to explore the most prominent DoS attacks and the chances of their detection based on the classification algorithm used. The results show that most DoS attacks used nowadays can be detected with high accuracy using machine learning classification techniques based on features provided by SNMP-MIB. We also conclude that, of all the attacks we studied, the Slowloris attack had the highest detection rate, whereas TCP-SYN had the lowest detection rate across all classification techniques, despite being one of the most used DoS attacks.

Submitted 16 January, 2020; originally announced January 2020.

arXiv:1911.08655 [pdf, other]

Towards Physics-informed Deep Learning for Turbulent Flow Prediction

Authors: Rui Wang, Karthik Kashinath, Mustafa Mustafa, Adrian Albert, Rose Yu

Abstract: While deep learning has shown tremendous success in a wide range of domains, it remains a grand challenge to incorporate physical principles in a systematic manner into the design, training, and inference of such models. In this paper, we aim to predict turbulent flow by learning its highly nonlinear dynamics from spatiotemporal velocity fields of large-scale fluid flow simulations of relevance to turbulence modeling and climate modeling. We adopt a hybrid approach by marrying two well-established turbulent flow simulation techniques with deep learning. Specifically, we introduce trainable spectral filters in a coupled model of Reynolds-averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES), followed by a specialized U-net for prediction. Our approach, which we call turbulent-Flow Net (TF-Net), is grounded in a principled physics model, yet offers the flexibility of learned representations. We compare our model, TF-Net, with state-of-the-art baselines and observe significant reductions in error for predictions 60 frames ahead. Most importantly, our method predicts physical fields that obey desirable physical characteristics, such as conservation of mass, whilst faithfully emulating the turbulent kinetic energy field and spectrum, which are critical for accurate prediction of turbulent flows.
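The RANS/LES coupling in TF-Net rests on splitting a flow field into a smoothed part and a residual, u = ū + u'. The minimal sketch below shows only a temporal split with a fixed box (moving-average) filter. In TF-Net the filters are trainable, so the fixed width `k` here is a hypothetical stand-in, and the spatial (LES-style) filter is omitted.

```python
import numpy as np

def decompose(u: np.ndarray, k: int = 5) -> tuple[np.ndarray, np.ndarray]:
    """Split fields ordered in time (axis 0) into a smoothed mean and a fluctuation.

    Uses a centred moving average of odd width k with edge padding, so the
    output has the same shape as the input.
    """
    pad = k // 2
    padded = np.pad(u, [(pad, pad)] + [(0, 0)] * (u.ndim - 1), mode="edge")
    weights = np.ones(k) / k
    # Moving average along time, applied independently at every spatial point.
    mean = np.apply_along_axis(lambda s: np.convolve(s, weights, mode="valid"), 0, padded)
    return mean, u - mean

rng = np.random.default_rng(0)
u = rng.normal(size=(20, 8, 8))  # 20 time steps of an 8x8 velocity component
u_mean, u_fluct = decompose(u)
print(np.allclose(u_mean + u_fluct, u))  # True: the split is exact by construction
```

In a model like TF-Net, each component would feed a separate encoder. Making the filter weights learnable lets training choose the scale separation instead of fixing `k` by hand.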

Submitted 13 June, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

arXiv:1908.09168 [pdf, other]

A Novel Method to Generate Key-Dependent S-Boxes with Identical Algebraic Properties

Authors: Ahmad Y. Al-Dweik, Iqtadar Hussain, Moutaz S. Saleh, M. T. Mustafa

Abstract: The s-box plays the vital role of creating confusion between the ciphertext and the secret key in any cryptosystem, and is the only nonlinear component in many block ciphers. Dynamic s-boxes, as compared to static ones, improve the entropy of the system, leading to better resistance against linear and differential attacks. It was shown in [2] that while incorporating dynamic s-boxes in cryptosystems is sufficiently secure, they do not keep non-linearity invariant. This work provides an algorithmic scheme to generate key-dependent dynamic $n\times n$ clone s-boxes having the same algebraic properties as the initial seed s-box, namely bijection, nonlinearity, the strict avalanche criterion (SAC), and the output bits independence criterion (BIC). The method is based on the group action of the symmetric group $S_n$ and of a subgroup of $S_{2^n}$ on, respectively, the columns and rows of the Boolean functions ($GF(2^n)\to GF(2)$) of the s-box. Invariance of the bijection, nonlinearity, SAC, and BIC for the generated clone copies is proved. As illustration, examples are provided for $n=8$ and $n=4$, along with a comparison of the algebraic properties of the clone and initial seed s-boxes. The proposed method is an extension of [3,4,5,6], which involved the group action of $S_8$ only on the columns of the Boolean functions ($GF(2^8)\to GF(2)$) of the s-box. For $n=4$, we have used an initial $4\times 4$ s-box constructed by Carlisle Adams and Stafford Tavares [7] to generate $(4!)^2$ clone copies.
For $n=8$, it can be seen from [3,4,5,6] that the number of clone copies that can be constructed by permuting the columns is $8!$. For each column permutation, the proposed method makes it possible to generate $8!$ clone copies by permuting the rows.
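The column and row actions can be illustrated for $n=4$ with a small script. It is a sketch under stated assumptions, not the authors' construction: the seed is the 4-bit PRESENT s-box (a stand-in, since the Adams-Tavares seed is not reproduced here), and the row action is assumed to be the permutation of input bit positions, which is one subgroup of $S_{16}$ isomorphic to $S_4$ and consistent with the $(4!)^2$ count. The script checks that every clone stays bijective and keeps the seed's nonlinearity (computed from the Walsh spectrum); SAC and BIC are not checked.

```python
from itertools import permutations

N = 4
# Stand-in seed: the 4-bit PRESENT s-box (NOT the Adams-Tavares seed from the paper).
SEED = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PARITY = [bin(v).count("1") & 1 for v in range(1 << N)]


def permute_bits(value, perm):
    """Move bit i of `value` to bit position perm[i]."""
    out = 0
    for i, target in enumerate(perm):
        if (value >> i) & 1:
            out |= 1 << target
    return out


def clone(sbox, col_perm, row_perm):
    """Clone x -> C(S(R(x))): C permutes output bits (columns of the Boolean
    functions); R permutes input bit positions (assumed row action)."""
    return [permute_bits(sbox[permute_bits(x, row_perm)], col_perm)
            for x in range(len(sbox))]


def nonlinearity(sbox, n=N):
    """Minimum nonlinearity over all nonzero component functions b.S(x)."""
    size = 1 << n
    best = size
    for b in range(1, size):
        # Largest absolute Walsh coefficient of the component function b.S.
        max_walsh = max(
            abs(sum(1 - 2 * (PARITY[b & sbox[x]] ^ PARITY[a & x])
                    for x in range(size)))
            for a in range(size))
        best = min(best, (size - max_walsh) // 2)
    return best


def is_bijective(sbox):
    return sorted(sbox) == list(range(len(sbox)))


# Every (column permutation, row permutation) pair: (4!)^2 = 576 pairs.
clones = {tuple(clone(SEED, c, r))
          for c in permutations(range(N))
          for r in permutations(range(N))}
seed_nl = nonlinearity(SEED)
assert all(is_bijective(s) and nonlinearity(s) == seed_nl for s in clones)
print(len(clones), "distinct clones, all with nonlinearity", seed_nl)
```

Some of the 576 pairs may yield identical s-boxes, which is why the script counts distinct clones rather than assuming 576.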

Submitted 3 May, 2021; v1 submitted 24 August, 2019; originally announced August 2019.

arXiv:1906.11488 [pdf, other]

Finding Security Vulnerabilities in Unmanned Aerial Vehicles Using Software Verification

Authors: Omar M. Alhawi, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: The proliferation of Unmanned Aerial Vehicles (UAVs) embedded with vulnerable monolithic software has recently raised serious concerns about their security due to concurrency aspects and fragile communication links. However, verifying security in UAV software based on traditional testing remains an open challenge, mainly due to scalability and deployment issues. Here we investigate software verification techniques to detect security vulnerabilities in typical UAVs. In particular, we investigate existing software analyzers and verifiers, which implement fuzzing and bounded model checking (BMC) techniques, to detect memory safety and concurrency errors. We also investigate fragility aspects related to the UAV communication link. All UAV components (e.g., position, velocity, and attitude control) heavily depend on the communication link. Our preliminary results show that fuzzing and BMC techniques can detect various software vulnerabilities, which are of particular interest to ensure security in UAVs. We were able to perform successful cyber-attacks via penetration testing against both the UAV's connection and its software system. As a result, we demonstrate real cyber-threats with the possibility of exploiting further security vulnerabilities in real-world UAV software in the foreseeable future.
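To give a feel for the fuzzing side of this work, here is a minimal black-box random fuzzer run against a hypothetical telemetry-packet parser (`parse_telemetry`) that trusts a declared length field. Both functions are illustrative assumptions, not the analyzers or UAV code studied in the paper, and bounded model checking is not shown. In C the same bug would be an out-of-bounds read; Python surfaces it as an `IndexError`.

```python
import random


def parse_telemetry(packet: bytes) -> int:
    """Hypothetical packet parser with a missing bounds check (the bug)."""
    if not packet:
        return 0
    length = packet[0]          # declared payload length
    payload = packet[1:]
    # BUG: trusts the declared length instead of checking len(payload).
    return payload[length - 1]


def fuzz(target, iterations=1000, max_len=8, seed=0):
    """Feed random byte strings to `target`; return the first crashing input."""
    rng = random.Random(seed)
    for _ in range(iterations):
        size = rng.randint(1, max_len)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except IndexError:
            return data
    return None


crash = fuzz(parse_telemetry)
print("crashing input:", crash)
```

A real campaign would use a coverage-guided fuzzer on the compiled firmware; the random generator here only shows the core loop of generating inputs and watching for faults.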

Submitted 11 October, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: 16 pages, 7 figures, conference

arXiv:1802.07233 [pdf, other]

Frictionless Authentication Systems: Emerging Trends, Research Challenges and Opportunities

Authors: Tim Van hamme, Vera Rimmer, Davy Preuveneers, Wouter Joosen, Mustafa A. Mustafa, Aysajan Abidin, Enrique Argones Rúa

Abstract: Authentication and authorization are critical security layers to protect a wide range of online systems, services and content. However, the increased prevalence of wearable and mobile devices, the expectations of a frictionless experience and the diverse user environments will challenge the way users are authenticated. Consumers demand secure and privacy-aware access from any device, whenever and wherever they are, without any obstacles. This paper reviews emerging trends and challenges with frictionless authentication systems and identifies opportunities for further research related to the enrollment of users, the usability of authentication schemes, as well as security and privacy trade-offs of mobile and wearable continuous authentication systems.

Submitted 20 February, 2018; originally announced February 2018.

Comments: published at the 11th International Conference on Emerging Security Information, Systems and Technologies (SECURWARE 2017)

arXiv:1802.07231 [pdf, other]

Frictionless Authentication System: Security & Privacy Analysis and Potential Solutions

Authors: Mustafa A. Mustafa, Aysajan Abidin, Enrique Argones Rúa

Abstract: This paper proposes a frictionless authentication system, provides a comprehensive security analysis of it, and proposes potential solutions for it. It first presents a system that allows users to authenticate to services in a frictionless manner, i.e., without the need to perform any particular authentication-related actions. Based on this system model, the paper analyses security problems and potential privacy threats imposed on users, leading to the specification of a set of security and privacy requirements. These requirements can be used as guidance for designing secure and privacy-friendly frictionless authentication systems. The paper also sketches three potential solutions for such systems and highlights their advantages and disadvantages.

Submitted 20 February, 2018; originally announced February 2018.

Comments: published at the 11th International Conference on Emerging Security Information, Systems and Technologies (SECURWARE 2017)

arXiv:1801.08354 [pdf, other]

Secure and Privacy-Friendly Local Electricity Trading and Billing in Smart Grid

Authors: Aysajan Abidin, Abdelrahaman Aly, Sara Cleemput, Mustafa A. Mustafa

Abstract: This paper proposes two decentralised, secure and privacy-friendly protocols for local electricity trading and billing, respectively. The trading protocol employs a bidding algorithm based upon secure multiparty computations and allows users to trade their excess electricity among themselves. The bid selection and calculation of the trading price are performed in a decentralised and oblivious manner. The billing protocol is based on a simple privacy-friendly aggregation technique that allows suppliers to compute their customers' monthly bills without learning their fine-grained electricity consumption data. We also implemented and tested the performance of the trading protocol with realistic data. Our results show that it can be performed for 2500 bids in less than five minutes in the on-line phase, showing its feasibility for a typical electricity trading period of 30 minutes.
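One generic way to let a supplier learn a monthly total without seeing per-period readings is zero-sum masking, sketched below. This is an assumption for illustration, not necessarily the aggregation technique used in the paper: the meter adds random masks that sum to zero modulo a fixed modulus over the billing period, so each masked reading looks random while the masks cancel in the sum. A flat tariff is also assumed; time-of-use pricing would need more machinery. The modulus, readings and tariff are hypothetical.

```python
import secrets

MODULUS = 2**32  # all masking arithmetic is done modulo this value (assumed)


def zero_sum_masks(count):
    """Random masks whose sum is 0 modulo MODULUS."""
    masks = [secrets.randbelow(MODULUS) for _ in range(count - 1)]
    masks.append(-sum(masks) % MODULUS)
    return masks


def mask_readings(readings_wh):
    """Meter side: each per-period reading is hidden behind a random mask."""
    masks = zero_sum_masks(len(readings_wh))
    return [(r + m) % MODULUS for r, m in zip(readings_wh, masks)]


def monthly_total_wh(masked):
    """Supplier side: the masks cancel in the sum, revealing only the total."""
    return sum(masked) % MODULUS


readings = [350, 420, 0, 1200, 800]  # hypothetical per-period readings in Wh
total = monthly_total_wh(mask_readings(readings))
print(total)  # 2770
bill = total / 1000 * 0.25  # hypothetical flat tariff of 0.25 per kWh
```

The total is recovered correctly as long as it stays below the modulus.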

Submitted 25 January, 2018; originally announced January 2018.

arXiv:1801.08353 [pdf, other]

A Secure and Privacy-preserving Protocol for Smart Metering Operational Data Collection

Authors: Mustafa A. Mustafa, Sara Cleemput, Abelrahaman Aly, Aysajan Abidin

Abstract: In this paper we propose a novel protocol that allows suppliers and grid operators to collect users' aggregate metering data in a secure and privacy-preserving manner. We use secure multiparty computation to ensure privacy protection. In addition, we propose three different data aggregation algorithms that offer different trade-offs between privacy protection and performance. Our protocol is designed for a realistic scenario in which the data need to be sent to different parties, such as grid operators and suppliers. Furthermore, it facilitates an accurate calculation of transmission, distribution and grid balancing fees in a privacy-preserving manner. We also present a security analysis and a performance evaluation of our protocol based on well-known multiparty computation algorithms implemented in C++.
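A basic building block behind MPC-based aggregation is additive secret sharing, sketched below under stated assumptions. Each meter splits its reading into random shares, one per computing party; each party sums only the shares it holds, and combining the parties' partial sums reveals the aggregate but no individual reading. The number of parties (3) and the modulus are hypothetical choices, and this is not one of the paper's three aggregation algorithms.

```python
import secrets

MODULUS = 2**61 - 1  # share arithmetic modulus (assumed)


def share(value, parties):
    """Split `value` into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(readings, parties=3):
    # Each meter sends one share of its reading to each computing party.
    per_party = [[] for _ in range(parties)]
    for reading in readings:
        for p, s in enumerate(share(reading, parties)):
            per_party[p].append(s)
    # Each party adds its shares locally; no single party sees a reading.
    partial_sums = [sum(col) % MODULUS for col in per_party]
    # Combining the partial sums reveals only the aggregate.
    return sum(partial_sums) % MODULUS


print(aggregate([512, 730, 128, 999]))  # 2369
```

Privacy here rests on the parties not colluding: any strict subset of the shares of a reading is uniformly random.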

Submitted 14 March, 2019; v1 submitted 25 January, 2018; originally announced January 2018.

Comments: Accepted for publication at IEEE Transactions on Smart Grid

arXiv:1712.09388 [pdf, other]

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Authors: Amrita Mathuriya, Thorsten Kurth, Vivek Rane, Mustafa Mustafa, Lei Shao, Debbie Bard, Prabhat, Victor W Lee

Abstract: We explore scaling of the standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of the Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on an HPC supercomputer at such large scale with synchronous SGD. We studied scaling of two convolutional neural networks: ResNet-50, a state-of-the-art deep network for classification with roughly 25.5 million parameters, and HEP-CNN, a shallow topology with fewer than 1 million parameters for common scientific usages. For ResNet-50, we achieve >80% scaling efficiency on up to 128 workers using 32 parameter servers (PS tasks), with a steep decline down to 23% for 512 workers using 64 PS tasks. Our analysis of the efficiency drop points to low network bandwidth utilization due to the combined effect of three factors. (a) The heterogeneous distributed parallelization algorithm, which uses PS tasks as centralized servers for gradient averaging, is suboptimal for utilizing interconnect bandwidth. (b) Load imbalance among PS tasks hinders their efficient scaling. (c) The underlying communication primitive, GRPC, is currently inefficient on Cori's high-speed interconnect. HEP-CNN demands less interconnect bandwidth and shows >80% weak scaling efficiency for up to 256 nodes with only 1 PS task. Our findings are applicable to other deep learning networks.
Big networks with millions of parameters run into the issues discussed here. Shallower networks like HEP-CNN, with relatively few parameters, can efficiently enjoy weak scaling even with a single parameter server.
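Factor (b), load imbalance among PS tasks, can arise when variables of very different sizes are assigned to PS tasks without regard to size. The sketch below compares a round-robin placement (to our understanding the default strategy of TensorFlow 1.x's `replica_device_setter`) with a greedy largest-first placement, measuring imbalance as maximum load over mean load. The parameter counts are hypothetical, and the greedy strategy is a swapped-in illustration, not a fix proposed in the paper.

```python
def round_robin(sizes, num_ps):
    """Assign variables to PS tasks in order, cycling through the tasks."""
    load = [0] * num_ps
    for i, size in enumerate(sizes):
        load[i % num_ps] += size
    return load


def greedy_by_size(sizes, num_ps):
    """Place each variable, largest first, on the least-loaded PS task."""
    load = [0] * num_ps
    for size in sorted(sizes, reverse=True):
        load[load.index(min(load))] += size
    return load


def imbalance(load):
    """Max load divided by mean load (1.0 means perfectly balanced)."""
    return max(load) / (sum(load) / len(load))


# Hypothetical parameter counts: a few large tensors among many small ones.
sizes = [2_359_296, 512, 512, 1_048_576, 256, 256, 2_048_000, 2_048] * 4
for name, placer in [("round-robin", round_robin), ("greedy", greedy_by_size)]:
    print(name, round(imbalance(placer(sizes, num_ps=4)), 2))
```

With synchronous SGD every step waits for the most loaded PS task, so the max-over-mean ratio is a rough proxy for how much PS bandwidth is left idle.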

Submitted 26 December, 2017; originally announced December 2017.

Comments: Published as a poster in NIPS 2017 Workshop: Deep Learning At Supercomputer Scale

arXiv:1706.02390 [pdf, other]

doi 10.1186/s40668-019-0029-9

CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks

Authors: Mustafa Mustafa, Deborah Bard, Wahid Bhimji, Zarija Lukić, Rami Al-Rfou, Jan M. Kratochvil

Abstract: Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high-fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high-dimensional density estimators as computationally inexpensive emulators of fully-fledged simulations. These generative models have the potential to make a dramatic shift in the field of scientific simulations, but for that shift to happen we need to study the performance of such generators in the precision regime needed for science applications. To this end, in this work we apply Generative Adversarial Networks to the problem of generating weak lensing convergence maps. We show that our generator network produces maps that, with high statistical confidence, are described by the same summary statistics as the fully simulated maps.
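As an illustration of comparing a one-point summary statistic between simulated and generated maps, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between two sets of pixel values. This is an assumed example of such a comparison, not necessarily the test the authors used, and the lognormal samples are synthetic stand-ins for real convergence-map pixels.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs, checked at every sample point."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Synthetic stand-ins for pixel values of simulated and generated maps.
rng = random.Random(1)
simulated = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
good_generator = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
biased_generator = [rng.lognormvariate(0.3, 0.5) for _ in range(2000)]

print(ks_statistic(simulated, good_generator))    # small gap
print(ks_statistic(simulated, biased_generator))  # clearly larger gap
```

Checking the gap only at sample points is sufficient because both empirical CDFs are step functions that change only there.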

Submitted 22 May, 2019; v1 submitted 7 June, 2017; originally announced June 2017.

Comments: 11 pages, 8 figures

Journal ref: Computational Astrophysics and Cosmology: Simulations, Data Analysis and Algorithms 2019 6:1

arXiv:1407.0080 [pdf]

doi 10.1007/978-3-642-33515-0_39

Velocity Selection for High-Speed UGVs in Rough Unknown Terrains using Force Prediction

Authors: Graeme N. Wilson, Alejandro Ramirez-Serrano, Mahmoud Mustafa, Krispin A. Davies

Abstract: Enabling high-speed navigation of Unmanned Ground Vehicles (UGVs) in unknown rough terrain, where limited or no information is available in advance, requires assessing the terrain in front of the UGV. Attempts have been made to predict the forces the terrain exerts on the UGV in order to determine the maximum allowable velocity for a given terrain. However, current methods produce overly aggressive velocity profiles, which could damage the UGV. This paper presents three novel, safer methods of force prediction that produce effective velocity profiles. Two models, the Instantaneous Elevation Change Model (IECM) and the Sinusoidal Base Excitation Model: using Excitation Force (SBEM:EF), predict the forces exerted by the terrain on the vehicle at the ground contact point, while the third method, the Sinusoidal Base Excitation Model: using Transmitted Force (SBEM:TF), predicts the forces transmitted to the vehicle frame by the suspension.
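The idea of using a predicted suspension force to cap velocity can be sketched with the textbook single-degree-of-freedom base-excitation model. This is a generic illustration, assuming a linear spring-damper suspension and terrain modeled as a sinusoid, and it is not the paper's SBEM:TF formulation. Driving at speed $v$ over a profile of wavelength $\lambda$ excites the suspension at $\omega = 2\pi v/\lambda$, and the peak transmitted force is $kY r^2 \sqrt{(1+(2\zeta r)^2)/((1-r^2)^2+(2\zeta r)^2)}$ with $r=\omega/\omega_n$. All vehicle and terrain parameters are hypothetical.

```python
import math


def transmitted_force(v, k, zeta, omega_n, amplitude, wavelength):
    """Peak force through a linear spring-damper under sinusoidal base excitation.

    Driving at speed v (m/s) over a sinusoidal profile of the given wavelength (m)
    and amplitude (m) excites the suspension at omega = 2*pi*v / wavelength.
    """
    omega = 2 * math.pi * v / wavelength
    r = omega / omega_n  # frequency ratio
    two_zeta_r = 2 * zeta * r
    transmissibility = math.sqrt((1 + two_zeta_r ** 2) /
                                 ((1 - r ** 2) ** 2 + two_zeta_r ** 2))
    return k * amplitude * r ** 2 * transmissibility


def max_safe_velocity(force_limit, v_max, step=0.01, **params):
    """Increase speed from 0; return the last speed before the force limit is exceeded."""
    safe = 0.0
    for i in range(int(v_max / step) + 1):
        v = i * step
        if transmitted_force(v, **params) > force_limit:
            break
        safe = v
    return safe


# Hypothetical vehicle and terrain parameters.
params = dict(k=20_000.0, zeta=0.3, omega_n=2 * math.pi * 1.5,
              amplitude=0.05, wavelength=2.0)
print(max_safe_velocity(force_limit=1500.0, v_max=15.0, **params))
```

The scan stops at the first speed that exceeds the limit rather than searching further, because the force is not monotone in speed (it peaks near resonance) and a vehicle accelerating from rest must pass through every lower speed.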

Submitted 30 June, 2014; originally announced July 2014.

Comments: 10 pages, 6 figures, Proceedings of 5th International Conference on Intelligent Robotics and Applications, Concordia University, October 3-5, 2012, Montreal, Canada

Journal ref: 5th International Conference, ICIRA 2012, Montreal, Canada, October 3-5, 2012, Proceedings, Part II. 7507: 387-396

arXiv:1208.2376 [pdf, ps, other]

doi 10.1109/BWCCA.2012.73

Analytical Survey of Wearable Sensors

Authors: A. Rehman, M. Mustafa, N. Javaid, U. Qasim, Z. A. Khan

Abstract: Wearable sensors in Wireless Body Area Networks (WBANs) provide health and physical activity monitoring. Modern communication systems have extended this monitoring remotely. In this survey, various types of wearable sensors are discussed, along with their medical applications, such as ECG, EEG, blood pressure, blood glucose level detection, pulse rate, and respiration rate, and their non-medical applications, such as daily exercise monitoring and motion detection of different body parts. Different types of noise-removal filters that help remove noise from ECG signals are also discussed at the end. The main purpose of this survey is to provide a platform for researchers in wearable sensors for WBANs.
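The abstract does not name the filters surveyed, so as an assumed example here is one of the simplest noise-removal filters applicable to ECG samples: a centered moving-average (low-pass) filter. The window length and the toy signal are hypothetical.

```python
def moving_average(signal, window=5):
    """Centered moving-average low-pass filter; edges use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


clean = [0.0, 0.0, 0.2, 1.0, 0.2, 0.0, 0.0]  # hypothetical toy "beat"
noise = [0.05 * (-1) ** i for i in range(len(clean))]  # alternating high-frequency noise
filtered = moving_average([c + n for c, n in zip(clean, noise)], window=3)
print([round(x, 3) for x in filtered])
```

The trade-off is visible even in this toy example: averaging suppresses the alternating noise but also flattens the sharp peak, which is why ECG pipelines often prefer band-pass or notch filters tuned to specific noise sources.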

Submitted 11 August, 2012; originally announced August 2012.

Comments: BioSPAN with 7th IEEE International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA 2012), Victoria, Canada, 2012

arXiv:1002.4831 [pdf]

On Analysis and Evaluation of Multi-Sensory Cognitive Learning of a Mathematical Topic Using Artificial Neural Networks

Authors: F. A. Al-Zahrani, H. M. Mustafa, A. Al-Hamadi

Abstract: This research belongs to the field of educational assessment based on cognitive multimedia theory. According to that theory, visual and auditory material should be presented simultaneously to reinforce retention of a learned mathematical topic; accordingly, a carefully designed computer-assisted learning (CAL) module is developed as a multimedia tutorial for our suggested mathematical topic. The designed CAL module is a multimedia tutorial computer package with visual and/or auditory material. Through the suggested computer package, multi-sensory associative memory and classical conditioning theories become practically applicable in an educational setting (a children's classroom). The comparative practical results obtained from field application of the CAL package, with and without the teacher's accompanying voice, are noteworthy. Finally, the study strongly recommends applying a novel teaching approach aimed at improving the quality of children's mathematical learning performance.

Submitted 25 February, 2010; originally announced February 2010.

Comments: Journal of Telecommunications, Volume 1, Issue 1, pp. 99-104, February 2010

Showing 1–42 of 42 results for author: Mustafa, M