Search | arXiv e-print repository

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2311.18041 [pdf, other]

Zero-shot Conversational Summarization Evaluations with small Large Language Models

Authors: Ramesh Manuvinakurike, Saurav Sahay, Sangeeta Manepalli, Lama Nachman

Abstract: Large Language Models (LLMs) exhibit powerful summarization abilities. However, their capabilities on conversational summarization remain underexplored. In this work we evaluate LLMs (approx. 10 billion parameters) on conversational summarization and showcase their performance on various prompts. We show that the summaries generated by models depend on the instructions and that the performance of LLMs varies with different instructions, sometimes resulting in a steep drop in ROUGE scores if prompts are not selected carefully. We also evaluate the models with human evaluations and discuss the limitations of the models on conversational summarization.

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted at RoF0Mo workshop at Neurips 2023

arXiv:2311.09173 [pdf]

doi 10.17705/1jais.00816

Design Theory for Societal Digital Transformation: The Case of Digital Global Health

Authors: Jorn Braa, Sundeep Sahay, Eric Monteiro

Abstract: With societal challenges, including but not limited to human development, equity, social justice, and climate change, societal-level digital transformation (SDT) is of imminent relevance and theoretical interest. While building on local-level efforts, societal-level transformation is a nonlinear extension of the local level. Unfortunately, academic discourse on digital transformation has largely left SDT unaccounted for. Drawing on more than 25 years of intensive, interventionist research engagement with the digital transformation of public healthcare information management and delivery in more than 80 countries in the Global South, we contribute to theorizing SDT in the form of a design theory consisting of six interconnected design principles. These design principles articulate the interplay and tensions of accommodating over time increased diversity and flexibility in digital solutions, while simultaneously connecting local, national, and regional/global efforts.

Submitted 15 November, 2023; originally announced November 2023.

Journal ref: Journal of the AIS, 24(6), 2023

arXiv:2310.11079 [pdf, other]

Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Authors: Hsuan Su, Cheng-Chu Cheng, Hua Farn, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Abstract: Recently, researchers have made considerable improvements in dialogue systems with the progress of large language models (LLMs) such as ChatGPT and GPT-4. These LLM-based chatbots encode potential biases while retaining disparities that can harm humans during interactions. Traditional bias investigation methods often rely on human-written test cases. However, these test cases are usually expensive and limited. In this work, we propose a first-of-its-kind method that automatically generates test cases to detect LLMs' potential gender bias. We apply our method to three well-known LLMs and find that the generated test cases effectively identify the presence of biases. To address the biases identified, we propose a mitigation strategy that uses the generated test cases as demonstrations for in-context learning to circumvent the need for parameter fine-tuning. The experimental results show that LLMs generate fairer responses with the proposed approach.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2306.00482 [pdf, other]

Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home

Authors: Eda Okur, Roddy Fuentes Alba, Saurav Sahay, Lama Nachman

Abstract: Enriching the quality of early childhood education with interactive math learning at home systems, empowered by recent advances in conversational AI technologies, is slowly becoming a reality. With this motivation, we implement a multimodal dialogue system to support play-based learning experiences at home, guiding kids to master basic math concepts. This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home deployment data with kids going through gamified math learning activities. We validate the advantages of a multi-task architecture for NLU and experiment with a diverse set of pretrained language representations for Intent Recognition and Entity Extraction tasks in the math learning domain. To recognize kids' speech in realistic home environments, we investigate several ASR systems, including the commercial Google Cloud and the latest open-source Whisper solutions with varying model sizes. We evaluate the SLU pipeline by testing our best-performing NLU models on noisy ASR output to inspect the challenges of understanding children for math learning in authentic homes.

Submitted 1 June, 2023; originally announced June 2023.

Comments: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at ACL 2023

arXiv:2303.04361 [pdf, other]

Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization

Authors: Sumanta Bhattacharyya, Ramesh Manuvinakurike, Sahisnu Mazumder, Saurav Sahay

Abstract: In this work, we develop a prompting approach for incremental summarization of task videos. We develop a sample-efficient few-shot approach for extracting semantic concepts as an intermediate step. We leverage an existing model for extracting the concepts from the images and extend it to videos and introduce a clustering and querying approach for sample efficiency, motivated by the recent advances in perceiver-based architectures. Our work provides further evidence that an approach with richer input context with relevant entities and actions from the videos and using these as prompts could enhance the summaries generated by the model. We show the results on a relevant dataset and discuss possible directions for the work.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.05888 [pdf, other]

Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Authors: Hsuan Su, Shachi H Kumar, Sahisnu Mazumder, Wenda Chen, Ramesh Manuvinakurike, Eda Okur, Saurav Sahay, Lama Nachman, Shang-Tse Chen, Hung-yi Lee

Abstract: With the power of large pretrained language models, various research works have integrated knowledge into dialogue systems. The traditional techniques treat knowledge as part of the input sequence for the dialogue system, prepending a set of knowledge statements in front of dialogue history. However, such a mechanism forces knowledge sets to be concatenated in an ordered manner, making models implicitly pay imbalanced attention to the sets during training. In this paper, we first investigate how the order of the knowledge set can influence autoregressive dialogue systems' responses. We conduct experiments on two commonly used dialogue datasets with two types of transformer-based models and find that models view the input knowledge unequally. To this end, we propose a simple and novel technique to alleviate the order effect by modifying the position embeddings of knowledge input in these models. With the proposed position embedding method, the experimental results show that each knowledge statement is uniformly considered to generate responses.

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2212.01032 [pdf, other]

Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning

Authors: Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-yi Lee

Abstract: Parameter-efficient (PE) methods (like Prompts or Adapters) for adapting pre-trained language models (PLM) to downstream tasks have been popular recently. However, hindrances still prevent these methods from reaching their full potential. For example, two significant challenges are few-shot adaptation and cross-task generalization. To tackle these issues, we propose a general PE priming framework to enhance and explore the few-shot adaptation and generalization ability of PE methods. In this framework, PLMs are primed with PE methods for rapidly adapting to various target tasks. To evaluate the generalization ability of these PE methods, we conduct experiments on a few-shot cross-domain benchmark containing 160 diverse NLP tasks. Our experiment not only reveals the best priming strategy but also verifies that priming facilitates the adaptation to target tasks.

Submitted 30 May, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

arXiv:2211.03511 [pdf, other]

End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics

Authors: Eda Okur, Saurav Sahay, Roddy Fuentes Alba, Lama Nachman

Abstract: The advances in language-based Artificial Intelligence (AI) technologies applied to build educational applications can present AI for social-good opportunities with a broader positive impact. Across many disciplines, enhancing the quality of mathematics education is crucial in building critical thinking and problem-solving skills at younger ages. Conversational AI systems have started maturing to a point where they could play a significant role in helping students learn fundamental math concepts. This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while the students are practicing early math concepts with multimodal interactions. We discuss our efforts to improve the SDS pipeline built for math learning, for which we explore utilizing MathBERT representations for potential enhancement to the Natural Language Understanding (NLU) module. We perform an end-to-end evaluation using real-world deployment outputs from the Automatic Speech Recognition (ASR), Intent Recognition, and Dialogue Manager (DM) components to understand how error propagation affects the overall performance in real-world scenarios.

Submitted 7 November, 2022; originally announced November 2022.

Comments: Proceedings of the 1st Workshop on Mathematical Natural Language Processing (MathNLP) at EMNLP 2022

arXiv:2211.01824 [pdf, other]

Human in the loop approaches in multi-modal conversational task guidance system development

Authors: Ramesh Manuvinakurike, Sovan Biswas, Giuseppe Raffa, Richard Beckwith, Anthony Rhodes, Meng Shi, Gesem Gudino Mejia, Saurav Sahay, Lama Nachman

Abstract: Development of task guidance systems for aiding humans in a situated task remains a challenging problem. The role of search (information retrieval) and conversational systems for task guidance has immense potential to help the task performers achieve various goals. However, there are several technical challenges that need to be addressed to deliver such conversational systems, where common supervised approaches fail to deliver the expected results in terms of overall performance, user experience and adaptation to realistic conditions. In this preliminary work we first highlight some of the challenges involved during the development of such systems. We then provide an overview of existing datasets available and highlight their limitations. We finally develop a model-in-the-loop wizard-of-oz based data collection tool and perform a pilot experiment.

Submitted 3 November, 2022; originally announced November 2022.

Comments: SCAI @ SIGIR

arXiv:2206.03931 [pdf, other]

Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning

Authors: Hsuan Su, Pohan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Abstract: Much literature has shown that prompt-based learning is an efficient method to make use of the large pre-trained language model. Recent works also exhibit the possibility of steering a chatbot's output by plugging in an appropriate prompt. Gradient-based methods are often used to perturb the prompts. However, some language models are not even available to the public. In this work, we first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. Second, to reduce the training effort and enhance the generalizability to the unseen task, we apply multi-task learning to make the model learn to generalize to new tasks better. The experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model demonstrates the strong ability to quickly adapt to an unseen task in fewer steps than the baseline model.

Submitted 13 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

arXiv:2206.02733 [pdf, other]

doi 10.1007/978-3-030-97532-6_4

Deep Reinforcement Learning for Cybersecurity Threat Detection and Protection: A Review

Authors: Mohit Sewak, Sanjay K. Sahay, Hemant Rathore

Abstract: The cybersecurity threat landscape has lately become overly complex. Threat actors leverage weaknesses in the network and endpoint security in a very coordinated manner to perpetrate sophisticated attacks that could bring down the entire network and many critical hosts in the network. Increasingly advanced deep and machine learning-based solutions have been used in threat detection and protection. The application of these techniques has been reviewed well in the scientific literature. Deep Reinforcement Learning has shown great promise in developing AI-based solutions for areas that had earlier required advanced human cognizance. Different techniques and algorithms under deep reinforcement learning have shown great promise in applications ranging from games to industrial processes, where it is claimed to augment systems with general AI capabilities. These algorithms have recently also been used in cybersecurity, especially in threat detection and endpoint protection, where these are showing state-of-the-art results. Unlike supervised machine and deep learning, deep reinforcement learning is used in more diverse ways and is empowering many innovative applications in the threat defense landscape. However, there does not exist any comprehensive review of these unique applications and accomplishments. Therefore, in this paper, we intend to fill this gap and provide a comprehensive review of the different applications of deep reinforcement learning in cybersecurity threat detection and protection.

Submitted 6 June, 2022; originally announced June 2022.

Journal ref: International Conference On Secure Knowledge Management In Artificial Intelligence Era. Springer, Cham, 2021

arXiv:2205.13754 [pdf, other]

NLU for Game-based Learning in Real: Initial Evaluations

Authors: Eda Okur, Saurav Sahay, Lama Nachman

Abstract: Intelligent systems designed for play-based interactions should be contextually aware of the users and their surroundings. Spoken Dialogue Systems (SDS) are critical for these interactive agents to carry out effective goal-oriented communication with users in real-time. For the real-world (i.e., in-the-wild) deployment of such conversational agents, improving the Natural Language Understanding (NLU) module of the goal-oriented SDS pipeline is crucial, especially with limited task-specific datasets. This study explores the potential benefits of a recently proposed transformer-based multi-task NLU architecture, mainly to perform Intent Recognition on small-size domain-specific educational game datasets. The evaluation datasets were collected from children practicing basic math concepts via play-based interactions in game-based learning settings. We investigate the NLU performances on the initial proof-of-concept game datasets versus the real-world deployment datasets and observe anticipated performance drops in-the-wild. We have shown that compared to the more straightforward baseline approaches, Dual Intent and Entity Transformer (DIET) architecture is robust enough to handle real-world data to a large extent for the Intent Recognition task on these domain-specific in-the-wild game datasets.

Submitted 26 May, 2022; originally announced May 2022.

Comments: Proceedings of the Games and Natural Language Processing Workshop at LREC 2022

arXiv:2205.04006 [pdf, other]

Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System

Authors: Eda Okur, Saurav Sahay, Lama Nachman

Abstract: Contextually aware intelligent agents are often required to understand the users and their surroundings in real-time. Our goal is to build Artificial Intelligence (AI) systems that can assist children in their learning process. Within such complex frameworks, Spoken Dialogue Systems (SDS) are crucial building blocks to handle efficient task-oriented communication with children in game-based learning settings. We are working towards a multimodal dialogue system for younger kids learning basic math concepts. Our focus is on improving the Natural Language Understanding (NLU) module of the task-oriented SDS pipeline with limited datasets. This work explores the potential benefits of data augmentation with paraphrase generation for the NLU models trained on small task-specific datasets. We also investigate the effects of extracting entities for conceivably further data expansion. We have shown that paraphrasing with model-in-the-loop (MITL) strategies using small seed data is a promising approach yielding improved performance results for the Intent Recognition task.

Submitted 8 May, 2022; originally announced May 2022.

Comments: Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022)

arXiv:2203.07657 [pdf, other]

Seamlessly Integrating Factual Information and Social Content with Persuasive Dialogue

Authors: Maximillian Chen, Weiyan Shi, Feifan Yan, Ryan Hou, Jingwen Zhang, Saurav Sahay, Zhou Yu

Abstract: Complex conversation settings such as persuasion involve communicating changes in attitude or behavior, so users' perspectives need to be addressed, even when not directly related to the topic. In this work, we contribute a novel modular dialogue system framework that seamlessly integrates factual information and social content into persuasive dialogue. Our framework is generalizable to any dialogue tasks that have mixed social and task contents. We conducted a study that compared user evaluations of our framework versus a baseline end-to-end generation model. We found our framework was evaluated more favorably in all dimensions including competence and friendliness, compared to the end-to-end model which does not explicitly handle social content or factual questions.

Submitted 23 September, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: To appear in Proceedings of AACL-IJCNLP 2022; 16 pages, 4 figures, 7 tables

arXiv:2112.02246 [pdf, other]

Controllable Response Generation for Assistive Use-cases

Authors: Shachi H Kumar, Hsuan Su, Ramesh Manuvinakurike, Saurav Sahay, Lama Nachman

Abstract: Conversational agents have become an integral part of the general population for simple task-enabling situations. However, these systems are yet to have any social impact on the diverse and minority population, such as helping people with neurological disorders (for example, ALS) and people with speech, language and social communication disorders. Language model technology can play a huge role in helping these users carry out daily communication and social interactions. To enable this population, we build a dialog system that can be controlled by users using cues or keywords. We build models that can suggest relevant cues in the dialog response context, which are used to control response generation and can speed up communication. We also introduce a keyword loss to lexically constrain the model output. We show both qualitatively and quantitatively that our models can effectively induce the keyword into the model response without degrading the quality of response. In the context of usage of such systems for people with degenerative disorders, we present human evaluation of our cue or keyword predictor and the controllable dialog system and show that our models perform significantly better than models without control. Our study shows that keyword control on end-to-end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day-to-day communication.

Submitted 4 December, 2021; originally announced December 2021.

arXiv:2111.14484 [pdf, other]

Energy-Efficient Implementation of Generative Adversarial Networks on Passive RRAM Crossbar Arrays

Authors: Siddharth Satyam, Honey Nikam, Shubham Sahay

Abstract: Generative algorithms such as GANs are at the cusp of the next revolution in the field of unsupervised learning and large-scale artificial data generation. However, the adversarial (competitive) co-training of the discriminative and generative networks in GAN makes them computationally intensive and hinders their deployment on the resource-constrained IoT edge devices. Moreover, the frequent data transfer between the discriminative and generative networks during training significantly degrades the efficacy of the von-Neumann GAN accelerators such as those based on GPU and FPGA. Therefore, there is an urgent need for development of ultra-compact and energy-efficient hardware accelerators for GANs. To this end, in this work, we propose to exploit the passive RRAM crossbar arrays for performing key operations of a fully-connected GAN: (a) true random noise generation for the generator network, (b) vector-by-matrix-multiplication with unprecedented energy-efficiency during the forward pass and backward propagation and (c) in-situ adversarial training using a hardware friendly Manhattan's rule. Our extensive analysis utilizing an experimentally calibrated phenomenological model for passive RRAM crossbar array reveals an unforeseen trade-off between the accuracy and the energy dissipated while training the GAN network with different noise inputs to the generator. Furthermore, our results indicate that the spatial and temporal variations and true random noise, which are otherwise undesirable for memory application, boost the energy-efficiency of the GAN implementation on passive RRAM crossbar arrays without degrading its accuracy.

Submitted 19 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.04588 [pdf, other]

doi 10.1109/TED.2021.3133197

Long Short-Term Memory Implementation Exploiting Passive RRAM Crossbar Array

Authors: Honey Nikam, Siddharth Satyam, Shubham Sahay

Abstract: The ever-increasing demand to extract temporal correlations across sequential data and perform context-based learning in this era of big data has led to the development of long short-term memory (LSTM) networks. Furthermore, there is an urgent need to run these time-series data-dependent applications, including speech/video processing and recognition, language modelling and translation, etc., on compact internet-of-things (IoT) edge devices with limited energy. To this end, in this work, for the first time, we propose an extremely area- and energy-efficient LSTM network implementation exploiting the passive resistive random access memory (RRAM) crossbar array. We developed a hardware-aware LSTM network simulation framework and performed an extensive analysis of the proposed LSTM implementation considering non-ideal hardware artifacts such as spatial (device-to-device) and temporal variations, non-linearity, noise, etc., utilizing an experimentally calibrated comprehensive phenomenological model for the passive RRAM crossbar array. Our results indicate that the proposed passive RRAM crossbar-based LSTM network implementation not only outperforms the prior digital and active 1T-1R crossbar-based LSTM implementations by more than three orders of magnitude in terms of area and two orders of magnitude in terms of training energy for identical network accuracy, but also exhibits robustness against spatial and temporal variations and noise, and a faster convergence rate. Our work may provide the incentive for experimental realization of LSTM networks on passive RRAM crossbar arrays.

Submitted 8 November, 2021; originally announced November 2021.

arXiv:2110.09654 [pdf, other]

Privacy-Preserving Mutual Authentication and Key Agreement Scheme for Multi-Server Healthcare System

Authors: Trupil Limbasiya, Sanjay K. Sahay, Bharath Sridharan

Abstract: The use of different technologies and smart devices helps people obtain medical services remotely, with multiple benefits. As a result, critical and sensitive data is exchanged between a user and a doctor. When health data is transmitted over a common channel, it becomes essential to preserve various privacy and security properties in the system. Further, the number of users of remote services is increasing exponentially day by day, so a single server cannot adequately handle all users because of verification overhead, server failure, and scalability issues. Researchers have therefore proposed various authentication protocols for multi-server architectures, but most of them are vulnerable to different security attacks and require high computational resources during implementation. To tackle privacy and security issues using fewer computational resources, we propose a privacy-preserving mutual authentication and key agreement protocol for a multi-server healthcare system. We discuss the proposed scheme's security analysis and performance results to understand its security strengths and its computational resource requirements, respectively. Further, we compare the security and performance results with those of recent relevant authentication protocols.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 22 Pages

Journal ref: Information Systems Frontiers, Vol. 23, No. 4, p. 835, 2021

arXiv:2109.11542 [pdf, other]

doi 10.1109/IJCNN52387.2021.9534016

ADVERSARIALuscator: An Adversarial-DRL Based Obfuscator and Metamorphic Malware SwarmGenerator

Authors: Mohit Sewak, Sanjay K. Sahay, Hemant Rathore

Abstract: Advanced metamorphic malware and ransomware, by using obfuscation, could alter their internal structure with every attack. If such malware could intrude into any IoT network, then even if the original malware instance gets detected, by that time it can still infect the entire network. It is challenging to obtain training data for such evasive malware. Therefore, in this paper, we present ADVERSARIALuscator, a novel system that uses specialized Adversarial-DRL to obfuscate malware at the opcode level and create multiple metamorphic instances of the same. To the best of our knowledge, ADVERSARIALuscator is the first-ever system that adopts the Markov Decision Process-based approach to convert and find a solution to the problem of creating individual obfuscations at the opcode level. This is important as the machine language level is the lowest level at which functionality could be preserved so as to mimic an actual attack effectively. ADVERSARIALuscator is also the first-ever system in the area of cyber security to use deep reinforcement learning agents capable of efficient continuous action control, such as Proximal Policy Optimization. Experimental results indicate that ADVERSARIALuscator could raise the metamorphic probability of a corpus of malware by >0.45. Additionally, more than 33% of metamorphic instances generated by ADVERSARIALuscator were able to evade the most potent IDS. Hence ADVERSARIALuscator could be used to generate data representative of a swarm of very potent and coordinated AI-based metamorphic malware attacks. The data and simulations so generated could be used to bolster the defenses of an IDS against an actual AI-based metamorphic attack from advanced malware and ransomware.

Submitted 23 September, 2021; originally announced September 2021.

Journal ref: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-9

arXiv:2109.11500 [pdf, other]

doi 10.1109/IJCNN52387.2021.9533323

LSTM Hyper-Parameter Selection for Malware Detection: Interaction Effects and Hierarchical Selection Approach

Authors: Mohit Sewak, Sanjay K. Sahay, Hemant Rathore

Abstract: Long-Short-Term-Memory (LSTM) networks have shown great promise in artificial intelligence (AI) based language modeling. Recently, LSTM networks have also become popular for designing AI-based Intrusion Detection Systems (IDS). However, their applicability in IDS has been studied largely in the default settings used in language models, whereas security applications offer distinct conditions and hence warrant careful consideration when applying such recurrent networks. Therefore, we conducted one of the most exhaustive studies of LSTM hyper-parameters for IDS and experimented with approx. 150 LSTM configurations to determine the hyper-parameters' relative importance, interaction effects, and optimal selection approach for designing an IDS. We conducted multiple analyses of the results of these experiments and empirically controlled for the interaction effects of different hyper-parameters' covariate levels. We found that for security applications, especially for designing an IDS, neither is the relative importance applicable to language models valid, nor is the standard linear method for hyper-parameter selection ideal. We ascertained that the interaction effect plays a crucial role in determining the relative importance of hyper-parameters. We also discovered that after controlling for the interaction effect, the correct order of relative importance for LSTMs for an IDS is batch size, followed by dropout ratio and padding. The findings are significant because when LSTM was first used for language models, the focus had mostly been on increasing the number of layers to enhance performance.

Submitted 23 September, 2021; originally announced September 2021.

Journal ref: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-9

arXiv:2109.05470 [pdf, other]

doi 10.1109/LCN52139.2021.9524929

DRo: A data-scarce mechanism to revolutionize the performance of Deep Learning based Security Systems

Authors: Mohit Sewak, Sanjay K. Sahay, Hemant Rathore

Abstract: Supervised Deep Learning requires plenty of labeled data to converge, and hence perform optimally for task-specific learning. Therefore, we propose a novel mechanism named DRo (for Deep Routing) for data-scarce domains like security. The DRo approach builds upon some of the recent developments in Deep-Clustering. In particular, it exploits the self-augmented training mechanism using synthetically generated local perturbations. DRo not only allays the challenges with sparse-labeled data but also offers many unique advantages. We also developed a system named DRoID that uses the DRo mechanism for enhancing the performance of an existing Malware Detection System that uses (low-information features like the) Android implicit Intent(s) as the only features. We conducted experiments on DRoID using a popular and standardized Android malware dataset and found that the DRo mechanism could successfully reduce the false alarms generated by the downstream classifier by 67.9%, while simultaneously boosting its accuracy by 11.3%. This is significant not only because the gains achieved are unparalleled but also because the features used were never considered rich enough to train a classifier on; hence no decent performance could be reported to date by any malware classification system using these features in isolation. Owing to the results achieved, the DRo mechanism claims a dominant position amongst all known systems that aim to enhance the classification performance of deep learning models with sparse-labeled data.

Submitted 12 September, 2021; originally announced September 2021.

Journal ref: 2021 IEEE 46th Conference on Local Computer Networks (LCN), 2021, pp. 581-588

arXiv:2108.09950 [pdf]

Digital Resilience for What? Case Study of South Korea

Authors: Kyung Ryul Park, Sundeep Sahay, Jørn Braa, Pamod Amarakoon

Abstract: Resilience has become an emerging topic in various fields of academic research. In spite of its widespread use, there remains conceptual confusion over what resilience means, particularly in multi-disciplinary studies including the field of ICT and Development. With the potential of digital technology, research is needed to critically question what key socio-institutional values related to resilience are being strengthened, for what and for whom, through the different conceptualizations of resilience. In this study, we conduct an interpretive case study on South Korea's response to the pandemic and construct a chronological narrative to identify key aspects of digital resilience. We identify agility, diversity, and plurality - enabled by active roles of various stakeholders, including citizens, research communities, and the private sector - as keys to digital resilience to the pandemic. Findings from the case of South Korea provide implications for ICT4D research while discussing how developing countries, where a national single window platform is typically implemented with a greater level of homogeneity, can achieve digital resilience through inclusive innovation with a plurality of diverse platforms.

Submitted 23 August, 2021; originally announced August 2021.