-
A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems
Authors:
Rupak Raj Ghimire,
Bal Krishna Bal,
Prakash Poudyal
Abstract:
In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali Automatic Speech Recognition Systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing the Nep…
▽ More
In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali Automatic Speech Recognition Systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing the Nepali ASR system. In tandem with the global trends of ever-increasing research on speech recognition based research, the number of Nepalese ASR-related projects are also growing. Nevertheless, the investigation of language and acoustic models of the Nepali language has not received adequate attention compared to languages that possess ample resources. In this context, we provide a framework as well as directions for future investigations.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Nepali Video Captioning using CNN-RNN Architecture
Authors:
Bipesh Subedi,
Saugat Singh,
Bal Krishna Bal
Abstract:
This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Tr…
▽ More
This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Translate, the study trains various CNN-RNN architectures. The research explores the effectiveness of CNNs (e.g., EfficientNetB0, ResNet101, VGG16) paired with different RNN decoders like LSTM, GRU, and BiLSTM. Evaluation involves BLEU and METEOR metrics, with the best model being EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score of 17 and METEOR score of 46. The article also outlines challenges and future directions for advancing Nepali video captioning, offering a crucial resource for further research in this area.
△ Less
Submitted 5 November, 2023;
originally announced November 2023.
-
Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks
Authors:
Yagmur Yigit,
Bahadir Bal,
Aytac Karameseoglu,
Trung Q. Duong,
Berk Canberk
Abstract:
Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architectur…
▽ More
Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architecture based on the digital twin for ISP core networks. We implemented a Yet Another Next Generation (YANG) model and an automated feature selection (AutoFS) module to handle core network data. We used an online learning approach to update the model instantly and efficiently, improve the learning model quickly, and ensure accurate predictions. Finally, we reveal that our proposed solution successfully detects DDoS attacks and updates the feature selection method and learning model with a true classification rate of ninety-seven percent. Our proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts.
△ Less
Submitted 25 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?
Authors:
Siyang Song,
Micol Spitale,
Yiming Luo,
Batuhan Bal,
Hatice Gunes
Abstract:
According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions,…
▽ More
According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, where a broad spectrum of listeners' non-verbal reactions might be appropriate for responding to a specific speaker behaviour. There already exists a body of work that investigated the problem of automatically generating an appropriate reaction for a given input. However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
△ Less
Submitted 23 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
AssembleRL: Learning to Assemble Furniture from Their Point Clouds
Authors:
Özgür Aslan,
Burak Bolat,
Batuhan Bal,
Tuğba Tümer,
Erol Şahin,
Sinan Kalkan
Abstract:
The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent…
▽ More
The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent years have witnessed promising learning-based approaches for furniture assembly, they assume the availability of correct connection labels for each assembly step, which are expensive to obtain in practice. In this paper, we alleviate this assumption and aim to solve furniture assembly with as little human expertise and supervision as possible. To be specific, we assume the availability of the assembled point cloud, and comparing the point cloud of the current assembly and the point cloud of the target product, obtain a novel reward signal based on two measures: Incorrectness and incompleteness. We show that our novel reward signal can train a deep network to successfully assemble different types of furniture. Code and networks available here: https://github.com/METU-KALFA/AssembleRL
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Statistical Shape Analysis of Brain Arterial Networks (BAN)
Authors:
Xiaoyang Guo,
Aditi Basu Bal,
Tom Needham,
Anuj Srivastava
Abstract:
Structures of brain arterial networks (BANs) - that are complex arrangements of individual arteries, their branching patterns, and inter-connectivities - play an important role in characterizing and understanding brain physiology. One would like tools for statistically analyzing the shapes of BANs, i.e. quantify shape differences, compare population of subjects, and study the effects of covariates…
▽ More
Structures of brain arterial networks (BANs) - that are complex arrangements of individual arteries, their branching patterns, and inter-connectivities - play an important role in characterizing and understanding brain physiology. One would like tools for statistically analyzing the shapes of BANs, i.e. quantify shape differences, compare population of subjects, and study the effects of covariates on these shapes. This paper mathematically represents and statistically analyzes BAN shapes as elastic shape graphs. Each elastic shape graph is made up of nodes that are connected by a number of 3D curves, and edges, with arbitrary shapes. We develop a mathematical representation, a Riemannian metric and other geometrical tools, such as computations of geodesics, means and covariances, and PCA for analyzing elastic graphs and BANs. This analysis is applied to BANs after separating them into four components -- top, bottom, left, and right. This framework is then used to generate shape summaries of BANs from 92 subjects, and to study the effects of age and gender on shapes of BAN components. We conclude that while gender effects require further investigation, the age has a clear, quantifiable effect on BAN shapes. Specifically, we find an increased variance in BAN shapes as age increases.
△ Less
Submitted 22 March, 2022; v1 submitted 7 July, 2020;
originally announced July 2020.