Search | arXiv e-print repository

A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems

Authors: Rupak Raj Ghimire, Bal Krishna Bal, Prakash Poudyal

Abstract: In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali Automatic Speech Recognition Systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing the Nep… ▽ More In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali Automatic Speech Recognition Systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing the Nepali ASR system. In tandem with the global trends of ever-increasing research on speech recognition based research, the number of Nepalese ASR-related projects are also growing. Nevertheless, the investigation of language and acoustic models of the Nepali language has not received adequate attention compared to languages that possess ample resources. In this context, we provide a framework as well as directions for future investigations. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted in International Conference on Technologies for Computer, Electrical, Electronics & Communication (ICT-CEEL 2023)

arXiv:2311.02699 [pdf]

Nepali Video Captioning using CNN-RNN Architecture

Authors: Bipesh Subedi, Saugat Singh, Bal Krishna Bal

Abstract: This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Tr… ▽ More This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Translate, the study trains various CNN-RNN architectures. The research explores the effectiveness of CNNs (e.g., EfficientNetB0, ResNet101, VGG16) paired with different RNN decoders like LSTM, GRU, and BiLSTM. Evaluation involves BLEU and METEOR metrics, with the best model being EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score of 17 and METEOR score of 46. The article also outlines challenges and future directions for advancing Nepali video captioning, offering a crucial resource for further research in this area. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: 6 pages, 5 figures, 3 tables. Presented in the International Conference on Technologies for Computer, Electrical, Electronics & Communication (ICT-CEEL 2023), Bhaktapur, Nepal

ACM Class: I.2.7; I.2.10

Journal ref: In proceedings of the International Conference on Technologies for Computer, Electrical, Electronics & Communication (ICT-CEEL 2023), Bhaktapur, Nepal. Part-1 94-99

arXiv:2310.12924 [pdf, other]

doi 10.1109/MCOMSTD.0001.2100022

Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks

Authors: Yagmur Yigit, Bahadir Bal, Aytac Karameseoglu, Trung Q. Duong, Berk Canberk

Abstract: Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architectur… ▽ More Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architecture based on the digital twin for ISP core networks. We implemented a Yet Another Next Generation (YANG) model and an automated feature selection (AutoFS) module to handle core network data. We used an online learning approach to update the model instantly and efficiently, improve the learning model quickly, and ensure accurate predictions. Finally, we reveal that our proposed solution successfully detects DDoS attacks and updates the feature selection method and learning model with a true classification rate of ninety-seven percent. Our proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts. △ Less

Submitted 25 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Journal ref: IEEE Communications Standards Magazine, vol. 6, no. 3, pp. 38-44, September 2022

arXiv:2302.06514 [pdf, other]

Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?

Authors: Siyang Song, Micol Spitale, Yiming Luo, Batuhan Bal, Hatice Gunes

Abstract: According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions,… ▽ More According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, where a broad spectrum of listeners' non-verbal reactions might be appropriate for responding to a specific speaker behaviour. There already exists a body of work that investigated the problem of automatically generating an appropriate reaction for a given input. However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions. △ Less

Submitted 23 March, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

MSC Class: 68T40

arXiv:2209.07268 [pdf, other]

AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Authors: Özgür Aslan, Burak Bolat, Batuhan Bal, Tuğba Tümer, Erol Şahin, Sinan Kalkan

Abstract: The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent… ▽ More The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent years have witnessed promising learning-based approaches for furniture assembly, they assume the availability of correct connection labels for each assembly step, which are expensive to obtain in practice. In this paper, we alleviate this assumption and aim to solve furniture assembly with as little human expertise and supervision as possible. To be specific, we assume the availability of the assembled point cloud, and comparing the point cloud of the current assembly and the point cloud of the target product, obtain a novel reward signal based on two measures: Incorrectness and incompleteness. We show that our novel reward signal can train a deep network to successfully assemble different types of furniture. Code and networks available here: https://github.com/METU-KALFA/AssembleRL △ Less

Submitted 15 September, 2022; originally announced September 2022.

Comments: 6 pages, 6 figures, iros2022

arXiv:2007.04793 [pdf, other]

Statistical Shape Analysis of Brain Arterial Networks (BAN)

Authors: Xiaoyang Guo, Aditi Basu Bal, Tom Needham, Anuj Srivastava

Abstract: Structures of brain arterial networks (BANs) - that are complex arrangements of individual arteries, their branching patterns, and inter-connectivities - play an important role in characterizing and understanding brain physiology. One would like tools for statistically analyzing the shapes of BANs, i.e. quantify shape differences, compare population of subjects, and study the effects of covariates… ▽ More Structures of brain arterial networks (BANs) - that are complex arrangements of individual arteries, their branching patterns, and inter-connectivities - play an important role in characterizing and understanding brain physiology. One would like tools for statistically analyzing the shapes of BANs, i.e. quantify shape differences, compare population of subjects, and study the effects of covariates on these shapes. This paper mathematically represents and statistically analyzes BAN shapes as elastic shape graphs. Each elastic shape graph is made up of nodes that are connected by a number of 3D curves, and edges, with arbitrary shapes. We develop a mathematical representation, a Riemannian metric and other geometrical tools, such as computations of geodesics, means and covariances, and PCA for analyzing elastic graphs and BANs. This analysis is applied to BANs after separating them into four components -- top, bottom, left, and right. This framework is then used to generate shape summaries of BANs from 92 subjects, and to study the effects of age and gender on shapes of BAN components. We conclude that while gender effects require further investigation, the age has a clear, quantifiable effect on BAN shapes. Specifically, we find an increased variance in BAN shapes as age increases. △ Less

Submitted 22 March, 2022; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2003.00287

Showing 1–6 of 6 results for author: Bal, B