-
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
Authors:
Manish Dhakal,
Arman Chhetri,
Aman Kumar Gupta,
Prabin Lamichhane,
Suraj Pandey,
Subarna Shakya
Abstract:
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. Most of the audio clips have silent gaps at both ends, which are clipped during preprocessing for a more uniform mapping between audio frames and their corresponding texts. Mel Frequency Cepstral Coefficients (MFCCs) are used as the audio features fed into the model. The model pairing a bidirectional LSTM with ResNet and a one-dimensional CNN produces the best results on this dataset out of all the models trained so far (neural networks with variations of LSTM, GRU, CNN, and ResNet). This novel model uses the Connectionist Temporal Classification (CTC) function for loss calculation during training and CTC beam search decoding to predict the most likely sequence of Nepali characters. On the test dataset, it achieves a character error rate (CER) of 17.06 percent. The source code is available at: https://github.com/manishdhakal/ASR-Nepali-using-CNN-BiLSTM-ResNet.
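To make the decoding and evaluation steps concrete, here is a minimal sketch of the CTC collapse rule (merge repeated labels, then drop blanks) with greedy best-path decoding, plus a character error rate computed as Levenshtein distance divided by reference length. Greedy decoding is a simpler stand-in for the CTC beam search the abstract names, and the blank index 0 and all function names are assumptions for illustration, not taken from the linked repository.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def greedy_ctc_decode(probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax label per frame, then collapse.

    A simpler stand-in for CTC beam search, which keeps several prefixes alive.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return ctc_collapse(best_path, blank)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance where substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = character-level edit distance / number of reference characters."""
    return edit_distance(ref, hyp) / len(ref)
```

Note that the blank between the two 1s in a path such as `[1, 1, 0, 1]` is what lets CTC emit a genuinely doubled character. Python strings index code points, so Devanagari combining marks (e.g. the virama) count as separate characters in this CER; the abstract does not say which unit the reported figure uses.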
Submitted 25 June, 2024;
originally announced June 2024.
-
Contextual Spelling Correction with Language Model for Low-resource Setting
Authors:
Nishant Luitel,
Nirajan Bekoju,
Anand Kumar Sah,
Subarna Shakya
Abstract:
The task of Spell Correction (SC) in low-resource languages presents a significant challenge because only a limited corpus of data is available and there are no annotated spelling correction datasets. To tackle these challenges, a small-scale word-based transformer LM is trained to provide the SC model with contextual understanding. Further, probabilistic error rules are extracted from the corpus in an unsupervised way to model how errors tend to occur (the error model). The LM and the error model are then combined to build the SC model within the well-known noisy channel framework. The effectiveness of this approach is demonstrated through experiments on Nepali, a language for which only an unprocessed corpus of textual data is available.
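The noisy channel framework picks the intended word w that maximizes P(w | context) · P(observed | w). The sketch below shows only that combination step; the toy probability tables and English words are hypothetical stand-ins for the paper's transformer LM and its corpus-extracted error rules.

```python
from typing import Callable, Dict, Iterable, Tuple


def noisy_channel_correct(
    observed: str,
    candidates: Iterable[str],
    lm_prob: Callable[[str], float],
    error_prob: Callable[[str, str], float],
) -> str:
    """Return the candidate w maximizing P_LM(w | context) * P_err(observed | w)."""
    return max(candidates, key=lambda w: lm_prob(w) * error_prob(observed, w))


# Hypothetical toy error model P(observed | intended); the paper instead
# extracts probabilistic error rules from an unannotated corpus.
TOY_ERRORS: Dict[Tuple[str, str], float] = {
    ("teh", "the"): 0.01,
    ("teh", "ten"): 0.002,
    ("teh", "tea"): 0.002,
}


def toy_error_prob(observed: str, intended: str) -> float:
    return TOY_ERRORS.get((observed, intended), 0.0)


# Hypothetical LM probabilities for the blank in two contexts; the paper
# uses a small word-based transformer LM for this factor.
CONTEXT_READ = {"the": 0.5, "ten": 0.1, "tea": 0.05}    # "I read ___ book"
CONTEXT_DRINK = {"the": 0.01, "ten": 0.01, "tea": 0.6}  # "I drink green ___"
```

With the same misspelling "teh", the first context yields "the" (0.5 × 0.01 beats the alternatives) while the second yields "tea" (0.6 × 0.002), which is the kind of contextual decision the LM term is there to supply.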
Submitted 28 April, 2024;
originally announced April 2024.
-
Can Perplexity Predict Fine-Tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali
Authors:
Nishant Luitel,
Nirajan Bekoju,
Anand Kumar Sah,
Subarna Shakya
Abstract:
Recent language models use subwording mechanisms to handle Out-of-Vocabulary (OOV) words seen at test time, and their generation capacity is generally measured using perplexity, an intrinsic metric. It is known that increasing subword granularity decreases perplexity. However, studies of how subwording affects the understanding capacity of language models have been few and limited to a handful of languages. To narrow this gap, we used six different tokenization schemes to pretrain relatively small language models in Nepali and used the learned representations to fine-tune on several downstream tasks. Although the byte-level BPE algorithm has been used in recent models such as GPT and RoBERTa, we show that on average it is sub-optimal compared to algorithms such as SentencePiece in fine-tuning performance for Nepali. Additionally, whereas similar recent studies have focused on BERT-based language models, we pretrain and fine-tune sequential transformer-based language models.
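Per-token perplexity is PPL = exp(-(1/N) Σ log p(token_i | history)). The short sketch below computes it and, purely as an illustration under an assumed fixed sentence log-probability, shows one arithmetic reason finer tokenizations report lower perplexity: the same total log-likelihood is divided by more tokens. It is not the paper's evaluation code.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum of natural-log token probabilities)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Illustrative assumption: two tokenizers assign the same sentence the same
# total log-probability, but one splits it into 4 tokens and the other into 8.
sentence_log_prob = 4 * math.log(0.25)
coarse_ppl = perplexity([sentence_log_prob / 4] * 4)  # 4 coarse tokens
fine_ppl = perplexity([sentence_log_prob / 8] * 8)    # 8 finer subword tokens
```

Here `coarse_ppl` is 4 and `fine_ppl` is 2 even though the models are equally good at the sentence level, which is why per-token perplexity is hard to compare across tokenization schemes and why downstream fine-tuning is a useful complementary measure.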
Submitted 28 April, 2024;
originally announced April 2024.
-
Homemade kit for demonstrating Barkhausen Effect
Authors:
Shantanu Shakya,
Navinder Singh
Abstract:
This paper presents an innovative and cost-effective approach to understanding the Barkhausen effect through the design and implementation of an educational kit. The Barkhausen effect, characterized by Barkhausen noise (BN) during magnetization changes in soft magnetic materials, is explored for its application in probing hysteresis properties and magnetization dynamics. The study investigates scaling properties, categorizing ferromagnetic materials based on their scaling exponents. The primary contribution is a practical and accessible kit for hands-on Barkhausen effect demonstrations, intended to transform the educational experience. The kit enables students not only to comprehend the intricacies of BN but also to calculate the scaling constant ($\tau$) for soft iron samples. The paper demonstrates the successful construction of the kit, its signal amplification capabilities, and the accuracy of its data collection, showcasing its potential for widespread educational use.
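The abstract does not say how the scaling constant τ is computed. One common approach for avalanche-size distributions of the form P(S) ∝ S^(-τ) is the continuous maximum-likelihood estimator τ̂ = 1 + n / Σ ln(S_i / S_min). The sketch below assumes that form, a hypothetical lower cutoff `s_min`, and event sizes already extracted from the amplified signal; the synthetic sampler exists only to check the estimator.

```python
import math
import random
from typing import List, Sequence


def estimate_tau(sizes: Sequence[float], s_min: float) -> float:
    """Continuous power-law MLE: tau = 1 + n / sum(ln(s / s_min)) over s >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)


def sample_power_law(tau: float, s_min: float, n: int, seed: int = 0) -> List[float]:
    """Synthetic sizes with density proportional to s**(-tau), via inverse transform."""
    rng = random.Random(seed)
    # Survival function (s / s_min)**(1 - tau) is uniform, so invert it.
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(n)]
```

For real Barkhausen data the choice of `s_min` matters a great deal, since small events are distorted by amplifier noise and the power law only holds over a finite range.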
Submitted 28 January, 2024;
originally announced January 2024.
-
Interpreting Indirect Answers to Yes-No Questions in Multiple Languages
Authors:
Zijie Wang,
Md Mosharaf Hossain,
Shivam Mathur,
Terry Cruz Melo,
Kadir Bulut Ozler,
Keun Hee Park,
Jacob Quintero,
MohammadHossein Rezaei,
Shreya Nupur Shakya,
Md Nayem Uddin,
Eduardo Blanco
Abstract:
Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).
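The abstract does not spell out the distant supervision rules. One plausible form, consistent with the claim that direct answers help interpret indirect ones, labels an answer by its leading polar keyword and strips that keyword so the remaining text resembles an indirect answer. The keyword lists, label names, and English-only regex below are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical English keyword lists; each language would need its own.
YES_WORDS = ("yes", "yeah", "yep", "sure")
NO_WORDS = ("no", "nope", "nah")

# First word, then any separating spaces/punctuation, then the rest.
_LEADING = re.compile(r"^\s*(\w+)[\s,.!]*(.*)$", re.DOTALL)


def distant_label(answer: str) -> Optional[Tuple[str, str]]:
    """Return (label, answer_without_keyword) if the answer opens with a polar keyword."""
    match = _LEADING.match(answer)
    if not match:
        return None
    first, rest = match.group(1).lower(), match.group(2).strip()
    if not rest:
        return None  # a bare "yes"/"no" leaves nothing to learn from
    if first in YES_WORDS:
        return ("yes", rest)
    if first in NO_WORDS:
        return ("no", rest)
    return None
```

Answers without a leading keyword return `None` and would be left for the trained model to interpret.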
Submitted 20 October, 2023;
originally announced October 2023.
-
Analysis, Detection, and Classification of Android Malware using System Calls
Authors:
Shubham Shakya,
Mayank Dave
Abstract:
Over the last decade, Android has become popular among users and attackers alike, as its vast user base draws attackers' attention. Because Android malware keeps evolving in variety and attack techniques, detection methods must be updated too. Most existing research relies on static features, and very little focuses on dynamic features. In this paper, we address this gap in the literature by detecting Android malware using system calls. We run each malicious app in a monitored and controlled emulator environment and inject simulated events during its runtime to trigger its hostile behavior. Logs collected during the app's runtime are analyzed and fed to different machine learning models for malware detection and family classification. The results indicate that K-Nearest Neighbor and Decision Tree give the highest accuracy in malware detection and family classification, respectively.
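As a rough illustration of the detection step, the sketch below turns a system-call trace into a normalized frequency vector over a fixed vocabulary and classifies it with K-Nearest Neighbor (Euclidean distance, majority vote). The syscall vocabulary, feature choice, and labels are hypothetical; the abstract does not specify how the logs are featurized.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical syscall vocabulary; real traces contain many more calls.
SYSCALLS = ["read", "write", "open", "connect", "sendto", "ioctl"]


def syscall_histogram(trace: Sequence[str], vocab: Sequence[str] = SYSCALLS) -> List[float]:
    """Relative frequency of each vocabulary syscall in one runtime trace."""
    counts = Counter(trace)
    total = max(len(trace), 1)
    return [counts[name] / total for name in vocab]


def knn_predict(train: Sequence[Tuple[List[float], str]], x: List[float], k: int = 3) -> str:
    """Majority label among the k training vectors closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy traces: benign apps mostly do file I/O, malware talks to the network.
TRAIN = [(syscall_histogram(t), label) for t, label in [
    (["read", "write", "read", "open"], "benign"),
    (["read", "read", "write", "write"], "benign"),
    (["open", "read", "write", "ioctl"], "benign"),
    (["connect", "sendto", "sendto", "read"], "malware"),
    (["connect", "sendto", "connect", "sendto"], "malware"),
    (["sendto", "sendto", "connect", "write"], "malware"),
]]
```

Normalizing by trace length keeps apps that ran for different durations comparable; `math.dist` requires Python 3.8 or later.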
Submitted 12 August, 2022;
originally announced August 2022.
-
Effect of scattering and electronic noise upon selection of detectors for Gamma Computerized Tomography
Authors:
Kajal Kumari,
Snehlata Shakya,
Mayank Goswami
Abstract:
Computed tomography (CT) has become a vital tool in a variety of fields as a result of technological developments and continual improvement. High-quality CT images are desirable for image interpretation and for extracting information from CT images. Many factors influence CT image quality. Various research groups have investigated and attempted to improve image quality by examining the noise/error associated with CT geometry. This study aims to select the CT detectors that yield the least noise in projection data. Three distinct gamma-ray detectors routinely used in CT have been compared in terms of scattering and electronic noise. This work demonstrates the sensitivity of Kanpur Theorem-1 to scattering noise and uses it to quantify the relative level of scattering noise. The detector measures the signal multiple times, and the standard deviation of the signal is used to calculate the electronic noise. It is observed that the IC CsI(Tl) scintillation detector produces lower electronic noise and relative scattering noise than the conventional detectors, NaI(Tl) and HPGe.
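The electronic-noise measurement described above reduces to the standard deviation of repeated readings of the same signal. The sketch assumes the sample (n − 1) standard deviation, which the abstract does not specify, and adds a hypothetical relative form (standard deviation over mean) that may help when comparing detectors with different gains.

```python
import statistics
from typing import Sequence


def electronic_noise(readings: Sequence[float]) -> float:
    """Sample standard deviation of repeated readings of one unchanged signal."""
    return statistics.stdev(readings)


def relative_noise(readings: Sequence[float]) -> float:
    """Hypothetical normalization: noise divided by the mean signal level."""
    return statistics.stdev(readings) / statistics.mean(readings)
```

For readings of 10, 12 and 14 counts, the noise is 2.0 and the relative noise is 2.0 / 12 ≈ 0.167; `statistics.pstdev` would give the population variant if that is the convention intended.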
Submitted 25 May, 2022;
originally announced May 2022.
-
Age Range Estimation using MTCNN and VGG-Face Model
Authors:
Dipesh Gyawali,
Prashanga Pokharel,
Ashutosh Chauhan,
Subodh Chandra Shakya
Abstract:
Convolutional Neural Networks (CNNs) have proven effective in many applications. Age range estimation using CNNs is an emerging research area because of its applications in a myriad of domains, and improving estimation accuracy remains an open goal. In our proposed work, a deep CNN model is used to identify people's age range. First, we used MTCNN to extract only the faces from the image dataset, removing features other than the face. Second, we used random cropping for data augmentation to improve model performance. We also applied transfer learning: a pretrained face recognition model, VGG-Face, is used to build our age range model, whose performance is evaluated on the Adience Benchmark to confirm the efficacy of our work. Performance on the test set surpasses the existing state of the art by substantial margins.
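Random cropping, the augmentation mentioned above, cuts a fixed-size window at a uniformly random position so each epoch sees slightly shifted faces. The dependency-free sketch below works on an image stored as rows of pixel values; the crop size, image representation, and function name are assumptions, and a real pipeline would use an image library on the MTCNN face crops.

```python
import random
from typing import List, Optional, Sequence

Image = List[List[int]]  # hypothetical: a grayscale image as rows of pixel values


def random_crop(image: Sequence[Sequence[int]], crop_h: int, crop_w: int,
                rng: Optional[random.Random] = None) -> Image:
    """Return a crop_h x crop_w window taken at a uniformly random position."""
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, h - crop_h)    # randint includes both endpoints
    left = rng.randint(0, w - crop_w)
    return [list(row[left:left + crop_w]) for row in image[top:top + crop_h]]
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing training runs.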
Submitted 17 April, 2021;
originally announced April 2021.
-
A Comparison of Semantic Similarity Methods for Maximum Human Interpretability
Authors:
Pinky Sitikhu,
Kritish Pahi,
Pujan Thapa,
Subarna Shakya
Abstract:
The inclusion of semantic information in any similarity measure improves the efficiency of the similarity measure and provides human-interpretable results for further analysis. A similarity calculation method that focuses only on features related to the text's words gives less accurate results. This paper presents three different methods that not only focus on the text's words but also incorporate semantic information of texts in their feature vectors to compute semantic similarities. These corpus-based and knowledge-based methods are: cosine similarity using tf-idf vectors, cosine similarity using word embeddings, and soft cosine similarity using word embeddings. Among these three, cosine similarity using tf-idf vectors performed best in finding similarities between short news texts. The similar texts returned by this method are easy to interpret and can be used directly in other information retrieval applications.
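A minimal sketch of the best-performing method, cosine similarity over tf-idf vectors, follows. The abstract does not fix the tokenization or tf-idf variant, so this assumes whitespace-tokenized input, raw term counts, and the smoothed idf log((1 + N) / (1 + df)) + 1 (the same form scikit-learn uses by default); the example sentences are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_vectors(docs: Sequence[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors: raw term count times smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


DOCS = [
    "election results announced today".split(),
    "today election results were announced".split(),
    "heavy rain floods the valley".split(),
]
VECS = tfidf_vectors(DOCS)
```

The smoothing keeps terms that occur in every document from getting zero weight, so two identical short texts still score 1.0 instead of an undefined 0/0.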
Submitted 30 October, 2019; v1 submitted 20 October, 2019;
originally announced October 2019.
-
Fine-grained Sentiment Classification using BERT
Authors:
Manish Munikar,
Sushil Shakya,
Aakash Shrestha
Abstract:
Sentiment classification is an important process in understanding people's perception of a product, service, or topic. Many natural language processing models have been proposed to solve the sentiment classification problem. However, most of them have focused on binary sentiment classification. In this paper, we use a promising deep learning model called BERT to solve the fine-grained sentiment classification task. Experiments show that our model outperforms other popular models for this task without a sophisticated architecture. In the process, we also demonstrate the effectiveness of transfer learning in natural language processing.
Submitted 4 October, 2019;
originally announced October 2019.
-
Quanvolutional Neural Networks: Powering Image Recognition with Quantum Circuits
Authors:
Maxwell Henderson,
Samriddhi Shakya,
Shashindra Pradhan,
Tristan Cook
Abstract:
Convolutional neural networks (CNNs) have rapidly risen in popularity for many machine learning applications, particularly in the field of image recognition. Much of the benefit generated from these networks comes from their ability to extract features from the data in a hierarchical manner. These features are extracted using various transformational layers, notably the convolutional layer which gives the model its name. In this work, we introduce a new type of transformational layer called a quantum convolution, or quanvolutional layer. Quanvolutional layers operate on input data by locally transforming the data using a number of random quantum circuits, in a way that is similar to the transformations performed by random convolutional filter layers. Provided these quantum transformations produce meaningful features for classification purposes, the overall algorithm could be quite useful for near-term quantum computing, because it requires small quantum circuits with little to no error correction. In this work, we empirically evaluated the potential benefit of these quantum transformations by comparing three types of models built on the MNIST dataset: CNNs, quantum convolutional neural networks (QNNs), and CNNs with additional non-linearities introduced. Our results showed that the QNN models had both higher test set accuracy and faster training compared to the purely classical CNNs.
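To show what "locally transforming the data using random quantum circuits" can look like, here is a small classical state-vector simulation of one quanvolutional filter: each 2×2 patch is encoded into 4 qubits, a fixed random circuit is applied, and the per-qubit Pauli-Z expectations become 4 output channels. The RY(πx) pixel encoding, the RY/RZ-plus-CNOT gate set, and the stride of 2 are assumptions for illustration, not the authors' implementation.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

# A gate is ("ry", qubit, angle), ("rz", qubit, angle) or ("cnot", control, target).
Gate = Tuple[str, int, float]


def _apply_1q(state: List[complex], q: int, u: Sequence[Sequence[complex]]) -> None:
    """Apply a 2x2 unitary u to qubit q (bit q of the basis index), in place."""
    step = 1 << q
    for i in range(len(state)):
        if not i & step:
            a0, a1 = state[i], state[i | step]
            state[i] = u[0][0] * a0 + u[0][1] * a1
            state[i | step] = u[1][0] * a0 + u[1][1] * a1


def _ry(theta: float) -> List[List[complex]]:
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def _rz(phi: float) -> List[List[complex]]:
    return [[cmath.exp(-0.5j * phi), 0j], [0j, cmath.exp(0.5j * phi)]]


def run_circuit(pixels: Sequence[float], gates: Sequence[Gate]) -> List[float]:
    """Encode pixels in [0, 1] as RY(pi * x), apply the gates, return <Z> per qubit."""
    n = len(pixels)
    state = [0j] * (1 << n)
    state[0] = 1 + 0j  # start in |00...0>
    for q, x in enumerate(pixels):
        _apply_1q(state, q, _ry(math.pi * x))
    for name, a, b in gates:
        if name == "ry":
            _apply_1q(state, a, _ry(b))
        elif name == "rz":
            _apply_1q(state, a, _rz(b))
        else:  # "cnot": flip target bit int(b) wherever control bit a is 1
            ctrl, tgt = 1 << a, 1 << int(b)
            for i in range(len(state)):
                if i & ctrl and not i & tgt:
                    state[i], state[i | tgt] = state[i | tgt], state[i]
    return [sum(abs(amp) ** 2 * (-1 if i & (1 << q) else 1) for i, amp in enumerate(state))
            for q in range(n)]


def random_circuit(n_qubits: int = 4, n_layers: int = 2, seed: int = 0) -> List[Gate]:
    """One fixed random circuit: random RY/RZ rotations plus a CNOT chain per layer."""
    rng = random.Random(seed)
    gates: List[Gate] = []
    for _ in range(n_layers):
        for q in range(n_qubits):
            gates.append((rng.choice(["ry", "rz"]), q, rng.uniform(0, 2 * math.pi)))
        for q in range(n_qubits - 1):
            gates.append(("cnot", q, q + 1))
    return gates


def quanvolve(image: Sequence[Sequence[float]], gates: Sequence[Gate]) -> List[List[List[float]]]:
    """Slide a 2x2 window with stride 2; each patch's 4 <Z> values fill 4 channels."""
    h, w = len(image) // 2, len(image[0]) // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(4)]
    for r in range(h):
        for c in range(w):
            patch = [image[2 * r][2 * c], image[2 * r][2 * c + 1],
                     image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]]
            for ch, value in enumerate(run_circuit(patch, gates)):
                out[ch][r][c] = value
    return out
```

As with a random convolutional filter, the circuit is generated once and reused for every patch; the resulting channels would then feed an ordinary classical network.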
Submitted 9 April, 2019;
originally announced April 2019.
-
Word Sense Disambiguation using WSD specific Wordnet of Polysemy Words
Authors:
Udaya Raj Dhungana,
Subarna Shakya,
Kabita Baral,
Bharat Sharma
Abstract:
This paper presents a new model of WordNet that is used to disambiguate the correct sense of a polysemy word based on clue words. The related words for each sense of a polysemy word, as well as for a single-sense word, are referred to as the clue words. The conventional WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each expressing a different concept. In contrast to the structure of WordNet, we developed a new model of WordNet that organizes the different senses of polysemy words, as well as single-sense words, based on the clue words. These clue words are used to disambiguate the correct meaning of the polysemy word in a given context using knowledge-based Word Sense Disambiguation (WSD) algorithms. A clue word can be a noun, verb, adjective or adverb.
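The abstract says clue words feed knowledge-based WSD algorithms but does not name a specific one. A simple Lesk-style choice, sketched below, picks the sense whose clue-word set overlaps most with the words of the sentence. The English entry for "bank" and its clue words are hypothetical stand-ins for the paper's WSD-specific WordNet.

```python
import re
from typing import Dict, Set

# Hypothetical WSD-specific WordNet entry: each sense maps to its clue words.
CLUE_WORDNET: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "bank/finance": {"money", "loan", "deposit", "account", "interest"},
        "bank/river": {"river", "water", "shore", "fishing", "flood"},
    },
}


def disambiguate(word: str, sentence: str,
                 wordnet: Dict[str, Dict[str, Set[str]]] = CLUE_WORDNET) -> str:
    """Pick the sense whose clue words overlap most with the sentence's words."""
    context = set(re.findall(r"\w+", sentence.lower())) - {word}
    senses = wordnet[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))
```

Ties fall back to the first listed sense, so ordering senses by frequency gives a reasonable default when no clue word appears.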
Submitted 10 September, 2014;
originally announced September 2014.