-
Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
Authors:
Fatema Tuj Johora Faria,
Mukaffi Bin Moin,
Ahmed Al Wase,
Mehidi Ahmmed,
Md. Rabius Sani,
Tashreef Muhammad
Abstract:
The Bangla linguistic variety is a fascinating mix of regional dialects that adds to the cultural diversity of the Bangla-speaking community. Despite extensive study into translating Bangla to English, English to Bangla, and Banglish to Bangla in the past, there has been a noticeable gap in translating Bangla regional dialects into standard Bangla. In this study, we set out to fill this gap by creating a collection of 32,500 sentences, encompassing Bangla, Banglish, and English, representing five regional Bangla dialects. Our aim is to translate these regional dialects into standard Bangla and detect regions accurately. To achieve this, we proposed models known as mT5 and BanglaT5 for translating regional dialects into standard Bangla. Additionally, we employed mBERT and Bangla-bert-base to determine the specific regions from where these dialects originated. Our experimental results showed the highest BLEU score of 69.06 for Mymensingh regional dialects and the lowest BLEU score of 36.75 for Chittagong regional dialects. We also observed the lowest average word error rate of 0.1548 for Mymensingh regional dialects and the highest of 0.3385 for Chittagong regional dialects. For region detection, we achieved an accuracy of 85.86% for Bangla-bert-base and 84.36% for mBERT. This is the first large-scale investigation of Bangla regional dialects to Bangla machine translation. We believe our findings will not only pave the way for future work on Bangla regional dialects to Bangla machine translation, but will also be useful in solving similar language-related challenges in low-resource language conditions.
Submitted 18 November, 2023;
originally announced November 2023.
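The word error rate reported in the abstract above can be illustrated with a minimal sketch. It assumes the common definition (word-level edit distance divided by the number of reference words) and plain whitespace tokenization; the abstract does not state how the authors tokenized sentences or averaged their scores, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and unit cost for each
    substitution, insertion, and deletion.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a four-word reference with one substituted word in the hypothesis gives a rate of 0.25 under this definition.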
-
Utilizing Technical Data to Discover Similar Companies in Dhaka Stock Exchange
Authors:
Tashreef Muhammad,
Tahsin Aziz,
Mohammad Shafiul Alam
Abstract:
Stock market investment has been an ideal form of investment for many years. Investing capital smartly in the stock market yields high returns. But there are many companies available in a market. Currently, there are more than 345 active companies that have stocks on the Dhaka Stock Exchange (DSE). Analyzing all of these companies is practically impossible. However, many companies tend to move together. This study aims to find which companies in DSE have a close connection and move alongside each other. By analyzing this relation, investors and traders will be able to analyze the statistics of many companies by calculating those of just a handful of companies. The conducted experiment yielded promising results. It was found that although the system was not given anything other than technical data, it was able to identify companies that show domain-specific outcomes. In other words, a relation between technical data and fundamental data was discovered from the conducted experiment.
Submitted 11 January, 2023;
originally announced January 2023.
-
Frequency Distribution of Prime Numbers between an Integer and its Square: A Case Study
Authors:
Tashreef Muhammad,
G. M. Shahariar,
Tahsin Aziz,
Mohammad Shafiul Alam
Abstract:
The chronicle of prime numbers travels back thousands of years in human history. Not only have the traits of prime numbers surprised people, but the age-old endeavors to find a pattern in the appearance of prime numbers have also captivated them. Until recently, it was firmly believed that prime numbers do not maintain any pattern of occurrence among themselves. This statement is now considered not to be completely true. This paper is also an attempt to discover a pattern in the occurrence of prime numbers. This work intends to introduce some well-known mathematical equations that point to the existence of a simplistic pattern in the number of primes within the range of a number and its square. We assume that the rigorous evaluation of the perceived pattern may be beneficial in many areas, such as encryption applications, algorithms concerning prime numbers, and many more.
Submitted 26 September, 2022;
originally announced September 2022.
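The quantity studied in the abstract above, the number of primes between an integer and its square, can be computed with a short sieve. This is a minimal sketch, not the authors' method: the function name `primes_between_n_and_square` is hypothetical, and the choice to count primes p with n < p <= n² (excluding n itself) is an assumption, since the abstract does not say whether the endpoints are included.

```python
import math


def primes_between_n_and_square(n: int) -> int:
    """Count primes p with n < p <= n*n using a Sieve of Eratosthenes.

    Assumption: the lower endpoint n is excluded. The upper endpoint n*n
    is never prime for n >= 2, so including it does not change the count.
    """
    if n < 2:
        return 0  # the interval (n, n*n] is empty for n = 0 and n = 1

    limit = n * n
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p*p, as composite.
            multiples = range(p * p, limit + 1, p)
            is_prime[p * p :: p] = bytearray(len(multiples))
    return sum(is_prime[n + 1 :])
```

For n = 10, there are 25 primes up to 100 and 4 primes up to 10, so the function returns 21.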
-
Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion?
Authors:
Nibir Chandra Mandal,
Tashreef Muhammad,
G. M. Shahariar
Abstract:
Dozens of new tools and technologies are being incorporated to help developers, which is becoming a source of consternation as they struggle to choose one over the others. For example, there are at least ten frameworks available to developers for developing web applications, posing a conundrum in selecting the best one that meets their needs. As a result, developers are continuously searching for all of the benefits and drawbacks of each API, framework, tool, and so on. One of the typical approaches is to examine all of the features through official documentation and discussion. This approach is time-consuming and often makes it difficult to determine which aspects are the most important to a particular developer and whether a particular aspect is important to the community at large. In this paper, we have used a benchmark API aspects dataset (Opiner) collected from StackOverflow posts and observed how Transformer models (BERT, RoBERTa, DistilBERT, and XLNet) perform in detecting software aspects in textual developer discussion with respect to the baseline Support Vector Machine (SVM) model. Through extensive experimentation, we have found that transformer models improve the performance of baseline SVM for most of the aspects, i.e., 'Performance', 'Security', 'Usability', 'Documentation', 'Bug', 'Legal', 'OnlySentiment', and 'Others'. However, the models fail to detect some of the aspects (e.g., 'Community' and 'Portability') and their performance varies depending on the aspects. Also, larger architectures like XLNet are ineffective in interpreting software aspects compared to smaller architectures like DistilBERT.
Submitted 24 September, 2022;
originally announced September 2022.
-
An Approach of Adjusting the Switch Probability based on Dimension Size: A Case Study for Performance Improvement of the Flower Pollination Algorithm
Authors:
Tahsin Aziz,
Tashreef Muhammad,
Md. Rashedul Karim Chowdhury,
Mohammad Shafiul Alam
Abstract:
Numerous meta-heuristic algorithms have been influenced by nature. Over the past couple of decades, their number has grown significantly. The majority of these algorithms attempt to emulate natural biological and physical phenomena. This research concentrates on the Flower Pollination algorithm, which is one of several bio-inspired algorithms. The original approach was suggested for pollen grain exploration and exploitation in confined space using a specific global pollination and local pollination strategy. As a "swarm intelligence" meta-heuristic algorithm, its strength lies in locating the vicinity of the optimum solution rather than identifying the minimum. A modification to the original method is detailed in this work. This research found that by replacing the fixed value of "switch probability" with dynamic values for different dimension sizes and functions, the outcome was mostly improved over the original flower pollination method.
Submitted 20 August, 2022;
originally announced August 2022.
-
Transformer-Based Deep Learning Model for Stock Price Prediction: A Case Study on Bangladesh Stock Market
Authors:
Tashreef Muhammad,
Anika Bintee Aftab,
Md. Mainul Ahsan,
Maishameem Meherin Muhu,
Muhammad Ibrahim,
Shahidul Islam Khan,
Mohammad Shafiul Alam
Abstract:
In the modern capital market, the price of a stock is often considered to be highly volatile and unpredictable because of various social, financial, political and other dynamic factors. With calculated and thoughtful investment, the stock market can ensure a handsome profit with minimal capital investment, while incorrect prediction can easily bring catastrophic financial loss to the investors. This paper introduces the application of a recently introduced machine learning model, the Transformer model, to predict the future price of stocks of the Dhaka Stock Exchange (DSE), the leading stock exchange in Bangladesh. The transformer model has been widely leveraged for natural language processing and computer vision tasks, but, to the best of our knowledge, has never been used for the stock price prediction task at DSE. Recently, the introduction of time2vec encoding to represent the time series features has made it possible to employ the transformer model for stock price prediction. This paper concentrates on the application of a transformer-based model to predict the price movement of eight specific stocks listed in DSE based on their historical daily and weekly data. Our experiments demonstrate promising results and acceptable root mean squared error on most of the stocks.
Submitted 17 August, 2022;
originally announced August 2022.
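The time2vec encoding mentioned in the abstract above can be sketched as follows. This assumes the standard Time2Vec form, in which the first component is linear in time and the remaining components are sinusoidal; the abstract does not describe the authors' configuration. In a real model the frequencies and phases are learned parameters; here `omega` and `phi` are fixed, hypothetical inputs, and the function name `time2vec` is illustrative.

```python
import math
from typing import List, Sequence


def time2vec(tau: float, omega: Sequence[float], phi: Sequence[float]) -> List[float]:
    """Encode a scalar time value as a vector.

    Component 0 is linear (omega[0] * tau + phi[0]) and captures trend;
    components 1..k are sin(omega[i] * tau + phi[i]) and capture periodicity.
    Assumption: omega and phi are given here, not learned.
    """
    if len(omega) != len(phi) or len(omega) == 0:
        raise ValueError("omega and phi must be non-empty and of equal length")

    encoding = [omega[0] * tau + phi[0]]
    encoding.extend(
        math.sin(w * tau + p) for w, p in zip(omega[1:], phi[1:])
    )
    return encoding
```

Each time step of a price series would be mapped to such a vector and combined with the other input features before being passed to the transformer layers.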
-
The Sensorium competition on predicting large-scale mouse primary visual cortex activity
Authors:
Konstantin F. Willeke,
Paul G. Fahey,
Mohammad Bashiri,
Laura Pede,
Max F. Burg,
Christoph Blessing,
Santiago A. Cadena,
Zhiwei Ding,
Konstantin-Klemens Lurz,
Kayla Ponder,
Taliah Muhammad,
Saumil S. Patel,
Alexander S. Ecker,
Andreas S. Tolias,
Fabian H. Sinz
Abstract:
The neural underpinning of the biological visual system is challenging to study experimentally, in particular as the neuronal activity becomes increasingly nonlinear with respect to visual input. Artificial neural networks (ANNs) can serve a variety of goals for improving our understanding of this complex system, not only serving as predictive digital twins of sensory cortex for novel hypothesis generation in silico, but also incorporating bio-inspired architectural motifs to progressively bridge the gap between biological and machine vision. The mouse has recently emerged as a popular model system to study visual information processing, but no standardized large-scale benchmark to identify state-of-the-art models of the mouse visual system has been established. To fill this gap, we propose the Sensorium benchmark competition. We collected a large-scale dataset from mouse primary visual cortex containing the responses of more than 28,000 neurons across seven mice stimulated with thousands of natural images, together with simultaneous behavioral measurements that include running speed, pupil dilation, and eye movements. The benchmark challenge will rank models based on predictive performance for neuronal responses on a held-out test set, and includes two tracks for model input limited to either stimulus only (Sensorium) or stimulus plus behavior (Sensorium+). We provide a starting kit to lower the barrier for entry, including tutorials, pre-trained baseline models, and APIs with one line commands for data loading and submission. We would like to see this as a starting point for regular challenges and data releases, and as a standard tool for measuring progress in large-scale neural system identification models of the mouse visual system and beyond.
Submitted 17 June, 2022;
originally announced June 2022.
-
Learning From Brains How to Regularize Machines
Authors:
Zhe Li,
Wieland Brendel,
Edgar Y. Walker,
Erick Cobos,
Taliah Muhammad,
Jacob Reimer,
Matthias Bethge,
Fabian H. Sinz,
Xaq Pitkow,
Andreas S. Tolias
Abstract:
Despite impressive performance on numerous visual tasks, Convolutional Neural Networks (CNNs), unlike brains, are often highly sensitive to small perturbations of their input, e.g. adversarial noise leading to erroneous decisions. We propose to regularize CNNs using large-scale neuroscience data to learn more robust neural features in terms of representational similarity. We presented natural images to mice and measured the responses of thousands of neurons from cortical visual areas. Next, we denoised the notoriously variable neural activity using strong predictive models trained on this large corpus of responses from the mouse visual system, and calculated the representational similarity for millions of pairs of images from the model's predictions. We then used the neural representation similarity to regularize CNNs trained on image classification by penalizing intermediate representations that deviated from neural ones. This preserved performance of baseline models when classifying images under standard benchmarks, while maintaining substantially higher performance compared to baseline or control models when classifying noisy images. Moreover, the models regularized with cortical representations also improved model robustness in terms of adversarial attacks. This demonstrates that regularizing with neural data can be an effective tool to create an inductive bias towards more robust inference.
Submitted 11 November, 2019;
originally announced November 2019.
-
Penalized matrix decomposition for denoising, compression, and improved demixing of functional imaging data
Authors:
E. Kelly Buchanan,
Ian Kinsella,
Ding Zhou,
Rong Zhu,
Pengcheng Zhou,
Felipe Gerhard,
John Ferrante,
Ying Ma,
Sharon Kim,
Mohammed Shaik,
Yajie Liang,
Rongwen Lu,
Jacob Reimer,
Paul Fahey,
Taliah Muhammad,
Graham Dempsey,
Elizabeth Hillman,
Na Ji,
Andreas Tolias,
Liam Paninski
Abstract:
Calcium imaging has revolutionized systems neuroscience, providing the ability to image large neural populations with single-cell resolution. The resulting datasets are quite large, which has presented a barrier to routine open sharing of this data, slowing progress in reproducible research. State of the art methods for analyzing this data are based on non-negative matrix factorization (NMF); these approaches solve a non-convex optimization problem, and are effective when good initializations are available, but can break down in low-SNR settings where common initialization approaches fail. Here we introduce an approach to compressing and denoising functional imaging data. The method is based on a spatially-localized penalized matrix decomposition (PMD) of the data to separate (low-dimensional) signal from (temporally-uncorrelated) noise. This approach can be applied in parallel on local spatial patches and is therefore highly scalable, does not impose non-negativity constraints or require stringent identifiability assumptions (leading to significantly more robust results compared to NMF), and estimates all parameters directly from the data, so no hand-tuning is required. We have applied the method to a wide range of functional imaging data (including one-photon, two-photon, three-photon, widefield, somatic, axonal, dendritic, calcium, and voltage imaging datasets): in all cases, we observe ~2-4x increases in SNR and compression rates of 20-300x with minimal visible loss of signal, with no adjustment of hyperparameters; this in turn facilitates the process of demixing the observed activity into contributions from individual neurons. We focus on two challenging applications: dendritic calcium imaging data and voltage imaging data in the context of optogenetic stimulation. In both cases, we show that our new approach leads to faster and much more robust extraction of activity from the data.
Submitted 17 July, 2018;
originally announced July 2018.
-
ClassSpy: Java Object Pattern Visualization Tool
Authors:
Tufail Muhammad,
Zahid Halim,
Majid Ali Khan
Abstract:
Modern Java programs consist of a large number of classes as well as a vast number of objects instantiated during program execution. Software developers are always keen to know the number of objects created for each class. This information is helpful for a developer in understanding the packages/classes of a program and optimizing their code. However, understanding such a vast amount of information is not a trivial task. Visualization helps to depict this information on a single screen and to comprehend it efficiently. This paper presents a visualization approach that depicts information about all the objects instantiated during the program execution. The proposed technique is more space-efficient and scalable to handle vast datasets, while also helping to identify the key program components. This easy-to-use interface provides the user with an environment to view all of the objects on a single screen. The proposed approach allows sorting objects at class, thread and method levels. The effectiveness and usability of the proposed approach are shown through case studies.
Submitted 9 June, 2014;
originally announced June 2014.