-
Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs
Authors:
Tamzeed Mahfuz,
Satak Kumar Dey,
Ruwad Naswan,
Hasnaen Adil,
Khondker Salman Sayeed,
Haz Sameen Shahgir
Abstract:
Each new generation of English-oriented Large Language Models (LLMs) exhibits enhanced cross-lingual transfer capabilities and significantly outperforms older LLMs on low-resource languages. This prompts the question: Is there a need for LLMs dedicated to a particular low-resource language? We aim to explore this question for Bengali, a low-to-moderate resource Indo-Aryan language native to the Bengal region of South Asia.
We compare the performance of open-weight and closed-source LLMs such as LLaMA-3 and GPT-4 against fine-tuned encoder-decoder models across a diverse set of Bengali downstream tasks, including translation, summarization, paraphrasing, question-answering, and natural language inference. Our findings reveal that while LLMs generally excel in reasoning tasks, their performance in tasks requiring Bengali script generation is inconsistent. Key challenges include inefficient tokenization of Bengali script by existing LLMs, leading to increased computational costs and potential performance degradation. Additionally, we highlight biases in machine-translated datasets commonly used for Bengali NLP tasks. We conclude that there is a significant need for a Bengali-oriented LLM, but the field currently lacks the high-quality pretraining and instruction-tuning datasets necessary to develop a highly effective model.
Submitted 29 June, 2024;
originally announced July 2024.
-
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Authors:
Haz Sameen Shahgir,
Khondker Salman Sayeed,
Abhik Bhattacharjee,
Wasi Uddin Ahmad,
Yue Dong,
Rifat Shahriyar
Abstract:
The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and common-sense reasoning. This naturally led to the question: How do VLMs respond when the image itself is inherently unreasonable? To this end, we present IllusionVQA: a diverse dataset of challenging optical illusions and hard-to-interpret scenes to test the capability of VLMs in two distinct multiple-choice VQA tasks - comprehension and soft localization. GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought). Human evaluation reveals that humans achieve 91.03% and 100% accuracy in comprehension and localization. We discover that In-Context Learning (ICL) and Chain-of-Thought reasoning substantially degrade the performance of GeminiPro on the localization task. Tangentially, we discover a potential weakness in the ICL capabilities of VLMs: they fail to locate optical illusions even when the correct answer is in the context window as a few-shot example.
Submitted 30 March, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition
Authors:
Haz Sameen Shahgir,
Khondker Salman Sayeed,
Md Toki Tahmid,
Tanjeem Azwad Zaman,
Md. Zarif Ul Alam
Abstract:
Recent advances in Deep Learning and Computer Vision have been successfully leveraged to serve marginalized communities in various contexts. One such area is Sign Language - a primary means of communication for the deaf community. However, so far, the bulk of research efforts and investments have gone into American Sign Language, and research activity into low-resource sign languages - especially Bangla Sign Language - has lagged significantly. In this research paper, we present a new word-level Bangla Sign Language dataset - BdSL40 - consisting of 611 videos over 40 words, along with two different approaches: one with a 3D Convolutional Neural Network model and another with a novel Graph Neural Network approach for the classification of the BdSL40 dataset. This is the first study on word-level BdSL recognition, and the dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997). The proposed GNN model achieved an F1 score of 89%. The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in the literature. We release the dataset and source code to stimulate further research.
Submitted 22 January, 2024;
originally announced January 2024.
-
cryptoRAN: A review on cryptojacking and ransomware attacks w.r.t. banking industry -- threats, challenges, & problems
Authors:
Naresh Kshetri,
Mir Mehedi Rahman,
Sayed Abu Sayeed,
Irin Sultana
Abstract:
In the banking industry, ransomware is a well-known threat, but since the beginning of 2022, cryptojacking, an emerging threat, is posing a considerable challenge to the banking industry. Ransomware has variants, and the attackers keep changing the nature of these variants. This review paper studies the complex background of these two threats and scrutinizes the actual challenges and problems that the banking industry and financial institutions face. These threats, though distinct in nature, share commonalities, such as financial motivations and sophisticated techniques. We focus on examining the newly emerged variants of ransomware while we provide a comprehensive idea of cryptojacking and its nature. This paper involves a detailed breakdown of the specific threats posed by cryptojacking and ransomware. It explores the techniques cybercriminals use, the variabilities they look for, and the potential consequences for financial institutions and their customers. This paper also finds out how cybercriminals change their techniques following the security upgrades, and why financial firms including banks need to be proactive about cyber threats. Additionally, this paper reviews the background study of some existing papers, finds the research gaps that need to be addressed, and provides suggestions including a conclusion and future scope on those disputes. Lastly, we introduce a Digital Forensics and Incident Response (DFIR) approach for up-to-date cyber threat hunting processes for minimizing both cryptojacking and ransomware attacks in the banking industry.
Submitted 24 November, 2023;
originally announced November 2023.
-
Ophthalmic Biomarker Detection Using Ensembled Vision Transformers -- Winning Solution to IEEE SPS VIP Cup 2023
Authors:
H. A. Z. Sameen Shahgir,
Khondker Salman Sayeed,
Tanjeem Azwad Zaman,
Md. Asif Haider,
Sheikh Saifur Rahman Jony,
M. Sohel Rahman
Abstract:
This report outlines our approach in the IEEE SPS VIP Cup 2023: Ophthalmic Biomarker Detection competition. Our primary objective in this competition was to identify biomarkers from Optical Coherence Tomography (OCT) images obtained from a diverse range of patients. Using robust augmentations and 5-fold cross-validation, we trained two vision transformer-based models: MaxViT and EVA-02, and ensembled them at inference time. We find MaxViT's use of convolution layers followed by strided attention to be better suited for the detection of local features while EVA-02's use of normal attention mechanism and knowledge distillation is better for detecting global features. Ours was the best-performing solution in the competition, achieving a patient-wise F1 score of 0.814 in the first phase and 0.8527 in the second and final phase of VIP Cup 2023, scoring 3.8% higher than the next-best solution.
Submitted 21 October, 2023;
originally announced October 2023.
-
Bangla Grammatical Error Detection Using T5 Transformer Model
Authors:
H. A. Z. Sameen Shahgir,
Khondker Salman Sayeed
Abstract:
This paper presents a method for detecting grammatical errors in Bangla using a Text-to-Text Transfer Transformer (T5) Language Model, using the small variant of BanglaT5, fine-tuned on a corpus of 9385 sentences where errors were bracketed by the dedicated demarcation symbol. The T5 model was primarily designed for translation and is not specifically designed for this task, so extensive post-processing was necessary to adapt it to the task of error detection. Our experiments show that the T5 model can achieve low Levenshtein Distance in detecting grammatical errors in Bangla, but post-processing is essential to achieve optimal performance. The final average Levenshtein Distance after post-processing the output of the fine-tuned model was 1.0394 on a test set of 5000 sentences. This paper also presents a detailed analysis of the errors detected by the model and discusses the challenges of adapting a translation model for grammar. Our approach can be extended to other languages, demonstrating the potential of T5 models for detecting grammatical errors in a wide range of languages.
Submitted 19 March, 2023;
originally announced March 2023.
-
Towards The Creation Of The Future Fish Farm
Authors:
Pavlos Papadopoulos,
William J Buchanan,
Sarwar Sayeed,
Nikolaos Pitropakis
Abstract:
A fish farm is an area where fish are raised and bred for food. Fish farm environments support the care and management of seafood within a controlled environment. Over the past few decades, there has been a remarkable increase in the calorie intake of protein attributed to seafood. Along with this, there are significant opportunities within the fish farming industry for economic development. Determining the fish diseases, monitoring the aquatic organisms, and examining the imbalance in the water element are some key factors that require precise observation to determine the accuracy of the acquired data. Similarly, due to the rapid expansion of aquaculture, new technologies are constantly being implemented in this sector to enhance efficiency. However, the existing approaches have often failed to provide an efficient method of farming fish. This work has kept aside the traditional approaches and opened up new dimensions to perform accurate analysis by adopting a distributed ledger technology. Our work analyses the current state-of-the-art of fish farming and proposes a fish farm ecosystem that relies on a private-by-design architecture based on the Hyperledger Fabric private-permissioned distributed ledger technology. The proposed method puts forward accurate and secure storage of the retrieved data from multiple sensors across the ecosystem so that the adhering entities can exercise their decision based on the acquired data. This study demonstrates a proof-of-concept to signify the efficiency and usability of the future fish farm.
Submitted 2 January, 2023;
originally announced January 2023.
-
Transforming EU Governance: The Digital Integration through EBSI and GLASS
Authors:
Dimitrios Kasimatis,
William J Buchanan,
Mwarwan Abubakar,
Owen Lo,
Christos Chrysoulas,
Nikolaos Pitropakis,
Pavlos Papadopoulos,
Sarwar Sayeed,
Marc Sel
Abstract:
Traditionally, government systems managed citizen identities through disconnected data systems, using simple identifiers and paper-based processes, limiting digital trust and requiring citizens to request identity verification documents. The digital era offers a shift towards unique digital identifiers for each citizen, enabling a 'citizen wallet' for easier access to personal documents like academic records and licences, with enhanced security through digital signatures. The European Commission's initiative for a digital wallet for every EU citizen aims to improve mobility and integration, leveraging the European Blockchain Services Infrastructure (EBSI) for harmonised citizen integration. This paper discusses how EBSI and the GLASS project can advance governance and streamline access to identity documents.
Submitted 19 April, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset
Authors:
H. A. Z. Sameen Shahgir,
Khondker Salman Sayeed,
Tanjeem Azwad Zaman
Abstract:
Speech is inherently continuous, where discrete words, phonemes and other units are not clearly segmented, and so speech recognition has been an active research problem for decades. In this work we have fine-tuned wav2vec 2.0 to recognize and transcribe Bengali speech -- training it on the Bengali Common Voice Speech Dataset. After training for 71 epochs, on a training set consisting of 36919 mp3 files, we achieved a training loss of 0.3172 and WER of 0.2524 on a validation set of size 7,747. Using a 5-gram language model, the Levenshtein Distance was 2.6446 on a test set of size 7,747. Then the training set and validation set were combined, shuffled and split into 85-15 ratio. Training for 7 more epochs on this combined dataset yielded an improved Levenshtein Distance of 2.60753 on the test set. Our model was the best performing one, achieving a Levenshtein Distance of 6.234 on a hidden dataset, which was 1.1049 units lower than other competing submissions.
Submitted 11 September, 2022;
originally announced September 2022.
-
GLASS: A Citizen-Centric Distributed Data-Sharing Model within an e-Governance Architecture
Authors:
Owen Lo,
William J. Buchanan,
Sarwar Sayeed,
Pavlos Papadopoulos,
Nikolaos Pitropakis,
Christos Chrysoulas
Abstract:
E-governance is a process that aims to enhance a government's ability to simplify all the processes that may involve government, citizens, businesses, and so on. The rapid evolution of digital technologies has often created the necessity for the establishment of an e-Governance model. There is often a need for an inclusive e-governance model with integrated multiactor governance services and where a single market approach can be adopted. e-Governance often aims to minimise bureaucratic processes, while at the same time including a digital-by-default approach to public services. This aims at administrative efficiency and the reduction of bureaucratic processes. It can also improve government capabilities and enhance trust and security, which brings confidence in governmental transactions. However, solid implementations of a distributed data sharing model within an e-governance architecture are far from a reality; hence, citizens of European countries often go through the tedious process of having their confidential information verified. This paper focuses on the sinGLe sign-on e-GovernAnce Paradigm based on a distributed file-exchange network for security, transparency, cost-effectiveness and trust (GLASS) model, which aims to ensure that a citizen can control their relationship with governmental agencies. The paper thus proposes an approach that integrates a permissioned blockchain with the InterPlanetary File System (IPFS). This method demonstrates how we may encrypt and store verifiable credentials of the GLASS ecosystem, such as academic awards, ID documents and so on, within IPFS in a secure manner and thus only allow trusted users to read a blockchain record, and obtain the encryption key. This allows for the decryption of a given verifiable credential that is stored on IPFS. This paper outlines the creation of a demonstrator that proves the principles of the GLASS approach.
Submitted 16 March, 2022;
originally announced March 2022.
-
PAN-DOMAIN: Privacy-preserving Sharing and Auditing of Infection Identifier Matching
Authors:
William Abramson,
William J. Buchanan,
Sarwar Sayeed,
Nikolaos Pitropakis,
Owen Lo
Abstract:
The spread of COVID-19 has highlighted the need for a robust contact tracing infrastructure that enables infected individuals to have their contacts traced, and followed up with a test. The key entities involved within a contact tracing infrastructure may include the Citizen, a Testing Centre (TC), a Health Authority (HA), and a Government Authority (GA). Typically, these different domains need to communicate with each other about an individual. A common approach is when a citizen discloses his personally identifiable information to both the HA and a TC; if the test result comes back positive, the information is used by the TC to alert the HA. Along with this, there can be other trusted entities that have other key elements of data related to the citizen. However, the existing approaches comprise severe flaws in terms of privacy and security. Additionally, the aforementioned approaches are not transparent and are often questioned for the efficacy of the implementations. In order to overcome the challenges, this paper outlines the PAN-DOMAIN infrastructure that allows for citizen identifiers to be matched amongst the TA, the HA and the GA. PAN-DOMAIN ensures that the citizen can keep control of the mapping between the trusted entities using a trusted converter, and has access to an audit log.
Submitted 6 December, 2021;
originally announced December 2021.
-
Performance Analysis of TLS for Quantum Robust Cryptography on a Constrained Device
Authors:
Jon Barton,
William J Buchanan,
Nikolaos Pitropakis,
Sarwar Sayeed,
Will Abramson
Abstract:
Advances in quantum computing make Shor's algorithm for factorising numbers ever more tractable. This threatens the security of any cryptographic system which often relies on the difficulty of factorisation. It also threatens methods based on discrete logarithms, such as with the Diffie-Hellman key exchange method. For a cryptographic system to remain secure against a quantum adversary, we need to build methods based on a hard mathematical problem, which are not susceptible to Shor's algorithm and which create Post Quantum Cryptography (PQC). While high-powered computing devices may be able to run these new methods, we need to investigate how well these methods run on limited powered devices. This paper outlines an evaluation framework for PQC within constrained devices, and contributes to the area by providing benchmarks of the front-running algorithms on a popular single-board low-power device.
Submitted 7 February, 2022; v1 submitted 20 September, 2019;
originally announced December 2019.
-
A Framework for Providing E-Services to the Rural Areas using Wireless Ad Hoc and Sensor Networks
Authors:
Al-Sakib Khan Pathan,
Humayun Kadir Islam,
Sabit Anjum Sayeed,
Farruk Ahmed,
Choong Seon Hong
Abstract:
In recent years, the proliferation of mobile computing devices has driven a revolutionary change in the computing world. The nature of ubiquitous devices makes wireless networks the easiest solution for their interconnection. This has led to the rapid growth of several wireless systems like wireless ad hoc networks, wireless sensor networks etc. In this paper we have proposed a framework for rural development by providing various e-services to the rural areas with the help of wireless ad hoc and sensor networks. We have discussed how timely and accurate information could be collected from the rural areas using wireless technologies. In addition to this, we have also mentioned the technical and operational challenges that could hinder the implementation of such a framework in the rural areas in the developing countries.
Submitted 26 December, 2007;
originally announced December 2007.