-
Enhancing Bangla Fake News Detection Using Bidirectional Gated Recurrent Units and Deep Learning Techniques
Authors:
Utsha Roy,
Mst. Sazia Tahosin,
Md. Mahedi Hassan,
Taminul Islam,
Fahim Imtiaz,
Md Rezwane Sadik,
Yassine Maleh,
Rejwan Bin Sulaiman,
Md. Simul Hasan Talukder
Abstract:
The rise of fake news has made effective detection methods increasingly important, including in languages other than English. This study addresses the challenges of Bangla, a language that has received comparatively little research attention. To this end, a dataset containing about 50,000 news items is proposed. Several deep learning models have been tested on this dataset, including the bidirectional gated recurrent unit (GRU), the long short-term memory (LSTM), the 1D convolutional neural network (CNN), and hybrid architectures. We assessed the efficacy of the models using a range of measures, including recall, precision, F1 score, and accuracy. This was done by employing a big application. We carry out comprehensive trials to show the effectiveness of these models in identifying fake news in Bangla, with the Bidirectional GRU model achieving an accuracy of 99.16%. Our analysis highlights the importance of dataset balance and the need for continual improvement. This study contributes to the creation of Bangla fake news detection systems with limited resources, setting the stage for future improvements in the detection process.
Submitted 31 March, 2024;
originally announced April 2024.
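The four measures named in the abstract above (recall, precision, F1 score, and accuracy) follow directly from the counts of a binary confusion matrix. The snippet below is a minimal sketch of those standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "fake" treated as the positive class (an assumption).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall, written in count form.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

On an imbalanced dataset, accuracy alone can look high even when recall on the minority class is poor, which is one reason to report all four measures together.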
-
Could We Generate Cytology Images from Histopathology Images? An Empirical Study
Authors:
Soumyajyoti Dey,
Sukanta Chakraborty,
Utso Guha Roy,
Nibaran Das
Abstract:
Automation in medical imaging is quite challenging due to the unavailability of annotated datasets and the scarcity of domain experts. In recent years, deep learning techniques have solved some complex medical imaging tasks such as disease classification, localization of important objects, and segmentation. However, most of these tasks require a large amount of annotated data for their successful implementation. To mitigate the shortage of data, different generative models have been proposed for data augmentation, which can boost classification performance. For this purpose, different synthetic medical image generation models have been developed to enlarge datasets. Unpaired image-to-image translation models shift the source domain to the target domain. In breast malignancy identification, fine-needle aspiration cytology (FNAC) is one of the low-cost, minimally invasive modalities normally used by medical practitioners. However, few public datasets are available in this domain, whereas automating the analysis of cytology images requires a large amount of annotated data. Therefore, synthetic cytology images are generated by translating publicly available breast histopathology samples. In this study, we have explored traditional image-to-image transfer models such as CycleGAN and Neural Style Transfer. Further, measuring FID and KID scores shows that the generated cytology images are quite similar to real breast cytology samples.
Submitted 16 March, 2024;
originally announced March 2024.
-
Fuzzy Rank-based Late Fusion Technique for Cytology image Segmentation
Authors:
Soumyajyoti Dey,
Sukanta Chakraborty,
Utso Guha Roy,
Nibaran Das
Abstract:
Cytology image segmentation is quite challenging due to complex cellular structure and multiple overlapping regions. Moreover, supervised machine learning techniques need a large amount of annotated data, which is costly. In recent years, late fusion techniques have shown promising performance in the field of image classification. In this paper, we have explored a fuzzy-based late fusion technique for cytology image segmentation. This fusion rule integrates three traditional semantic segmentation models: UNet, SegNet, and PSPNet. The technique is applied to two cytology image datasets, i.e., the cervical cytology (HErlev) and breast cytology (JUCYT-v1) image datasets. With the proposed late fusion technique, we have achieved maximum MeanIoU scores of 84.27% and 83.79% on the HErlev and JUCYT-v1 datasets, respectively, which are better than those of traditional fusion rules such as average probability, geometric mean, and Borda count. The code of the proposed model is available on GitHub.
Submitted 16 March, 2024;
originally announced March 2024.
-
LAReQA: Language-agnostic answer retrieval from a multilingual pool
Authors:
Uma Roy,
Noah Constant,
Rami Al-Rfou,
Aditya Barua,
Aaron Phillips,
Yinfei Yang
Abstract:
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, the embedding baseline that performs the best on LAReQA falls short of competing baselines on zero-shot variants of our task that only target "weak" alignment. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
Submitted 11 April, 2020;
originally announced April 2020.
-
Online Multi-Armed Bandit
Authors:
Uma Roy,
Ashwath Thirmulai,
Joe Zurier
Abstract:
We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber($p_i$) for some underlying mean $p_i$) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we have suggested are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time. Again, we are able to suggest algorithms that are optimal up to a constant factor for this case, given certain mild conditions on the universe of distributions.
Submitted 16 July, 2017;
originally announced July 2017.
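The streaming setting described in the abstract above can be simulated in a few lines. The sketch below draws each arm's mean i.i.d. from Uniform(0, 1) and plays a simple probe-then-commit heuristic; the heuristic, the function name, and the parameters `probe_pulls` and `threshold` are assumptions chosen for illustration and are not the strategies analysed in the paper.

```python
import random


def play_streaming_bandits(total_pulls, probe_pulls=5, threshold=0.8, rng=None):
    """Simulate one game of the streaming Bernoulli bandit problem.

    Hypothetical heuristic (not from the paper): probe each streamed arm a
    few times; commit to it for all remaining pulls if its empirical mean
    reaches the threshold, otherwise move on and never return to it.
    """
    rng = rng or random.Random()
    reward = 0
    pulls_left = total_pulls
    while pulls_left > 0:
        # Mean p_i of the next streamed arm, drawn i.i.d. from Uniform(0, 1).
        p = rng.random()
        # Probe the current arm (bounded by the remaining budget).
        k = min(probe_pulls, pulls_left)
        wins = sum(rng.random() < p for _ in range(k))
        reward += wins
        pulls_left -= k
        if wins / k >= threshold:
            # Commit: spend every remaining pull on this arm.
            reward += sum(rng.random() < p for _ in range(pulls_left))
            pulls_left = 0
        # Otherwise the arm is discarded and the loop streams the next one.
    return reward


if __name__ == "__main__":
    total = play_streaming_bandits(total_pulls=1000, rng=random.Random(0))
    print(f"reward collected over 1000 pulls: {total}")
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing different values of `probe_pulls` and `threshold`.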
-
Stable Drug Designing By Minimizing Drug Protein Interaction Energy Using PSO
Authors:
Anupam Ghosh,
Mainak Talukdar,
Uttam Kumar Roy
Abstract:
Every biological function in a living organism happens as a result of protein-protein interactions. Diseases are no exception to this. Identifying one or more proteins for a particular disease and then designing a suitable chemical compound (known as a drug) to destroy these proteins has been an interesting topic of research in bio-informatics. In previous methods, drugs were designed using only seven chemical components and were represented as a fixed-length tree. But in reality, a drug contains many chemical groups collectively known as a pharmacophore. Moreover, the chemical length of the drug cannot be determined before designing the drug. In the present work, a Particle Swarm Optimization (PSO) based methodology has been proposed to find a suitable drug for a particular disease so that the drug-protein interaction becomes stable. In the proposed algorithm, the drug is represented as a variable-length tree and essential functional groups are arranged in different positions of that drug. Finally, the structure of the drug is obtained and its docking energy is minimized simultaneously. Also, the orientation of chemical groups in the drug is tested so that the drug can bind to a particular active site of a target protein and fit well inside that active site. Here, several inter-molecular forces have been considered to improve the accuracy of the docking energy. Results show that PSO performs better than the earlier methods.
Submitted 30 July, 2015;
originally announced July 2015.