
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models

Authors: Tanmay Sen, Ansuman Das, Mrinmay Sen

Abstract: Hate speech encompasses verbal, written, or behavioral communication that targets derogatory or discriminatory language against individuals or groups based on sensitive characteristics. Automated hate speech detection plays a crucial role in curbing its propagation, especially across social media platforms. Various methods, including recent advancements in deep learning, have been devised to address this challenge. In this study, we introduce HateTinyLLM, a novel framework based on fine-tuned decoder-only tiny large language models (tinyLLMs) for efficient hate speech detection. Our experimental findings demonstrate that the fine-tuned HateTinyLLM outperforms the pretrained mixtral-7b model by a significant margin. We explored various tiny LLMs, including PY007/TinyLlama-1.1B-step-50K-105b, Microsoft/phi-2, and facebook/opt-1.3b, and fine-tuned them using LoRA and adapter methods. Our observations indicate that all LoRA-based fine-tuned models achieved over 80% accuracy.

Submitted 26 April, 2024; originally announced May 2024.

arXiv:2403.11041 [pdf, other]

FAGH: Accelerating Federated Learning with Approximated Global Hessian

Authors: Mrinmay Sen, A. K. Qin, Krishna Mohan C

Abstract: In federated learning (FL), the significant communication overhead due to the slow convergence speed of training the global model poses a great challenge. Specifically, a large number of communication rounds are required to achieve the convergence in FL. One potential solution is to employ the Newton-based optimization method for training, known for its quadratic convergence rate. However, the existing Newton-based FL training methods suffer from either memory inefficiency or high computational costs for local clients or the server. To address this issue, we propose an FL with approximated global Hessian (FAGH) method to accelerate FL training. FAGH leverages the first moment of the approximated global Hessian and the first moment of the global gradient to train the global model. By harnessing the approximated global Hessian curvature, FAGH accelerates the convergence of global model training, leading to the reduced number of communication rounds and thus the shortened training time. Experimental results verify FAGH's effectiveness in decreasing the number of communication rounds and the time required to achieve the pre-specified objectives of the global model performance in terms of training and test losses as well as test accuracy. Notably, FAGH outperforms several state-of-the-art FL training methods.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.02833 [pdf, other]

SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix

Authors: Mrinmay Sen, A. K. Qin, Gayathri C, Raghu Kishore N, Yen-Wei Chen, Balasubramanian Raman

Abstract: This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent, where the challenge of storing and calculating the full FIM is addressed through making use of the regularized FIM and directly finding the gradient update direction via Sherman-Morrison matrix inversion. Additionally, like the popular Adam method, SOFIM uses the first moment of the gradient to address the issue of non-stationary objectives across mini-batches due to heterogeneous data. The utilization of the regularized FIM and Sherman-Morrison matrix inversion leads to the improved convergence rate with the same space and time complexities as stochastic gradient descent (SGD) with momentum. The extensive experiments on training deep learning models using several benchmark image classification datasets demonstrate that the proposed SOFIM outperforms SGD with momentum and several state-of-the-art Newton optimization methods in terms of the convergence speed for achieving the pre-specified objectives of training and test losses as well as test accuracy.

Submitted 1 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.15945 [pdf]

Attention-GAN for Anomaly Detection: A Cutting-Edge Approach to Cybersecurity Threat Management

Authors: Mohammed Abo Sen

Abstract: This paper proposes an innovative Attention-GAN framework for enhancing cybersecurity, focusing on anomaly detection. In response to the challenges posed by the constantly evolving nature of cyber threats, the proposed approach aims to generate diverse and realistic synthetic attack scenarios, thereby enriching the dataset and improving threat identification. Integrating attention mechanisms with Generative Adversarial Networks (GANs) is a key feature of the proposed method. The attention mechanism enhances the model's ability to focus on relevant features, essential for detecting subtle and complex attack patterns. In addition, GANs address the issue of data scarcity by generating additional varied attack data, encompassing known and emerging threats. This dual approach ensures that the system remains relevant and effective against the continuously evolving cyberattacks. The KDD Cup and CICIDS2017 datasets were used to validate this model, which exhibited significant improvements in anomaly detection. It achieved an accuracy of 99.69% on the KDD dataset and 97.93% on the CICIDS2017 dataset, with precision, recall, and F1-scores above 97%, demonstrating its effectiveness in recognizing complex attack patterns. This study contributes significantly to cybersecurity by providing a scalable and adaptable solution for anomaly detection in the face of sophisticated and dynamic cyber threats.
The exploration of GANs for data augmentation highlights a promising direction for future research, particularly in situations where data limitations restrict the development of cybersecurity systems. The attention-GAN framework has emerged as a pioneering approach, setting a new benchmark for advanced cyber-defense strategies.

Submitted 27 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2207.11782 [pdf, other]

Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish

Authors: Büşra Marşan, Salih Furkan Akkurt, Muhammet Şen, Merve Gürbüz, Onur Güngör, Şaziye Betül Özateş, Suzan Üsküdarlı, Arzucan Özgür, Tunga Güngör, Balkız Öztürk

Abstract: In this study, we aim to offer linguistically motivated solutions to resolve the issues of the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework. In order to tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation. Representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser and an updated version of the BoAT Tool is introduced.

Submitted 24 July, 2022; originally announced July 2022.

Comments: This is a peer reviewed article that has been presented in The International Conference on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP) 2022

arXiv:2105.08506 [pdf, other]

COVID-19 Detection in Computed Tomography Images with 2D and 3D Approaches

Authors: Sara Atito Ali Ahmed, Mehmet Can Yavuz, Mehmet Umut Sen, Fatih Gulsen, Onur Tutar, Bora Korkmazer, Cesur Samanci, Sabri Sirolu, Rauf Hamid, Ali Ergun Eryurekli, Toghrul Mammadov, Berrin Yanikoglu

Abstract: Detecting COVID-19 in computed tomography (CT) or radiography images has been proposed as a supplement to the definitive RT-PCR test. We present a deep learning ensemble for detecting COVID-19 infection, combining slice-based (2D) and volume-based (3D) approaches. The 2D system detects the infection on each CT slice independently, combining them to obtain the patient-level decision via different methods (averaging and long short-term memory networks). The 3D system takes the whole CT volume to arrive at the patient-level decision in one step. A new high resolution chest CT scan dataset, called the IST-C dataset, is also collected in this work. The proposed ensemble, called IST-CovNet, obtains 90.80% accuracy and 0.95 AUC score overall on the IST-C dataset in detecting COVID-19 among normal controls and other types of lung pathologies; and 93.69% accuracy and 0.99 AUC score on the publicly available MosMed dataset that consists of COVID-19 scans and normal controls only. The system is deployed at Istanbul University Cerrahpasa School of Medicine.

Submitted 20 May, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

arXiv:1607.02922 [pdf, other]

Characterization and recognition of proper tagged probe interval graphs

Authors: Sanchita Paul, Shamik Ghosh, Sourav Chakraborty, Malay Sen

Abstract: Interval graphs were used in the study of genomics by the famous molecular biologist Benzer. Later on probe interval graphs were introduced by Zhang as a generalization of interval graphs for the study of cosmid contig mapping of DNA. A tagged probe interval graph (briefly, TPIG) is motivated by similar applications to genomics, where the set of vertices is partitioned into two sets, namely, probes and nonprobes and there is an interval on the real line corresponding to each vertex. The graph has an edge between two probe vertices if their corresponding intervals intersect, has an edge between a probe vertex and a nonprobe vertex if the interval corresponding to a nonprobe vertex contains at least one end point of the interval corresponding to a probe vertex and the set of non-probe vertices is an independent set. This class of graphs was defined nearly two decades ago, but to date there is no known recognition algorithm for it. In this paper, we consider a natural subclass of TPIG, namely, the class of proper tagged probe interval graphs (in short PTPIG). We present a characterization and a linear time recognition algorithm for PTPIG. To obtain this characterization theorem we introduce a new concept called canonical sequence for proper interval graphs, which, we believe, has an independent interest in the study of proper interval graphs. Also to obtain the recognition algorithm for PTPIG, we introduce and solve a variation of consecutive $1$'s problem, namely, oriented consecutive $1$'s problem and some variations of PQ-tree algorithm.
We also discuss the interrelations between the classes of PTPIG and TPIG with probe interval graphs and probe proper interval graphs.

Submitted 27 July, 2018; v1 submitted 11 July, 2016; originally announced July 2016.

Comments: 40 pages, 3 figures

MSC Class: 05C62; 05C75; 05C85

arXiv:1311.2746 [pdf, other]

Deep neural networks for single channel source separation

Authors: Emad M. Grais, Mehmet Umut Sen, Hakan Erdogan

Abstract: In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies in which DNN and other classifiers were used for classifying time-frequency bins to obtain hard masks for each source, we use the DNN to classify estimated source spectra to check for their validity during separation. In the training stage, the training data for the source signals are used to train a DNN. In the separation stage, the trained DNN is utilized to aid in estimation of each source in the mixed signal. The single channel source separation problem is formulated as an energy minimization problem where each source spectrum estimate is encouraged to fit the trained DNN model and the mixed signal spectrum is encouraged to be written as a weighted sum of the estimated source spectra. The proposed approach works regardless of the energy scale differences between the source signals in the training and separation stages. Nonnegative matrix factorization (NMF) is used to initialize the DNN estimate for each source. The experimental results show that using DNN initialized by NMF for source separation improves the quality of the separated signal compared with using NMF for source separation.

Submitted 12 November, 2013; originally announced November 2013.

Comments: 5 pages, 2 figures, 2 tables, submitted to ICASSP2014

arXiv:1106.1684 [pdf, other]

Max-Margin Stacking and Sparse Regularization for Linear Classifier Combination and Selection

Authors: Mehmet Umut Sen, Hakan Erdogan

Abstract: The main principle of stacked generalization (or Stacking) is using a second-level generalizer to combine the outputs of base classifiers in an ensemble. In this paper, we investigate different combination types under the stacking framework; namely weighted sum (WS), class-dependent weighted sum (CWS) and linear stacked generalization (LSG). For learning the weights, we propose using regularized empirical risk minimization with the hinge loss. In addition, we propose using group sparsity for regularization to facilitate classifier selection. We performed experiments using two different ensemble setups with differing diversities on 8 real-world datasets. Results show the power of regularized learning with the hinge loss function. Using sparse regularization, we are able to reduce the number of selected classifiers of the diverse ensemble without sacrificing accuracy. With the non-diverse ensembles, we even gain accuracy on average by using sparse regularization.

Submitted 8 June, 2011; originally announced June 2011.

Comments: 8 pages, 3 figures, 6 tables, journal

arXiv:0811.2675 [pdf, ps, other]

Characterizations of probe interval graphs

Authors: Shamik Ghosh, Maitry Podder, Malay K. Sen

Abstract: In this paper we obtain several characterizations of the adjacency matrix of a probe interval graph. In the course of this study we describe an easy method of obtaining an interval representation of an interval bipartite graph from its adjacency matrix. Finally, we note that if we add a loop at every probe vertex of a probe interval graph, then the Ferrers dimension of the corresponding symmetric bipartite graph is at most 3.

Submitted 17 November, 2008; originally announced November 2008.

Showing 1–10 of 10 results for author: Şen, M