-
LEIA: Linguistic Embeddings for the Identification of Affect
Authors:
Segun Taofeek Aroyehun,
Lukas Malik,
Hannah Metzler,
Nikolas Haimerl,
Anna Di Natale,
David Garcia
Abstract:
The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA's robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer. The models produced for this article are publicly available at https://huggingface.co/LEIA
Submitted 21 April, 2023;
originally announced April 2023.
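The macro-F1 score reported in the LEIA abstract above averages per-class F1 values so that each emotion counts equally regardless of how often it occurs. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions, not code or data from the paper.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three of the five emotion labels named in the abstract.
gold = ["happiness", "sadness", "anger", "anger"]
pred = ["happiness", "anger", "anger", "anger"]
score = macro_f1(gold, pred, ["happiness", "sadness", "anger"])
```

In this toy case the per-class F1 values are 1.0, 0.0, and 0.8, so the macro average is 0.6 (60 on the 0 to 100 scale the abstract appears to use).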
-
Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Authors:
Amit Chaulwar,
Lukas Malik,
Maciej Krajewski,
Felix Reichel,
Leif-Nissen Lundbæk,
Michael Huth,
Bartlomiej Matejczyk
Abstract:
Modern search systems use several large ranker models with transformer architectures. These models require large computational resources and are not suitable for usage on devices with limited computational resources. Knowledge distillation is a popular compression technique that can reduce the resource needs of such models, where a large teacher model transfers knowledge to a small student model. To drastically reduce memory requirements and energy consumption, we propose two extensions for a popular sentence-transformer distillation procedure: generation of an optimal size vocabulary and dimensionality reduction of the embedding dimension of teachers prior to distillation. We evaluate these extensions on two different types of ranker models. This results in extremely compressed student models whose analysis on a test dataset shows the significance and utility of our proposed extensions.
Submitted 29 June, 2022;
originally announced July 2022.
-
Accurate, Fast and Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes
Authors:
Avi Srivastava,
Hirak Sarkar,
Laraib Malik,
Rob Patro
Abstract:
Motivation: De novo transcriptome assembly of non-model organisms is the first major step for many RNA-seq analysis tasks. Current methods for de novo assembly often report a large number of contiguous sequences (contigs), which may be fractured and incomplete sequences instead of full-length transcripts. Dealing with a large number of such contigs can slow and complicate downstream analysis.
Results: We present a method for clustering contigs from de novo transcriptome assemblies based upon the relationships exposed by multi-mapping sequencing fragments. Specifically, we cast the problem of clustering contigs as one of clustering a sparse graph that is induced by equivalence classes of fragments that map to subsets of the transcriptome. Leveraging recent developments in efficient read mapping and transcript quantification, we have developed RapClust, a tool implementing this approach that is capable of accurately clustering most large de novo transcriptomes in a matter of minutes, while simultaneously providing accurate estimates of expression for the resulting clusters. We compare RapClust against a number of tools commonly used for de novo transcriptome clustering. Using de novo assemblies of organisms for which reference genomes are available, we assess the accuracy of these different methods in terms of the quality of the resulting clusterings, and the concordance of differential expression tests with those based on ground truth clusters. We find that RapClust produces clusters of comparable or better quality than existing state-of-the-art approaches, and does so substantially faster. RapClust also confers a large benefit in terms of space usage, as it produces only succinct intermediate files - usually on the order of a few megabytes - even when processing hundreds of millions of reads.
Submitted 12 April, 2016;
originally announced April 2016.
-
PDD Crawler: A focused web crawler using link and content analysis for relevance prediction
Authors:
Prashant Dahiwale,
M M Raghuwanshi,
Latesh Malik
Abstract:
The majority of computer and mobile phone enthusiasts use the web for searching. Web search engines carry out the search; the results they return are supplied to them by a software module known as the web crawler. The size of the web is increasing around the clock. The principal problem is to search this huge database for specific information. Deciding whether a web page is relevant to a search topic is a dilemma. This paper proposes a crawler called PDD crawler, which follows both a link-based and a content-based approach. This crawler follows a completely new crawling strategy to compute the relevance of a page. It analyses the content of the page based on the information contained in various tags within the HTML source code and then computes the total weight of the page. The page with the highest weight thus has the most content and the highest relevance.
Submitted 17 November, 2014;
originally announced November 2014.
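The content-based step described in the PDD Crawler abstract above (weighting a page by the information found in its HTML tags) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tag weights in `TAG_WEIGHTS`, the keyword-matching rule, and the names `TagWeightParser` and `page_weight` are hypothetical and are not taken from the paper, and the link-based part of the crawler is not shown.

```python
from html.parser import HTMLParser

# Hypothetical per-tag weights; the paper's actual values are not given in the abstract.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "a": 2.0, "p": 1.0}
# Tags that never have a closing tag, so they must not stay on the open-tag stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagWeightParser(HTMLParser):
    """Accumulates a page weight from topic keywords found inside weighted tags."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = {k.lower() for k in keywords}
        self.open_tags = []
        self.weight = 0.0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop up to and including the matching open tag.
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.open_tags:
            return
        tag_weight = TAG_WEIGHTS.get(self.open_tags[-1], 0.0)
        words = (w.strip(".,;:!?") for w in data.lower().split())
        hits = sum(1 for w in words if w in self.keywords)
        self.weight += tag_weight * hits


def page_weight(html, keywords):
    """Return the total weight of a page for the given topic keywords."""
    parser = TagWeightParser(keywords)
    parser.feed(html)
    parser.close()
    return parser.weight


html = (
    "<html><head><title>Web crawler</title></head>"
    "<body><h1>Crawler design</h1>"
    "<p>A crawler visits pages. Cats are nice.</p></body></html>"
)
weight = page_weight(html, ["crawler"])
```

A crawler built along these lines would compute such a weight for each candidate page and visit the highest-weighted pages first.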
-
Design and Implementation of Hierarchical Visual cryptography with Expansion less Shares
Authors:
Pallavi Vijay Chavan,
Dr. Mohammad Atique,
Dr. Latesh Malik
Abstract:
A novel idea of hierarchical visual cryptography is presented in this paper. The key concept of hierarchical visual cryptography is based upon visual cryptography. Visual cryptography encrypts secret information into two pieces called shares. These two shares are stacked together by a logical XOR operation to reveal the original secret. Hierarchical visual cryptography encrypts the secret at various levels. The encryption in turn is expansionless: the original secret size is retained in the shares at all levels. In this paper the secret is encrypted at two different levels. Four shares are generated by hierarchical visual cryptography. Any three shares are collectively taken to form the key share. All generated shares are meaningless, giving no information on visual inspection. Performance analysis is also carried out for various categories of secrets. The greying effect is completely removed while revealing the secret. Removal of the greying effect does not change the meaning of the secret.
Submitted 12 February, 2014;
originally announced February 2014.
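The XOR-based splitting described in the abstract above can be illustrated with a small sketch on lists of bits. It assumes that the second level simply splits each first-level share again, which yields four shares of the same size as the secret; the function names are hypothetical, and the paper's construction of a key share from three shares is not reproduced here.

```python
import secrets


def split(bits):
    """Split a bit list into two random-looking shares whose XOR is the input."""
    share_a = [secrets.randbelow(2) for _ in bits]
    share_b = [b ^ a for b, a in zip(bits, share_a)]
    return share_a, share_b


def stack(*shares):
    """Stack shares with bitwise XOR to reveal what they jointly encode."""
    result = [0] * len(shares[0])
    for share in shares:
        result = [r ^ s for r, s in zip(result, share)]
    return result


def hierarchical_split(bits):
    """Two-level split (assumed scheme): each first-level share is split again."""
    first, second = split(bits)
    return [*split(first), *split(second)]


secret = [1, 0, 1, 1, 0, 0, 1, 0]
shares = hierarchical_split(secret)
recovered = stack(*shares)
```

Every share has exactly as many bits as the secret, which is the expansionless property the abstract refers to, and stacking all four shares returns the original bits.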
-
Classification Of Gradient Change Features Using MLP For Handwritten Character Recognition
Authors:
Sandhya Arora,
Latesh Malik,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
A novel, generic scheme for recognizing off-line handwritten English alphabet character images is proposed. The advantage of the technique is that it can be applied in a generic manner to different applications and is expected to perform better in uncertain and noisy environments. The recognition scheme uses a multilayer perceptron (MLP) neural network. The system was trained and tested on a database of 300 samples of handwritten characters. For improved generalization and to avoid overtraining, the whole available dataset has been divided into two subsets: a training set and a test set. We achieved 99.10% and 94.15% correct recognition rates on the training and test sets respectively. The proposed scheme is robust with respect to various writing styles and sizes as well as the presence of considerable noise.
Submitted 30 June, 2010;
originally announced June 2010.
-
A novel approach for handwritten Devnagari character recognition
Authors:
Sandhya Arora,
Latesh Malik,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
In this paper a method for recognition of handwritten Devanagari characters is described. Here, the feature vector consists of accumulated directional gradient changes in different segments, the number of intersection points for the character, the type of spine present, and the type of shirorekha present in the character. One multi-layer perceptron with conjugate-gradient training is used to classify these feature vectors. This method is applied to a database with 1000 sample characters, and the recognition rate obtained is 88.12%.
Submitted 30 June, 2010;
originally announced June 2010.
-
A Two Stage Classification Approach for Handwritten Devanagari Characters
Authors:
Sandhya Arora,
Debotosh Bhattacharjee,
Mita Nasipuri,
Latesh Malik
Abstract:
The paper presents a two-stage classification approach for handwritten Devanagari characters. The first stage uses structural properties such as the shirorekha and the spine of the character, and the second stage exploits some intersection features of characters, which are fed to a feedforward neural network. A simple histogram-based method does not work for finding the shirorekha and vertical bar (spine) in handwritten Devanagari characters, so we designed a differential distance-based technique to find a near-straight line for the shirorekha and spine. This approach has been tested on 50000 samples, and we obtained an 89.12% success rate.
Submitted 30 June, 2010;
originally announced June 2010.
-
Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition
Authors:
Sandhya Arora,
Debotosh Bhattacharjee,
Mita Nasipuri,
L. Malik,
M. Kundu,
D. K. Basu
Abstract:
Classification methods based on learning from examples have been widely applied to character recognition since the 1990s and have brought significant improvements in recognition accuracy. This class of methods includes statistical methods, artificial neural networks (ANNs), support vector machines (SVMs), multiple classifier combination, etc. In this paper, we discuss the characteristics of some classification methods that have been successfully applied to handwritten Devnagari character recognition, and we report the results of SVM and ANN classification applied to handwritten Devnagari characters. After preprocessing the character image, we extracted shadow features, chain code histogram features, view-based features, and longest-run features. These features are then fed to a neural classifier and to a support vector machine for classification. For the neural classifier, we explored three ways of combining the decisions of four MLPs designed for the four different features.
Submitted 30 June, 2010;
originally announced June 2010.