-
An End-to-End System for Reproducibility Assessment of Source Code Repositories via Their Readmes
Authors:
Eyüp Kaan Akdeniz,
Selma Tekir,
Malik Nizar Asad Al Hinnawi
Abstract:
Increased reproducibility of machine learning research has been a driving force for dramatic improvements in learning performance. The scientific community further fosters this effort by including reproducibility ratings in reviewer forms and considering them a crucial factor in the overall evaluation of papers. Accompanying source code is not sufficient to make a work reproducible; the shared code should also meet the ML reproducibility checklist. This work aims to support reproducibility evaluations of papers that come with source code. We propose an end-to-end system that operates on the Readme file of a source code repository. The system checks the compliance of a given Readme with a template proposed by a widely used platform for sharing the source code of research. Our system generates an overall score using a custom function that combines section scores. We also train a hierarchical transformer model to assign a class label to a given Readme. The experimental results show that the section similarity-based system performs better than the hierarchical transformer. Moreover, it has an advantage in explainability, since one can directly relate the score to the sections of the Readme file.
Submitted 14 October, 2023;
originally announced October 2023.
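The abstract describes scoring a Readme by comparing its sections with a template and combining per-section scores with a custom function. The sketch below illustrates that idea under stated assumptions and is not the authors' system: the template section names and equal weights are hypothetical, heading similarity is measured with Python's difflib.SequenceMatcher, and the combination is a plain weighted mean standing in for the paper's unspecified custom function.

```python
import re
from difflib import SequenceMatcher

# Hypothetical template sections and weights; the paper's template and scoring may differ.
TEMPLATE_SECTIONS = {
    "requirements": 1.0,
    "training": 1.0,
    "evaluation": 1.0,
    "pre-trained models": 1.0,
    "results": 1.0,
}

# Markdown ATX headings such as "## Training" or "## Results ##".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def readme_headings(readme: str) -> list[str]:
    """Extract the text of every ATX heading, lower-cased."""
    return [m.group(1).strip().lower() for m in HEADING.finditer(readme)]


def section_score(template_name: str, headings: list[str]) -> float:
    """Best string similarity (0..1) between one template section and any Readme heading."""
    return max((SequenceMatcher(None, template_name, h).ratio() for h in headings), default=0.0)


def readme_score(readme: str, template: dict[str, float] = TEMPLATE_SECTIONS) -> float:
    """Weighted mean of per-section scores (a stand-in for the paper's custom function)."""
    headings = readme_headings(readme)
    total_weight = sum(template.values())
    return sum(w * section_score(name, headings) for name, w in template.items()) / total_weight
```

Because each section score is traceable to a specific heading match, the overall number can be explained section by section, which mirrors the explainability argument in the abstract. Setext-style (underlined) headings are not detected by this regex.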
-
A Survey On Neural Word Embeddings
Authors:
Erhan Sezerer,
Selma Tekir
Abstract:
Understanding human language has been a sub-challenge on the way to intelligent machines. The study of meaning in natural language processing (NLP) relies on the distributional hypothesis, where language elements get their meaning from the words that co-occur with them within contexts. The revolutionary idea of a distributed representation for a concept is close to how the human mind works, in that the meaning of a word is spread across several neurons, and a loss of activation will only slightly affect the memory retrieval process.
Neural word embeddings transformed the whole field of NLP by introducing substantial improvements across NLP tasks. In this survey, we provide a comprehensive literature review on neural word embeddings. We give theoretical foundations and describe existing work in terms of the interplay between word embeddings and language modelling. We provide broad coverage of neural word embeddings, including early word embeddings, embeddings targeting specific semantic relations, sense embeddings, morpheme embeddings, and contextual representations. Finally, we describe benchmark datasets for evaluating word embeddings' performance and downstream tasks, along with the performance results obtained with word embeddings.
Submitted 4 October, 2021;
originally announced October 2021.
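The distributional hypothesis can be illustrated with raw co-occurrence counts: words that appear in similar contexts end up with similar context vectors. This is a count-based sketch, not a neural embedding; neural methods learn dense vectors from the same co-occurrence signal. The toy corpus and the window size of 2 are arbitrary choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [s.split() for s in [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on monday",
]]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))   # 0.917
print(cosine(vecs["cat"], vecs["stocks"]))          # 0.0
```

Here "cat" and "dog" share the context words "the", "drinks", and "chases", so their vectors are close, while "stocks" shares no context words with "cat".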
-
Leveraging Commonsense Knowledge on Classifying False News and Determining Checkworthiness of Claims
Authors:
Ipek Baris Schlicht,
Erhan Sezerer,
Selma Tekir,
Oul Han,
Zeyd Boukhers
Abstract:
Widespread and rapid dissemination of false news has made fact-checking an indispensable requirement. Given its time-consuming and labor-intensive nature, the task calls for automated support to meet the demand. In this paper, we propose to leverage commonsense knowledge for the tasks of false news classification and check-worthy claim detection. Arguing that commonsense knowledge is a factor in human believability, we fine-tune the BERT language model with a commonsense question answering task and the aforementioned tasks in a multi-task learning setting. For predicting fine-grained false news types, we compare the proposed fine-tuned model's performance with existing false news classification models on a public dataset as well as a newly collected dataset. To evaluate check-worthy claim detection, we compare the model's performance with a single-task BERT model and a state-of-the-art check-worthy claim detection tool. Our experimental analysis demonstrates that commonsense knowledge can improve performance in both tasks.
Submitted 8 August, 2021;
originally announced August 2021.
-
Automatic Story Construction from News Articles in an Online Fashion
Authors:
Özgür Can,
Selma Tekir
Abstract:
This paper presents a novel story construction system to track the evolution of stories in an online fashion. The proposed system uses a novel sliding window solution, named Inching Window, which allows each new data element to be processed on the fly. To assign a new data element to a community in a fast and memory-efficient manner, we apply the modularity maximization idea of the Louvain method on the fly. As part of the experimental validation, we provide a step-by-step construction of a meaningful news story and support the case with a set of visualizations.
Submitted 20 July, 2020;
originally announced July 2020.
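The Louvain method's core quantity is the modularity gain of moving an isolated node i into a community C. With k_i the node's degree, k_i,in the weight of its edges into C, Sigma_tot the summed degree of C's members, and m the total edge weight, the gain simplifies to k_i,in / m - Sigma_tot * k_i / (2 m^2). The sketch below applies that gain greedily as each new element arrives, in the spirit of the on-the-fly assignment described above. It is a hypothetical illustration: the class name and the one-pass, no-refinement policy are assumptions, and the Inching Window's handling of old elements is not modeled.

```python
from collections import defaultdict


class OnlineCommunities:
    """Greedy, one-pass community assignment driven by the Louvain modularity gain.

    Sketch assumptions: undirected weighted graph; each arriving node brings its edges
    to already-assigned nodes and is placed exactly once (no later refinement).
    """

    def __init__(self):
        self.community = {}                   # node -> community id
        self.sigma_tot = defaultdict(float)   # community id -> sum of degrees of its members
        self.m = 0.0                          # total edge weight of the graph
        self._next_id = 0

    def add_node(self, node, edges):
        """Insert `node` with `edges` (neighbor -> weight) and return its community id."""
        k_i = sum(edges.values())             # degree of the new node
        k_in = defaultdict(float)             # community id -> weight from node into it
        for nbr, w in edges.items():
            c = self.community[nbr]           # neighbors must already be assigned
            self.m += w
            self.sigma_tot[c] += w            # the neighbor's degree grows by w
            k_in[c] += w

        # Louvain gain for moving an isolated node i into community C:
        #   dQ = k_i,in / m - sigma_tot(C) * k_i / (2 m^2)
        best, best_gain = None, 0.0
        for c, k_i_in in k_in.items():
            gain = k_i_in / self.m - self.sigma_tot[c] * k_i / (2 * self.m ** 2)
            if gain > best_gain:
                best, best_gain = c, gain

        if best is None:                      # no positive gain: open a new community
            best = self._next_id
            self._next_id += 1
        self.community[node] = best
        self.sigma_tot[best] += k_i
        return best
```

Only neighboring communities are evaluated, because a community with no edges to the new node can only yield a negative gain; this keeps each insertion proportional to the node's degree, which matches the fast, memory-efficient goal stated in the abstract.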
-
Gender Prediction from Tweets: Improving Neural Representations with Hand-Crafted Features
Authors:
Erhan Sezerer,
Ozan Polatbilek,
Selma Tekir
Abstract:
Author profiling is the characterization of an author through key attributes such as gender, age, and language. In this paper, an RNN model with Attention (RNNwA) is proposed to predict the gender of a Twitter user from their tweets. Both word-level and tweet-level attention are used to learn 'where to look'. This model (https://github.com/Darg-Iztech/gender-prediction-from-tweets) is improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user. Both models are tested on three languages: English, Spanish, and Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and competitive results on Spanish and Arabic.
Submitted 6 September, 2019; v1 submitted 22 August, 2019;
originally announced August 2019.
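The model described above pools word representations into a tweet vector, pools tweet vectors into a user vector, and then appends LSA-reduced n-gram features. The pure-Python sketch below shows only that data flow. Assumptions: attention scores are a plain dot product with a context vector (the paper's attention may use a learned projection), the context vectors and the n-gram vector are given rather than learned or computed here, and the word vectors stand in for RNN outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(vector . context) and return the weighted sum."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def user_representation(tweets, word_context, tweet_context, ngram_features):
    """Hierarchical attention pooling followed by feature concatenation.

    tweets: list of tweets, each a list of word vectors (e.g., RNN hidden states).
    word_context, tweet_context: hypothetical learned context vectors.
    ngram_features: a precomputed, LSA-reduced n-gram vector for the same user.
    """
    tweet_vectors = [attention_pool(words, word_context) for words in tweets]  # word level
    user_vector = attention_pool(tweet_vectors, tweet_context)                 # tweet level
    return user_vector + list(ngram_features)                                  # concatenation


# Zero context vectors give uniform attention, i.e. plain averaging.
tweets = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
print(user_representation(tweets, [0.0, 0.0], [0.0, 0.0], [0.1, 0.2, 0.3]))
# [0.75, 0.75, 0.1, 0.2, 0.3]
```

With trained context vectors, the softmax weights would concentrate on the informative words and tweets instead of averaging them, and the concatenated vector would feed a gender classifier.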