CompText: Visualizing, Comparing & Understanding Text Corpus
Authors:
Suvi Varshney,
Divjeet Singh Jas
Abstract:
A common practice in Natural Language Processing (NLP) is to visualize a text corpus so that readers can grasp its central idea and key points without reading through the entire text. For a long time, researchers focused on extracting topics from the text and visualizing them according to their relative significance in the corpus. More recently, researchers have built more complex systems that expose not only the topics of a corpus but also the words closely related to each topic, giving users a more holistic view. These detailed visualizations spawned research on comparing text corpora through their visualizations. Topics are often compared to identify the differences between corpora. However, to capture richer semantics from different corpora, researchers have started to compare texts based on the sentiment of the topics they contain. Comparing the words that carry the most weight gives an idea of the important topics in a corpus. Several existing text-comparison methods compare topics rather than sentiments, but we believe that focusing on sentiment-carrying words yields a better comparison of two corpora: only sentiment conveys the real feeling of a text, and topics without sentiment are just nouns. We therefore aim to differentiate corpora with a focus on sentiment, as opposed to comparing all the words appearing in the two corpora. The rationale is that two corpora may not share many identical words for a side-by-side comparison, so comparing their sentiment words shows how each corpus appeals to the emotions of the reader. We also argue that the entropy (unexpectedness) and divergence of topics matter, and can help identify key pivot points and the importance of certain topics in a corpus alongside relative sentiment.
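The abstract does not say how sentiment-carrying words are identified or which divergence measure is used. As a minimal sketch under those gaps, the Python below assumes a tiny hypothetical lexicon (`SENTIMENT_LEXICON`) and uses the Jensen-Shannon divergence, a swapped-in choice that is not necessarily the authors', to compare the sentiment-word distributions of two corpora, together with a frequency-weighted net polarity. All function names are hypothetical.

```python
from collections import Counter
import math
import re

# Hypothetical toy lexicon (word -> polarity); a real system would use an
# established sentiment lexicon instead.
SENTIMENT_LEXICON = {
    "good": 1, "great": 1, "love": 1, "happy": 1,
    "bad": -1, "terrible": -1, "hate": -1, "sad": -1,
}


def sentiment_word_distribution(text):
    """Relative frequencies of the sentiment-carrying words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in SENTIMENT_LEXICON)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a):
        # KL(a || m); m[w] > 0 wherever a[w] > 0, so no division by zero.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def net_polarity(dist):
    """Frequency-weighted mean polarity of a distribution, in [-1, 1]."""
    return sum(SENTIMENT_LEXICON[w] * f for w, f in dist.items())


corpus_a = "I love this great product. Great value, happy customer."
corpus_b = "Terrible support, I hate the sad state of this bad product."
p = sentiment_word_distribution(corpus_a)
q = sentiment_word_distribution(corpus_b)
print(round(jensen_shannon(p, q), 3))   # 1.0
print(net_polarity(p), net_polarity(q))  # 1.0 -1.0
```

With these toy corpora the two sentiment vocabularies are disjoint, so the divergence reaches its maximum of 1.0, while the net polarities (1.0 and -1.0) show how each corpus leans emotionally.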
Submitted 27 July, 2022;
originally announced July 2022.
Forking Around: Correlation of forking practices with the success of a project
Authors:
Anurag Dhasmana,
Arindaam Roy,
Divjeet Singh Jas,
Kiranpreet Kaur,
Pinn Prugsanapan
Abstract:
Forking-based development has made it easier and more straightforward for developers to contribute to open-source software (OSS). Developers can fork an existing project and make changes in their local copy without interrupting development in the main project. Despite the efficiency of OSS development, more than 80% of the projects are not sustainable. Identifying the factors related to OSS success can inform developers about the sustainability of a project. In our study, we explore whether the inefficiencies that arise from forking-based development, such as redundant development, fragmented communities, and lack of modularity, have any relation to a project's outcome in terms of sustainability. We formulate eight metrics to quantify these attributes for projects in the ASFI dataset. To find the correlation between the metrics and the success of a project, we built a logistic regression model on the metrics with significant p-values and performed backward stepwise regression analysis, using the stepAIC function in R, to cross-check our findings. The findings show that modularity, the centralized management index, and hard forks are consequential for the success of a project. Developers can use the outcomes of our research to plan and structure their projects to increase their probability of success.
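The abstract names the procedure but not its details. The Python below is a minimal, stdlib-only sketch of the general technique: a logistic regression fitted by Newton-Raphson, followed by greedy backward elimination on AIC (AIC = 2k - 2 log L), similar in spirit to the backward direction of R's `stepAIC` (MASS package). It is not the authors' code and omits their initial p-value screening; the data are synthetic and the predictor names `x1`, `x2`, `x3` are hypothetical stand-ins for the eight metrics.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def fit_logistic(rows, y, iters=50, tol=1e-10):
    """Logistic regression with intercept via Newton-Raphson.
    Returns (coefficients, log_likelihood); coefficients[0] is the intercept."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # X^T (y - p)
        hess = [[0.0] * k for _ in range(k)]  # X^T W X, W = diag(p(1 - p))
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            p = math.exp(-softplus(-z))       # stable sigmoid(z)
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # log L = sum_i [y_i * z_i - log(1 + exp(z_i))]
    ll = sum(yi * sum(b * v for b, v in zip(beta, xi)) - softplus(sum(b * v for b, v in zip(beta, xi)))
             for xi, yi in zip(X, y))
    return beta, ll


def aic(columns, names, y):
    """AIC = 2k - 2 log L for a logistic model on the named predictors."""
    rows = [[columns[nm][i] for nm in names] for i in range(len(y))]
    beta, ll = fit_logistic(rows, y)
    return 2 * len(beta) - 2 * ll


def backward_stepwise_aic(columns, y):
    """Drop the predictor whose removal lowers AIC the most; stop when no
    single removal lowers it."""
    current = list(columns)
    best = aic(columns, current, y)
    while current:
        trials = [(aic(columns, [c for c in current if c != d], y), d) for d in current]
        cand_aic, cand = min(trials)
        if cand_aic >= best:
            break
        best, current = cand_aic, [c for c in current if c != cand]
    return current, best


# Synthetic, hypothetical data: "x1" and "x3" drive the outcome, "x2" is noise.
random.seed(0)
n = 300
columns = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = []
for i in range(n):
    z = -0.5 + 2.0 * columns["x1"][i] + 1.0 * columns["x3"][i]
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-z)) else 0)

selected, selected_aic = backward_stepwise_aic(columns, y)
print("kept:", selected, "AIC:", round(selected_aic, 2))
```

Because `x1` has a large true coefficient, it should survive elimination; whether the noise column `x2` is dropped depends on the sample, since AIC retains a useless predictor with some probability.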
Submitted 29 December, 2021;
originally announced December 2021.