
Showing 1–16 of 16 results for author: Sokolova, M

Searching in archive cs.
  1. arXiv:2405.02349

    cs.LG

    Explainable Multi-Label Classification of MBTI Types

    Authors: Siana Kong, Marina Sokolova

    Abstract: In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use an Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achi…

    Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 22 pages, 12 tables, 2 figures

    ACM Class: I.2.6

  2. arXiv:2401.13805

    cs.SI cs.IR

    Longitudinal Sentiment Topic Modelling of Reddit Posts

    Authors: Fabian Nwaoha, Ziyad Gaffar, Ho Joon Chun, Marina Sokolova

    Abstract: In this study, we analyze texts of Reddit posts written by students of four major Canadian universities. We gauge the emotional tone and uncover prevailing themes and discussions through longitudinal topic modeling of the posts' textual data. Our study focuses on four years, 2020-2023, covering the COVID-19 pandemic and post-pandemic years. Our results highlight a gradual uptick in discussions related to…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 21 pages, 4 figures, 13 tables. arXiv admin note: text overlap with arXiv:2401.12382

    ACM Class: I.2.7

  3. arXiv:2401.12382

    cs.CL cs.LG cs.SI

    Longitudinal Sentiment Classification of Reddit Posts

    Authors: Fabian Nwaoha, Ziyad Gaffar, Ho Joon Chun, Marina Sokolova

    Abstract: We report results of a longitudinal sentiment classification of Reddit posts written by students of four major Canadian universities. We work with the texts of the posts, concentrating on the years 2020-2023. By finely tuning a sentiment threshold to a range of [-0.075,0.075], we successfully built classifiers proficient in categorizing post sentiments into positive and negative categories. Notice…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 11 pages, 10 figures, 4 tables

    ACM Class: I.2.6

  4. arXiv:2205.06863

    cs.CL cs.LG cs.SI

    Sentiment Analysis of Covid-related Reddits

    Authors: Yilin Yang, Tomas Fieg, Marina Sokolova

    Abstract: This paper focuses on Sentiment Analysis of Covid-19 related messages from the r/Canada and r/Unitedkingdom subreddits of Reddit. We apply manual annotation and three Machine Learning algorithms to analyze sentiments conveyed in those messages. We use VADER and TextBlob to label messages for Machine Learning experiments. Our results show that removal of the shortest and longest messages improves VADER…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 10 pages, 1 figure, 5 tables

    ACM Class: I.2.7; I.2.6

  5. arXiv:2108.06215

    cs.IR cs.CL cs.LG cs.SI

    Sentiment Analysis of the COVID-related r/Depression Posts

    Authors: Zihan Chen, Marina Sokolova

    Abstract: Reddit.com is a popular social media platform among young people. Reddit users share their stories to seek support from other users, especially during the Covid-19 pandemic. Messages posted on Reddit and their content have provided researchers with an opportunity to analyze public concerns. In this study, we analyzed sentiments of COVID-related messages posted on r/Depression. Our study poses the fol…

    Submitted 28 July, 2021; originally announced August 2021.

    Comments: 16 pages, 7 figures, 5 tables, 1 appendix

    ACM Class: I.2; I.2.7

  6. arXiv:2105.13430

    cs.LG cs.CY

    Explainable Multi-class Classification of the CAMH COVID-19 Mental Health Data

    Authors: YuanZheng Hu, Marina Sokolova

    Abstract: Application of Machine Learning algorithms to the medical domain is an emerging trend that helps to advance medical knowledge. At the same time, there is a significant lack of explainable studies that promote informed, transparent, and interpretable use of Machine Learning algorithms. In this paper, we present explainable multi-class classification of the Covid-19 mental health data. In Machine…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: 22 pages, including Appendixes; 7 tables and 5 figures in the main text

    ACM Class: I.2

  7. arXiv:2012.14059

    cs.LG

    Convolutional Neural Networks in Multi-Class Classification of Medical Data

    Authors: YuanZheng Hu, Marina Sokolova

    Abstract: We report applications of Convolutional Neural Networks (CNN) to multi-class classification of a large medical data set. We discuss in detail how changes in the CNN model and the data pre-processing impact the classification results. In the end, we introduce an ensemble model that consists of both deep learning (CNN) and shallow learning models (Gradient Boosting). The method achieves Acc…

    Submitted 27 December, 2020; originally announced December 2020.

    Comments: 13 pages; 14 tables

    ACM Class: I.2; J.3

  8. arXiv:2012.13796

    cs.LG cs.PF

    Explainable Multi-class Classification of Medical Data

    Authors: YuanZheng Hu, Marina Sokolova

    Abstract: Machine Learning applications have brought new insights into a secondary analysis of medical data. Machine Learning helps to develop new drugs, define populations susceptible to certain illnesses, and identify predictors of many common diseases. At the same time, Machine Learning results depend on a convolution of many factors, including feature selection, class (im)balance, algorithm preference, and pe…

    Submitted 26 December, 2020; originally announced December 2020.

    Comments: 21 pages; 23 tables; 2 appendixes

    ACM Class: I.2; J.3

  9. arXiv:2010.09574

    cs.LG

    Machine Learning Evaluation of the Echo-Chamber Effect in Medical Forums

    Authors: Marina Sokolova, Victoria Bobicev

    Abstract: We propose the Echo-Chamber Effect assessment of an online forum. Sentiments perceived by the forum readers are at the core of the analysis; a complete message is the unit of the study. We build 14 models and apply those to represent discussions gathered from an online medical forum. We use four multi-class sentiment classification applications and two Machine Learning algorithms to evaluate prowe…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: 17 pages, including Appendix; 6 figures in the main text; 5 tables in the main text and 7 tables in Appendix

    ACM Class: I.2.6; I.2.m

  10. arXiv:1805.00352

    cs.CL cs.LG

    Word2Vec and Doc2Vec in Unsupervised Sentiment Analysis of Clinical Discharge Summaries

    Authors: Qufei Chen, Marina Sokolova

    Abstract: In this study, we explored the application of Word2Vec and Doc2Vec for sentiment analysis of clinical discharge summaries. We applied unsupervised learning since the data sets did not have sentiment annotations. Note that unsupervised learning is a more realistic scenario than supervised learning, which requires access to a training set of sentiment-annotated data. We aim to detect if there exists a…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: 23 pages, 3 figures, 16 tables

    MSC Class: 68T05; 68T50

  11. arXiv:1803.06390

    cs.CL cs.IR cs.LG

    Corpus Statistics in Text Classification of Online Data

    Authors: Marina Sokolova, Victoria Bobicev

    Abstract: Transformation of Machine Learning (ML) from a boutique science to a generally accepted technology has increased the importance of reproduction and transportability of ML studies. In the current work, we investigate how corpus characteristics of textual data sets correspond to text classification results. We work with two data sets gathered from sub-forums of an online health-related forum. Our empiri…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: 12 pages, 6 tables, 1 figure

    MSC Class: 68T05; 68T50

  12. arXiv:1802.09059

    cs.LG cs.CL cs.IR stat.ML

    One Single Deep Bidirectional LSTM Network for Word Sense Disambiguation of Text Data

    Authors: Ahmad Pesaranghader, Ali Pesaranghader, Stan Matwin, Marina Sokolova

    Abstract: Due to recent technical and scientific advances, we have a wealth of information hidden in unstructured text data such as offline/online narratives, research articles, and clinical reports. To mine these data properly given their innate ambiguity, a Word Sense Disambiguation (WSD) algorithm can avoid a number of difficulties in the Natural Language Processing (NLP) pipeline. However, conside…

    Submitted 25 February, 2018; originally announced February 2018.

    Comments: 12 pages, 1 figure, to appear in the Proceedings of the 31st Canadian Conference on Artificial Intelligence, 8-11 May, 2018, Toronto, Canada

  13. arXiv:1702.08866

    cs.CL

    Studying Positive Speech on Twitter

    Authors: Marina Sokolova, Vera Sazonova, Kanyi Huang, Rudraneel Chakraboty, Stan Matwin

    Abstract: We present results of empirical studies on positive speech on Twitter. By positive speech we understand speech that works for the betterment of a given situation, in this case relations between different communities in a conflict-prone country. We worked with four Twitter data sets. Through semi-manual opinion mining, we found that positive speech accounted for < 1% of the data. In fully automate…

    Submitted 24 February, 2017; originally announced February 2017.

    Comments: 13 pages, 6 tables

    ACM Class: I.2.6; I.2.7

  14. arXiv:1608.02519

    cs.SI cs.CL

    Topic Modelling and Event Identification from Twitter Textual Data

    Authors: Marina Sokolova, Kanyi Huang, Stan Matwin, Joshua Ramisch, Vera Sazonova, Renee Black, Chris Orwa, Sidney Ochieng, Nanjira Sambuli

    Abstract: The tremendous growth of social media content on the Internet has inspired the development of text analytics to understand and solve real-life problems. Leveraging statistical topic modelling helps researchers and practitioners better comprehend textual content and provides useful information for further analysis. Statistical topic modelling becomes especially important when we…

    Submitted 8 August, 2016; originally announced August 2016.

    Comments: 17 pages, 2 figures, 5 tables

    ACM Class: D.4.8; H.1.2; H.2.8; I.2.7

  15. arXiv:1602.01937

    cs.CR cs.CY cs.IR cs.SI

    YOURPRIVACYPROTECTOR, A recommender system for privacy settings in social networks

    Authors: Kambiz Ghazinour, Stan Matwin, Marina Sokolova

    Abstract: Ensuring privacy of users of social networks is probably an unsolvable conundrum. At the same time, an informed use of the existing privacy options by the social network participants may alleviate - or even prevent - some of the more drastic privacy-averse incidents. Unfortunately, recent surveys show that an average user is either not aware of these options or does not use them, probably due to t…

    Submitted 5 February, 2016; originally announced February 2016.

    Comments: 15 pages

    Journal ref: International Journal of Security, Privacy and Trust Management (IJSPTM), Volume 2, No 4, Aug. 2013

  16. arXiv:1503.07795

    cs.LG

    Multi-Labeled Classification of Demographic Attributes of Patients: a case study of diabetics patients

    Authors: Naveen Kumar Parachur Cotha, Marina Sokolova

    Abstract: Automated learning of patients' demographics can be seen as a multi-label problem where a patient model is based on different race and gender groups. The resulting model can be further integrated into Privacy-Preserving Data Mining, where it can be used to assess the risk of identification of different patient groups. Our project considers relations between diabetes and demographics of patients as a mult…

    Submitted 26 March, 2015; originally announced March 2015.

    Comments: 16 pages, 9 tables