
Showing 1–5 of 5 results for author: van der Heijden, P G M

Searching in archive cs.
  1. arXiv:2405.20895 [pdf, other]

    cs.CL

    A comparison of correspondence analysis with PMI-based word embedding methods

    Authors: Qianqian Qi, David J. Hessen, Peter G. M. van der Heijden

    Abstract: Popular word embedding methods such as GloVe and Word2Vec are related to the factorization of the pointwise mutual information (PMI) matrix. In this paper, we link correspondence analysis (CA) to the factorization of the PMI matrix. CA is a dimensionality reduction method that uses singular value decomposition (SVD), and we show that CA is mathematically close to the weighted factorization of the…

    Submitted 31 May, 2024; originally announced May 2024.

  2. arXiv:2404.01176 [pdf, other]

    cs.IR

    Using Chao's Estimator as a Stopping Criterion for Technology-Assisted Review

    Authors: Michiel P. Bron, Peter G. M. van der Heijden, Ad J. Feelders, Arno P. J. M. Siebes

    Abstract: Technology-Assisted Review (TAR) aims to reduce the human effort required for screening processes such as abstract screening for systematic literature reviews. Human reviewers label documents as relevant or irrelevant during this process, while the system incrementally updates a prediction model based on the reviewers' previous decisions. After each model update, the system proposes new documents…

    Submitted 1 April, 2024; originally announced April 2024.

  3. Improving information retrieval through correspondence analysis instead of latent semantic analysis

    Authors: Qianqian Qi, David J. Hessen, Peter G. M. van der Heijden

    Abstract: Both latent semantic analysis (LSA) and correspondence analysis (CA) are dimensionality reduction techniques that use singular value decomposition (SVD) for information retrieval. Theoretically, the results of LSA display both the association between documents and terms, and marginal effects; in comparison, CA only focuses on the associations between documents and terms. Marginal effects are usual…

    Submitted 14 March, 2023; originally announced March 2023.

    Journal ref: Journal of Intelligent Information Systems 2023

  4. arXiv:2109.04875 [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Neural Networks for Latent Budget Analysis of Compositional Data

    Authors: Zhenwei Yang, Ayoub Bagheri, P. G. M van der Heijden

    Abstract: Compositional data are non-negative data collected in a rectangular matrix with a constant row sum. Due to the non-negativity the focus is on conditional proportions that add up to 1 for each row. A row of conditional proportions is called an observed budget. Latent budget analysis (LBA) assumes a mixture of latent budgets that explains the observed budgets. LBA is usually fitted to a contingency…

    Submitted 10 September, 2021; originally announced September 2021.

  5. A comparison of latent semantic analysis and correspondence analysis of document-term matrices

    Authors: Qianqian Qi, David J. Hessen, Tejaswini Deoskar, Peter G. M. van der Heijden

    Abstract: Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition (SVD) for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-…

    Submitted 25 November, 2022; v1 submitted 25 July, 2021; originally announced August 2021.

    Journal ref: Natural Language Engineering (2023) 1-31
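    Several of the abstracts above contrast LSA and CA as SVD-based methods and relate CA to the PMI matrix. The following is a minimal sketch, assuming NumPy and a hypothetical toy document-term count matrix, of how the matrices being decomposed differ: LSA takes an SVD of the raw counts (so marginal effects remain), while CA takes an SVD of standardized residuals from which the independence (marginal) part has been subtracted. The function names `lsa`, `correspondence_analysis`, and `pmi` are hypothetical, and the choices of no term weighting for LSA and zeros for undefined PMI cells are simplifying assumptions, not the procedures used in the papers.

```python
import numpy as np

# Hypothetical toy document-term count matrix (rows = documents, columns = terms).
# The counts are invented for illustration and are not data from any paper listed.
N = np.array([
    [4, 2, 0, 1],
    [3, 1, 1, 0],
    [0, 1, 5, 2],
    [1, 0, 3, 4],
], dtype=float)


def lsa(counts, k):
    """LSA sketch: rank-k truncated SVD of the raw document-term matrix.

    No term weighting is applied here (a simplifying assumption)."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    docs = U[:, :k] * s[:k]       # document coordinates
    terms = Vt[:k].T * s[:k]      # term coordinates
    return docs, terms, s


def correspondence_analysis(counts, k):
    """CA sketch: SVD of the matrix of standardized residuals."""
    P = counts / counts.sum()     # correspondence matrix (proportions)
    r = P.sum(axis=1)             # row masses
    c = P.sum(axis=0)             # column masses
    expected = np.outer(r, c)     # proportions expected under independence
    # Standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j): subtracting
    # r_i c_j removes the marginal effects before the SVD.
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: undo the mass weighting, scale by singular values.
    F = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    G = (Vt[:k].T * s[:k]) / np.sqrt(c)[:, None]
    return F, G, s, S, r, c


def pmi(counts):
    """PMI matrix log(p_ij / (r_i c_j)).

    PMI is undefined for zero counts; those cells are set to 0 here (an assumption)."""
    P = counts / counts.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    out = np.zeros_like(P)
    mask = P > 0
    out[mask] = np.log((P / (r * c))[mask])
    return out


docs, terms, s_lsa = lsa(N, k=2)
F, G, s_ca, S, r, c = correspondence_analysis(N, k=2)
M = pmi(N)

# First-order link between CA and PMI: since log(x) is close to x - 1 near x = 1,
# S_ij = sqrt(r_i c_j) * (p_ij / (r_i c_j) - 1) is close to sqrt(r_i c_j) * PMI_ij
# when p_ij is near its independence value r_i c_j.
weighted_pmi = np.sqrt(np.outer(r, c)) * M

print("LSA singular values:", np.round(s_lsa, 3))
print("CA singular values: ", np.round(s_ca, 3))
print("standardized residuals:\n", np.round(S, 3))
print("sqrt(r c) * PMI:\n", np.round(weighted_pmi, 3))
```

    The comparison of `S` with `weighted_pmi` rests only on a first-order Taylor argument; it is a heuristic for intuition, not the derivation given in the first paper, and it breaks down for zero counts, where PMI is undefined. As a sanity check, the squared CA singular values sum to the Pearson chi-square statistic divided by the total count (the total inertia).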