Search | arXiv e-print repository

Anytime-Valid Tests of Group Invariance through Conformal Prediction

Authors: Tyron Lardy, Muriel Felipe Pérez-Ortiz

Abstract: We develop anytime-valid tests of invariance under the action of compact groups. The resulting test statistics are optimal in a logarithmic-growth sense. We apply our method to extend recent anytime-valid tests of independence and to construct tests of normality.

Submitted 23 May, 2024; v1 submitted 27 January, 2024; originally announced January 2024.
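The abstract does not spell out the construction. As a much simpler illustration of what an anytime-valid test of invariance looks like, the sketch below tests invariance under the two-element sign-flip group {+1, -1} (a compact group acting by x -> -x) with a betting-style test martingale. This is not the conformal-prediction method of the paper; the function name and the fixed betting fraction `lam` are assumptions chosen for illustration.

```python
def sign_flip_test(xs, lam=0.5, alpha=0.05):
    """Sequential test of symmetry about 0 (invariance under x -> -x).

    Under the null, sign(x) is +1 or -1 with equal probability (any mass at 0
    gives sign 0), so each factor 1 + lam * sign(x) has expectation 1 and the
    running product is a nonnegative test martingale. By Ville's inequality,
    rejecting as soon as it reaches 1/alpha keeps the type-I error <= alpha
    at any stopping time.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1) to keep factors positive")
    wealth = 1.0
    for t, x in enumerate(xs, start=1):
        sign = (x > 0) - (x < 0)  # +1, -1, or 0
        wealth *= 1.0 + lam * sign
        if wealth >= 1.0 / alpha:
            return t  # reject invariance at time t
    return None  # not rejected on the data seen so far
```

For example, ten positive observations in a row give wealth 1.5^t, which first exceeds 20 at t = 8, so the test rejects there; the data can be monitored and the test stopped at any time without inflating the error rate.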

arXiv:2401.05424 [pdf, other]

A Toolbox for Modelling Engagement with Educational Videos

Authors: Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Emine Yilmaz, John Shawe-Taylor, Sahan Bulathwela

Abstract: With the advancement and utility of Artificial Intelligence (AI), personalising education to a global population could be a cornerstone of new educational systems in the future. This work presents the PEEKC dataset and the TrueLearn Python library, which contains a dataset and a series of online learner state models that are essential to facilitate research on learner engagement modelling. The TrueLearn family of models was designed following the "open learner" concept, using humanly-intuitive user representations. This family of scalable, online models also helps end-users visualise the learner models, which may in the future facilitate user interaction with their models/recommenders. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The experiments show the utility of both the dataset and the library, with predictive performance significantly exceeding comparative baseline models. The dataset contains a large number of AI-related educational videos, which are of interest for building and validating AI-specific educational recommenders.

Submitted 30 December, 2023; originally announced January 2024.

Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2024. arXiv admin note: text overlap with arXiv:2309.11527

ACM Class: H.3.3; J.1; I.2.0

arXiv:2309.11527 [pdf, other]

TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback

Authors: Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Sahan Bulathwela

Abstract: This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.

Submitted 20 September, 2023; originally announced September 2023.

Comments: To be presented at the ORSUM workshop at RecSys 2023

ACM Class: H.3.3; J.1; I.2.0

arXiv:2208.07610 [pdf, ps, other]

E-Statistics, Group Invariance and Anytime Valid Testing

Authors: Muriel Felipe Pérez-Ortiz, Tyron Lardy, Rianne de Heide, Peter Grünwald

Abstract: We study worst-case-growth-rate-optimal (GROW) e-statistics for hypothesis testing between two group models. It is known that under a mild condition on the action of the underlying group G on the data, there exists a maximally invariant statistic. We show that among all e-statistics, invariant or not, the likelihood ratio of the maximally invariant statistic is GROW, both in the absolute and in the relative sense, and that an anytime-valid test can be based on it. The GROW e-statistic is equal to a Bayes factor with a right Haar prior on G. Our treatment avoids nonuniqueness issues that sometimes arise for such priors in Bayesian contexts. A crucial assumption on the group G is its amenability, a well-known group-theoretical condition, which holds, for instance, in scale-location families. Our results also apply to finite-dimensional linear regression.

Submitted 17 October, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: 30 pages. Major rewrite of previous version. Submitted to the Annals of Statistics
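One concrete case makes the Haar-prior statement tangible: the scale group (sigma > 0 acting by multiplication) on i.i.d. Gaussian data, with null N(0, sigma^2) and a fixed standardized effect delta under the alternative. The right Haar measure of this group is d(sigma)/sigma, and the sketch below computes the Bayes factor with that prior by numerical integration. The Gaussian model, the fixed delta, the grid settings and the function name are assumptions for illustration; the paper's results cover general amenable groups and are not reproduced here.

```python
import math


def scale_invariant_bayes_factor(xs, delta, half_width=10.0, num=20001):
    """Bayes factor of N(delta*sigma, sigma^2) against N(0, sigma^2), sigma unknown.

    sigma is integrated out under the right Haar measure of the scale group,
    d(sigma)/sigma. Substituting t = log(sigma) turns that measure into dt, so
    both marginal likelihoods become integrals over t, approximated here on a
    uniform grid centred at the log root-mean-square of the data.
    """
    n = len(xs)
    center = 0.5 * math.log(sum(x * x for x in xs) / n)
    step = 2 * half_width / (num - 1)
    grid = [center - half_width + i * step for i in range(num)]

    def log_integrand(t, d):
        # log of prod_i N(x_i; d*sigma, sigma^2) with sigma = e^t,
        # dropping the (2*pi)^(-n/2) factor that cancels in the ratio
        inv_sigma = math.exp(-t)
        return -n * t - 0.5 * sum((x * inv_sigma - d) ** 2 for x in xs)

    log_alt = [log_integrand(t, delta) for t in grid]
    log_null = [log_integrand(t, 0.0) for t in grid]
    m = max(max(log_alt), max(log_null))  # common shift to avoid underflow
    # The uniform grid spacing cancels between numerator and denominator.
    alt = math.fsum(math.exp(v - m) for v in log_alt)
    null = math.fsum(math.exp(v - m) for v in log_null)
    return alt / null
```

Because the prior is the Haar measure, multiplying every observation by the same positive constant leaves the result unchanged up to floating-point error, which is the invariance the construction is meant to have.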

arXiv:2207.01504 [pdf, other]

Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments

Authors: Sahan Bulathwela, Meghana Verma, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

Abstract: This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content- and video-based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to context-agnostic engagement prediction models that perform significantly better than ones based on previous datasets, mainly owing to the increase in training examples. The VLE dataset's suitability for building models towards Computer Science/Artificial Intelligence education focused on e-learning/MOOC use cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. To our knowledge, this is the largest and most diverse publicly available dataset that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are publicly available.

Submitted 22 June, 2022; originally announced July 2022.

Comments: To be presented at the International Conference on Educational Data Mining 2022

ACM Class: H.3.3; J.1; I.2.0

arXiv:2112.04368 [pdf, other]

Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems

Authors: Sahan Bulathwela, María Pérez-Ortiz, Emine Yilmaz, John Shawe-Taylor

Abstract: In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim of better predicting learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve predictive performance on educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of the TrueLearn algorithm achieves statistically significant improvements in predictive performance with a simple extension that adds semantic awareness to the model.

Submitted 8 December, 2021; originally announced December 2021.

Comments: Presented at the First International Workshop on Joint Use of Probabilistic Graphical Models and Ontology at Conference on Knowledge Graph and Semantic Web 2021

ACM Class: H.3.3; J.1; I.2.0

arXiv:2112.02034 [pdf, ps, other]

Could AI Democratise Education? Socio-Technical Imaginaries of an EdTech Revolution

Authors: Sahan Bulathwela, María Pérez-Ortiz, Catherine Holloway, John Shawe-Taylor

Abstract: Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first delivery of AI in Education could be greater educational inequality, along with a global misallocation of educational resources motivated by the current technological determinism narrative. In this paper, we focus on speculating and posing questions around the future of AI in Education, with the aim of starting the pressing conversation that would set the right foundations for the new generation of education that is permeated by technology. This paper starts by synthesising how AI might change how we learn and teach, focusing specifically on the case of personalised learning companions, and then moves on to discuss some socio-technical features that will be crucial for avoiding the perils of these AI systems worldwide (and perhaps ensuring their success). This paper also discusses the potential of using AI together with free, participatory and democratic resources, such as Wikipedia, Open Educational Resources and open-source tools. We also emphasise the need for collectively designing human-centered, transparent, interactive and collaborative AI-based algorithms that empower and give complete agency to stakeholders, as well as support new emerging pedagogies.
Finally, we ask what it would take for this educational revolution to provide egalitarian and empowering access to education, beyond any political, cultural, language, geographical and learning ability barriers.

Submitted 3 December, 2021; originally announced December 2021.

Comments: To be presented at the Workshop on Machine Learning for the Developing World (ML4D) at the Conference on Neural Information Processing Systems 2021

ACM Class: K.3.1

arXiv:2012.03780 [pdf, other]

A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings

Authors: Théophile Cantelobre, Benjamin Guedj, María Pérez-Ortiz, John Shawe-Taylor

Abstract: Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity to produce tight risk bounds for predictor distributions. This work proposes a novel PAC-Bayes perspective on the ILE structured prediction framework. We present two generalization bounds, on the risk and excess risk, which yield insights into the behavior of ILE predictors. Two learning algorithms are derived from these bounds. The algorithms are implemented and their behavior analyzed, with source code available at https://github.com/theophilec/PAC-Bayes-ILE-Structured-Prediction.

Submitted 21 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: 38 pages

arXiv:2011.06931 [pdf, other]

The Anytime-Valid Logrank Test: Error Control Under Continuous Monitoring with Unlimited Horizon

Authors: J. ter Schure, M. F. Perez-Ortiz, A. Ly, P. Grunwald

Abstract: We introduce the anytime-valid (AV) logrank test, a version of the logrank test that provides type-I error guarantees under optional stopping and optional continuation. The test is sequential without the need to specify a maximum sample size or stopping rule, and allows for cumulative meta-analysis with type-I error control. The method can be extended to define anytime-valid confidence intervals. The logrank test is an instance of the martingale tests based on E-variables that have been recently developed. We demonstrate type-I error guarantees for the test in a semiparametric setting of proportional hazards and show how to extend it to ties, Cox regression and confidence sequences. Using a Gaussian approximation on the logrank statistic, we show that the AV logrank test (which itself is always exact) has a similar rejection region to O'Brien-Fleming alpha-spending but with the potential to achieve 100% power by optional continuation. Although our approach to study design requires a larger sample size, the *expected* sample size is competitive by optional stopping.

Submitted 1 May, 2023; v1 submitted 13 November, 2020; originally announced November 2020.
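A minimal sketch of a logrank-type e-value, assuming two groups, no tied event times and a single fixed alternative hazard ratio `theta` (group 1 relative to group 0): at each event, it takes the likelihood ratio of which group the event came from, and multiplies these ratios over events. Monitoring the product and rejecting once it reaches 1/alpha is the kind of Ville-type guarantee the abstract refers to. The paper's handling of ties, Cox regression and confidence sequences is not covered; the input format and function name are illustrative assumptions.

```python
def logrank_e_value(events, theta):
    """Running product of per-event likelihood ratios (no ties assumed).

    events: sequence of (n0, n1, in_group1) giving the numbers at risk in
    groups 0 and 1 just before an event, and whether the event occurred in
    group 1. theta: alternative hazard ratio (group 1 relative to group 0).
    Under the null (theta = 1), the event falls in group 1 with probability
    n1 / (n0 + n1); under the alternative, theta*n1 / (n0 + theta*n1).
    """
    e = 1.0
    for n0, n1, in_group1 in events:
        if in_group1:
            p_alt, p_null = theta * n1 / (n0 + theta * n1), n1 / (n0 + n1)
        else:
            p_alt, p_null = n0 / (n0 + theta * n1), n0 / (n0 + n1)
        e *= p_alt / p_null
    return e


def reject(events, theta, alpha=0.05):
    """Anytime-valid decision: reject once the e-value reaches 1/alpha."""
    return logrank_e_value(events, theta) >= 1.0 / alpha
```

With theta = 1 every factor equals 1, so the e-value never grows under the alternative that coincides with the null; with theta < 1 it grows when events concentrate in group 0.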

arXiv:2011.02273 [pdf, other]

VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-based Engagement

Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

Abstract: With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to the masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, demanding more efficient, scalable and automatic solutions for managing learning resources. Although a few datasets related to engagement with educational videos exist, there is still an important need for data and research aimed at understanding learner engagement with scientific video lectures. This paper introduces VLEngagement, a novel dataset that consists of content-based and video-specific features extracted from publicly available scientific video lectures and several metrics related to user engagement. We introduce several novel tasks related to predicting and understanding context-agnostic engagement in video lectures, providing preliminary baselines. To our knowledge, this is the largest and most diverse publicly available dataset that deals with such tasks. The extraction of Wikipedia topic-based features also allows associating more sophisticated Wikipedia-based features to the dataset to improve the performance in these tasks. The dataset, helper tools and example code snippets are publicly available at https://github.com/sahanbull/context-agnostic-engagement

Submitted 2 November, 2020; originally announced November 2020.

ACM Class: K.3.1; H.3.1

arXiv:2007.12911 [pdf, other]

Tighter risk certificates for neural networks

Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

Abstract: This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data) potentially without the need for holding out test data.

Submitted 22 September, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: New version includes: i) experiment showing the potential of the risk certificate for neural architecture search (Fig. 2); ii) experiments spanning uncertainty quantification and analysis of prior/posterior (Section 7.8); iii) an outline of the strengths of probabilistic neural networks trained by PBB (Section 7.9) and iv) a strengthened discussion on the connection to Bayesian learning

Journal ref: Journal of Machine Learning Research, 2021
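Risk certificates of this kind are commonly computed from the PAC-Bayes-kl bound (Seeger/Maurer form) by numerically inverting the binary KL divergence. The sketch below assumes that bound, with confidence parameter `delta`, and takes the empirical risk of the posterior Q and KL(Q||P) as given. In practice the empirical risk of a stochastic network is itself estimated by sampling weights, which adds a further term not shown here; function names and the default `delta` are illustrative assumptions.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep logs finite at the boundaries
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q, c, iters=100):
    """Largest p in [q, 1] with kl(q || p) <= c, found by bisection.

    kl(q || p) is increasing in p on [q, 1], so bisection is valid.
    """
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return lo


def pac_bayes_kl_certificate(emp_risk, kl_qp, n, delta=0.025):
    """Upper bound on the risk of the randomized predictor Q, from
    kl(emp_risk || risk) <= (KL(Q||P) + ln(2 sqrt(n) / delta)) / n."""
    c = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)
```

Inverting kl rather than using a square-root relaxation is what keeps such bounds tight when the empirical risk is small.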

arXiv:2004.05691 [pdf, other]

Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization

Authors: Aliaksei Mikhailiuk, Clifford Wilmot, Maria Perez-Ortiz, Dingcheng Yue, Rafal Mantiuk

Abstract: Pairwise comparison data arise in many domains with subjective assessment experiments, for example in image and video quality assessment. In these experiments observers are asked to express a preference between two conditions. However, many pairwise comparison protocols require a large number of comparisons to infer accurate scores, which may be infeasible when each comparison is time-consuming (e.g. videos) or expensive (e.g. medical imaging). This motivates the use of an active sampling algorithm that chooses only the most informative pairs for comparison. In this paper we propose ASAP, an active sampling algorithm based on approximate message passing and expected information gain maximization. Unlike most existing methods, which rely on partial updates of the posterior distribution, we are able to perform full updates and therefore substantially improve the accuracy of the inferred scores. The algorithm relies on three techniques for reducing computational cost: inference based on approximate message passing, selective evaluations of the information gain, and selecting pairs in a batch that forms a minimum spanning tree of the inverse of information gain. We demonstrate, with real and synthetic data, that ASAP offers the highest accuracy of inferred scores compared to the existing methods. We also provide an open-source GPU implementation of ASAP for large-scale experiments.

Submitted 12 April, 2020; originally announced April 2020.
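The third cost-reduction technique is easy to isolate. Assuming a symmetric matrix of expected information gains has already been computed (the approximate-message-passing inference that produces it is not shown), a batch of pairs can be chosen as a minimum spanning tree of the element-wise inverse, here with Kruskal's algorithm. This is a minimal sketch; the function name and input format are assumptions, not the released ASAP code.

```python
def mst_batch(eig):
    """Select comparison pairs forming a minimum spanning tree of 1/EIG.

    eig: symmetric matrix (list of lists) of positive expected information
    gains for each pair of conditions. Kruskal's algorithm on weights 1/eig
    favours the most informative pairs while keeping every condition connected.
    """
    n = len(eig)
    edges = sorted(
        (1.0 / eig[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over conditions

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    batch = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: keep it
            parent[ri] = rj
            batch.append((i, j))
            if len(batch) == n - 1:
                break
    return batch
```

A spanning tree gives n - 1 pairs per batch and links every condition to the others, so each batch contributes information about the whole scale rather than a few isolated pairs.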

arXiv:1912.01592 [pdf, other]

Towards an Integrative Educational Recommender for Lifelong Learners

Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

Abstract: One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model.

Submitted 3 December, 2019; originally announced December 2019.

Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2020

MSC Class: H.3.3; J.1; I.2.0

ACM Class: H.3.3; J.1; I.2.0

arXiv:1911.09471 [pdf, other]

TrueLearn: A Family of Bayesian Algorithms to Match Lifelong Learners to Open Educational Resources

Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

Abstract: The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient, high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learners' knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty to help the learner improve their knowledge about the subject and keep them engaged.
We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.

Submitted 21 November, 2019; originally announced November 2019.

Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2020

ACM Class: H.3.3; J.1; I.2.0
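TrueLearn is described as drawing on knowledge tracing. For reference, the classic Bayesian knowledge tracing (BKT) update for a single knowledge component is sketched below, treating engagement as the binary observation (an assumption made here for illustration). This is the textbook BKT rule, not TrueLearn's own models, which the abstract does not specify; the function and parameter names are illustrative.

```python
def bkt_update(p_known, engaged, slip, guess, learn):
    """One Bayesian knowledge tracing step for a single knowledge component.

    p_known: prior probability the learner knows the component.
    engaged: observed binary outcome (True = positive signal).
    slip / guess: P(negative | known) and P(positive | not known).
    learn: probability of moving from not-known to known after the item.
    """
    if engaged:
        num = p_known * (1 - slip)
        den = num + (1 - p_known) * guess
    else:
        num = p_known * slip
        den = num + (1 - p_known) * (1 - guess)
    posterior = num / den  # Bayes rule given the observation
    return posterior + (1 - posterior) * learn  # chance of learning from the item
```

Each observation updates the belief online in constant time, which is the property that lets such models track many learners over long periods.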

arXiv:1903.10022 [pdf, other]

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Authors: Maria Perez-Ortiz, Peter Tino, Rafal Mantiuk, Cesar Hervas-Martinez

Abstract: Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and the use of these as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.

Submitted 24 March, 2019; originally announced March 2019.

Comments: Published in the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
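The generation step can be sketched in a few lines. Which patterns are paired (for example, within one class, or only the minority class in imbalanced problems) is a design choice the abstract does not fix; the sketch below, as an assumption, draws pairs uniformly from whatever patterns it is given and returns the synthetic points without labels. The semi-supervised SVM that consumes them is not shown, and the function name is illustrative.

```python
import random


def convex_combinations(patterns, k, seed=0):
    """Generate k unlabeled synthetic patterns as convex combinations of pairs.

    Each synthetic point is lam * a + (1 - lam) * b for two distinct patterns
    a, b drawn at random and lam ~ Uniform(0, 1). The points are returned
    without labels, to be passed to a semi-supervised learner as unlabeled data.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(k):
        a, b = rng.sample(patterns, 2)
        lam = rng.random()
        synthetic.append([lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)])
    return synthetic
```

Every synthetic point lies on the segment between two real patterns, so it stays inside the region the data already occupy, which is consistent with the cluster assumption the abstract mentions.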

arXiv:1903.10012 [pdf, other]

A mixture of experts model for predicting persistent weather patterns

Authors: Maria Perez-Ortiz, Pedro A. Gutierrez, Peter Tino, Carlos Casanova-Mateo, Sancho Salcedo-Sanz

Abstract: Weather and atmospheric patterns are often persistent. The simplest weather forecasting method is the so-called persistence model, which assumes that the future state of a system will be similar (or equal) to the present state. Machine learning (ML) models are widely used in different weather forecasting applications, but they need to be compared to the persistence model to analyse whether they provide a competitive solution to the problem at hand. In this paper, we devise a new model for predicting low visibility at airports using the concepts of mixture of experts. Visibility level is coded as two different ordered categorical variables: cloud height and runway visual height. The underlying system in this application is stagnant in approximately 90% of cases, and standard ML models fail to improve on the performance of the persistence model. Because of this, instead of trying to simply beat the persistence model using ML, we use this persistence as a baseline and learn an ordinal neural network model that refines its results by focusing on learning weather fluctuations. The results show that the proposal outperforms persistence and other ordinal autoregressive models, especially for longer time horizon predictions and for the runway visual height variable.

Submitted 24 March, 2019; originally announced March 2019.

Comments: Published in IEEE International Joint Conference on Neural Networks (IJCNN) 2018
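The persistence baseline is simple to state: the forecast for time t + h is the value observed at time t. Assuming the visibility variables are encoded as ordinal category indices, the sketch below builds that baseline and scores it with accuracy and mean absolute error over category indices. The mixture-of-experts refinement is not shown, and the function names are illustrative.

```python
def persistence_forecast(series, horizon=1):
    """Return (predictions, targets) for the forecast y[t + horizon] = y[t]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return series[:-horizon], series[horizon:]


def accuracy(pred, target):
    """Fraction of exactly correct ordinal categories."""
    return sum(p == y for p, y in zip(pred, target)) / len(target)


def mean_absolute_error(pred, target):
    """Mean absolute difference between ordinal category indices."""
    return sum(abs(p - y) for p, y in zip(pred, target)) / len(target)
```

On a series that rarely changes, this baseline is almost always right at short horizons, which is why a learned model must be judged against it rather than against chance.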

arXiv:1712.03686 [pdf, other]

A practical guide and software for analysing pairwise comparison experiments

Authors: Maria Perez-Ortiz, Rafal K. Mantiuk

Abstract: Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available software in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing and introducing a prior, which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.

Submitted 15 December, 2017; v1 submitted 11 December, 2017; originally announced December 2017.

Comments: Code available at https://github.com/mantiuk/pwcmp
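As a point of reference for the scaling step, the classical least-squares solution to Thurstone's Case V model converts each pairwise preference proportion into a standard-normal z-score and averages those z-scores per condition. This is a simpler textbook method, not the maximum-likelihood scaling with outlier analysis, confidence intervals and priors that the paper and the pwcmp code implement; the function name, the clipping constant `eps` and the requirement that every pair was compared are assumptions of this sketch.

```python
from statistics import NormalDist


def thurstone_case_v(counts, eps=0.01):
    """Scale scores from a matrix of pairwise preference counts.

    counts[i][j] = number of times condition i was preferred over j.
    Every pair must have been compared at least once. Proportions are
    clipped to [eps, 1 - eps] so unanimous pairs do not map to infinite
    z-scores. Scores are centred so that they sum to (about) zero.
    """
    n = len(counts)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = counts[i][j] + counts[j][i]
            if total == 0:
                raise ValueError(f"pair ({i}, {j}) was never compared")
            p = min(max(counts[i][j] / total, eps), 1 - eps)
            z_sum += inv(p)
        scores.append(z_sum / n)
    return scores
```

Clipping is the crude fix for unanimous pairs that the prior described in the abstract handles more gracefully when few observers are available.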

Showing 1–17 of 17 results for author: Perez-Ortiz, M