Showing 1–17 of 17 results for author: Perez-Ortiz, M

  1. arXiv:2401.15461  [pdf, ps, other]

    stat.ME math.ST

    Anytime-Valid Tests of Group Invariance through Conformal Prediction

    Authors: Tyron Lardy, Muriel Felipe Pérez-Ortiz

    Abstract: We develop anytime-valid tests of invariance under the action of compact groups. The resulting test statistics are optimal in a logarithmic-growth sense. We apply our method to extend recent anytime-valid tests of independence and to construct tests of normality.

    Submitted 23 May, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

  2. arXiv:2401.05424  [pdf, other]

    cs.CY cs.IR cs.LG stat.AP

    A Toolbox for Modelling Engagement with Educational Videos

    Authors: Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Emine Yilmaz, John Shawe-Taylor, Sahan Bulathwela

    Abstract: With the advancement and utility of Artificial Intelligence (AI), personalising education to a global population could be a cornerstone of new educational systems in the future. This work presents the PEEKC dataset and the TrueLearn Python library, which contains a dataset and a series of online learner state models that are essential to facilitate research on learner engagement modelling. TrueLear…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2024. arXiv admin note: text overlap with arXiv:2309.11527

    ACM Class: H.3.3; J.1; I.2.0

  3. arXiv:2309.11527  [pdf, other]

    cs.IR cs.AI cs.CY cs.LG stat.ML

    TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback

    Authors: Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Sahan Bulathwela

    Abstract: This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: To be presented at the ORSUM workshop at RecSys 2023

    ACM Class: H.3.3; J.1; I.2.0

  4. arXiv:2208.07610  [pdf, ps, other]

    math.ST stat.ME

    E-Statistics, Group Invariance and Anytime Valid Testing

    Authors: Muriel Felipe Pérez-Ortiz, Tyron Lardy, Rianne de Heide, Peter Grünwald

    Abstract: We study worst-case-growth-rate-optimal (GROW) e-statistics for hypothesis testing between two group models. It is known that under a mild condition on the action of the underlying group G on the data, there exists a maximally invariant statistic. We show that among all e-statistics, invariant or not, the likelihood ratio of the maximally invariant statistic is GROW, both in the absolute and in th…

    Submitted 17 October, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 30 pages. Major rewrite of previous version. Submitted to the Annals of Statistics

  5. arXiv:2207.01504  [pdf, other]

    cs.CY cs.AI cs.DL stat.AP stat.ML

    Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments

    Authors: Sahan Bulathwela, Meghana Verma, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

    Abstract: This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to pre…

    Submitted 22 June, 2022; originally announced July 2022.

    Comments: To be presented at International Conference for Educational Data Mining 2022

    ACM Class: H.3.3; J.1; I.2.0

  6. arXiv:2112.04368  [pdf, other]

    cs.IR cs.AI cs.CY stat.AP stat.ML

    Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems

    Authors: Sahan Bulathwela, María Pérez-Ortiz, Emine Yilmaz, John Shawe-Taylor

    Abstract: In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Presented at the First International Workshop on Joint Use of Probabilistic Graphical Models and Ontology at Conference on Knowledge Graph and Semantic Web 2021

    ACM Class: H.3.3; J.1; I.2.0

  7. arXiv:2112.02034  [pdf, ps, other]

    cs.CY cs.AI stat.ML

    Could AI Democratise Education? Socio-Technical Imaginaries of an EdTech Revolution

    Authors: Sahan Bulathwela, María Pérez-Ortiz, Catherine Holloway, John Shawe-Taylor

    Abstract: Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first de…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: To be presented at Workshop on Machine Learning for the Developing World (ML4D) at the Conference on Neural Information Processing Systems 2021

    ACM Class: K.3.1

  8. arXiv:2012.03780  [pdf, other]

    cs.LG math.ST stat.ML

    A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings

    Authors: Théophile Cantelobre, Benjamin Guedj, María Pérez-Ortiz, John Shawe-Taylor

    Abstract: Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity of produ…

    Submitted 21 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: 38 pages

  9. arXiv:2011.06931  [pdf, other]

    stat.ME math.ST

    The Anytime-Valid Logrank Test: Error Control Under Continuous Monitoring with Unlimited Horizon

    Authors: J. ter Schure, M. F. Perez-Ortiz, A. Ly, P. Grunwald

    Abstract: We introduce the anytime-valid (AV) logrank test, a version of the logrank test that provides type-I error guarantees under optional stopping and optional continuation. The test is sequential without the need to specify a maximum sample size or stopping rule, and allows for cumulative meta-analysis with type-I error control. The method can be extended to define anytime-valid confidence intervals.…

    Submitted 1 May, 2023; v1 submitted 13 November, 2020; originally announced November 2020.

  10. arXiv:2011.02273  [pdf, other]

    cs.CY cs.IR cs.LG stat.ML

    VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-based Engagement

    Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

    Abstract: With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, dema…

    Submitted 2 November, 2020; originally announced November 2020.

    ACM Class: K.3.1; H.3.1

  11. arXiv:2007.12911  [pdf, other]

    cs.LG cs.CV stat.ML

    Tighter risk certificates for neural networks

    Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

    Abstract: This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training ob…

    Submitted 22 September, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

    Comments: New version includes: i) experiment showing the potential of the risk certificate for neural architecture search (Fig. 2); ii) experiments spanning uncertainty quantification and analysis of prior/posterior (Section 7.8); iii) an outline of the strengths of probabilistic neural networks trained by PBB (Section 7.9) and iv) a strengthened discussion on the connection to Bayesian learning

    Journal ref: Journal of Machine Learning Research, 2021

  12. arXiv:2004.05691  [pdf, other]

    cs.LG stat.ML

    Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization

    Authors: Aliaksei Mikhailiuk, Clifford Wilmot, Maria Perez-Ortiz, Dingcheng Yue, Rafal Mantiuk

    Abstract: Pairwise comparison data arise in many domains with subjective assessment experiments, for example in image and video quality assessment. In these experiments observers are asked to express a preference between two conditions. However, many pairwise comparison protocols require a large number of comparisons to infer accurate scores, which may be unfeasible when each comparison is time-consuming (e…

    Submitted 12 April, 2020; originally announced April 2020.

  13. arXiv:1912.01592  [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Towards an Integrative Educational Recommender for Lifelong Learners

    Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

    Abstract: One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2020

    MSC Class: H.3.3; J.1; I.2.0

    ACM Class: H.3.3; J.1; I.2.0

  14. arXiv:1911.09471  [pdf, other]

    cs.AI cs.IR cs.LG stat.AP stat.ML

    TrueLearn: A Family of Bayesian Algorithms to Match Lifelong Learners to Open Educational Resources

    Authors: Sahan Bulathwela, Maria Perez-Ortiz, Emine Yilmaz, John Shawe-Taylor

    Abstract: The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient, high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique chal…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2020

    ACM Class: H.3.3; J.1; I.2.0

  15. arXiv:1903.10022  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

    Authors: Maria Perez-Ortiz, Peter Tino, Rafal Mantiuk, Cesar Hervas-Martinez

    Abstract: Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated to new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and the use of these as unsupervised informat…

    Submitted 24 March, 2019; originally announced March 2019.

    Comments: Published in the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

  16. arXiv:1903.10012  [pdf, other]

    cs.LG cs.AI stat.ML

    A mixture of experts model for predicting persistent weather patterns

    Authors: Maria Perez-Ortiz, Pedro A. Gutierrez, Peter Tino, Carlos Casanova-Mateo, Sancho Salcedo-Sanz

    Abstract: Weather and atmospheric patterns are often persistent. The simplest weather forecasting method is the so-called persistence model, which assumes that the future state of a system will be similar (or equal) to the present state. Machine learning (ML) models are widely used in different weather forecasting applications, but they need to be compared to the persistence model to analyse whether they pr…

    Submitted 24 March, 2019; originally announced March 2019.

    Comments: Published in IEEE International Joint Conference on Neural Networks (IJCNN) 2018

  17. arXiv:1712.03686  [pdf, other]

    stat.AP cs.CV cs.GR

    A practical guide and software for analysing pairwise comparison experiments

    Authors: Maria Perez-Ortiz, Rafal K. Mantiuk

    Abstract: Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense,…

    Submitted 15 December, 2017; v1 submitted 11 December, 2017; originally announced December 2017.

    Comments: Code available at https://github.com/mantiuk/pwcmp