-
A Bayesian Active Learning Approach to Comparative Judgement
Authors:
Andy Gray,
Alma Rahat,
Tom Crick,
Stephen Lindsay
Abstract:
Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. An approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items and is asked to select the better one. Following a series of comparisons, a rank is derived using a ranking model, for example, the Bradley-Terry model (BTM), based on the results. While CJ is considered a reliable method for marking, there are concerns around transparency, and the ideal number of pairwise comparisons to generate a reliable estimation of the rank order is not known. Additionally, there have been attempts to generate a method of selecting pairs that should be compared next in an informative manner, but some existing methods are known to have introduced their own bias into the results, inflating the reliability metric used. As a result, a random selection approach is usually deployed.
We propose a novel Bayesian approach to CJ (BCJ) for determining the ranks of compared items alongside a new way to select the pairs to present to the marker(s) using active learning (AL), addressing the key shortcomings of traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by providing the user insights into how it is making its decisions and, at the same time, being more efficient. Results from our experiments confirm that the proposed BCJ combined with entropy-driven AL pair-selection method is superior to other alternatives. We also find that the more comparisons done, the more accurate BCJ becomes, which solves the issue the current method has of the model deteriorating if too many comparisons are performed. As our approach can generate the complete predicted rank distribution for an item, we also show how this can be utilised in devising a predicted grade, guided by the assessor.
Submitted 25 August, 2023;
originally announced August 2023.
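The entropy-driven pair selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each pair of items carries a Beta posterior over the probability that the first item is preferred, starting from a uniform Beta(1, 1) prior, and that the next pair shown is the one whose posterior has the highest differential entropy. The class name `PairwiseBeta` and the helpers `digamma` and `beta_entropy` are hypothetical names introduced for illustration.

```python
import math
from itertools import combinations


def digamma(x: float) -> float:
    """Digamma function for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def beta_entropy(a: float, b: float) -> float:
    """Differential entropy of a Beta(a, b) distribution."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_beta
            - (a - 1.0) * digamma(a)
            - (b - 1.0) * digamma(b)
            + (a + b - 2.0) * digamma(a + b))


class PairwiseBeta:
    """Hypothetical model: one Beta posterior per unordered pair (i, j) with i < j."""

    def __init__(self, n_items: int) -> None:
        # Assumption: uniform Beta(1, 1) prior for every pair.
        self.params = {pair: [1.0, 1.0] for pair in combinations(range(n_items), 2)}

    def record(self, winner: int, loser: int) -> None:
        """Update the posterior of the compared pair with one judgement."""
        i, j = min(winner, loser), max(winner, loser)
        # Index 0 counts wins for the lower-numbered item, index 1 for the other.
        self.params[(i, j)][0 if winner == i else 1] += 1.0

    def next_pair(self) -> tuple:
        """Return the pair whose posterior is currently the most uncertain."""
        return max(self.params, key=lambda p: beta_entropy(*self.params[p]))
```

Under these assumptions, a pair that has already been judged has a lower-entropy posterior than an unjudged pair, so the selector moves on to pairs about which less is known. Ties are broken by insertion order, which is an arbitrary choice made for this sketch.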
-
Using Elo Rating as a Metric for Comparative Judgement in Educational Assessment
Authors:
Andy Gray,
Alma Rahat,
Tom Crick,
Stephen Lindsay,
Darren Wallace
Abstract:
Marking and feedback are essential features of teaching and learning, across the overwhelming majority of educational settings and contexts. However, it can take a great deal of time and effort for teachers to mark assessments, and to provide useful feedback to the students. Furthermore, it also creates a significant cognitive load on the assessors, especially in ensuring fairness and equity. Therefore, an alternative approach to marking called comparative judgement (CJ), inspired by the law of comparative judgment (LCJ), has been proposed in the educational space. Pairwise comparisons of as many pairs as possible can then be used to rank all submissions. Studies suggest that CJ is highly reliable and accurate while making marking quick for teachers. Other studies have questioned this claim, suggesting that the process can increase bias in the results, as the same submission is shown many times to an assessor to increase reliability. Additionally, studies have also found that CJ can result in the overall marking process taking longer than a more traditional method of marking, as information about many pairs must be collected.
In this paper, we investigate Elo, which has been extensively used in rating players in zero-sum games such as chess. We experimented on a large-scale Twitter dataset on the topic of a recent major UK political event ("Brexit", the UK's political exit from the European Union) to ask users which tweet they found funnier between a pair selected from ten tweets. Our analysis of the data reveals that the Elo rating is statistically significantly similar to the CJ ranking with a Kendall's tau score of 0.96 and a p-value of 1.5x10^(-5). We finish with an informed discussion regarding the potential wider application of this approach to a range of educational contexts.
Submitted 4 April, 2022;
originally announced April 2022.
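For readers unfamiliar with the two quantities this abstract relies on, the following is a minimal sketch of the standard Elo update and of Kendall's tau for comparing two rankings. It is not the paper's code. The K-factor of 32 and any starting ratings are assumptions, the function names are hypothetical, and `kendall_tau` is the simple tau-a variant, which does not correct for ties.

```python
from itertools import combinations


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats a player rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one zero-sum Elo update in place (k = 32 is an assumed K-factor)."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def kendall_tau(x: list, y: list) -> float:
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs
```

In a comparative judgement setting, each judgement would be treated as a "match" between two items: `elo_update` is called once per judgement, and `kendall_tau` is then applied to the final Elo ratings and the scores from another ranking model to measure how closely the two orderings agree.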
-
A UK Case Study on Cybersecurity Education and Accreditation
Authors:
Tom Crick,
James H. Davenport,
Alastair Irons,
Tom Prickett
Abstract:
This paper presents a national case study-based analysis of the numerous dimensions to cybersecurity education and how they are prioritised, implemented and accredited; from understanding the interaction of hardware and software, moving from theory to practice (and vice versa), to human factors, policy and politics (as well as various other important facets). A multitude of model curricula and recommendations have been presented and discussed in international fora in recent years, with varying levels of impact on education, policy and practice. This paper addresses three key questions: i) what is taught and what should be taught for cybersecurity to general computer science students; ii) should cybersecurity be taught stand-alone or in an integrated manner to general computer science students; and iii) can accreditation by national professional, statutory and regulatory bodies enhance the provision of cybersecurity within a body's jurisdiction?
Evaluating how cybersecurity is taught in all aspects of computer science is clearly a task of considerable size, one that is beyond the scope of this paper. Instead, a case study-based research approach -- primarily focusing on the UK -- has been adopted to evaluate the evidence of the teaching of cybersecurity within general computer science to university-level students. Thus, in the context of widespread international computer science/engineering curriculum reform, what does this need to embed cybersecurity knowledge and skills mean more generally for institutions and educators, and how can we teach this subject more effectively? Through this UK case study, and by contrasting with related initiatives in the US, we demonstrate the positive effect that national accreditation requirements can have, and offer some recommendations both for future research and curriculum developments.
Submitted 30 July, 2019; v1 submitted 23 June, 2019;
originally announced June 2019.
-
An Analysis of Introductory Programming Courses at UK Universities
Authors:
Ellen Murphy,
Tom Crick,
James H. Davenport
Abstract:
Context: In the context of exploring the art, science and engineering of programming, the question of which programming languages should be taught first has been fiercely debated since computer science teaching started in universities. Failure to grasp programming readily almost certainly implies failure to progress in computer science. Inquiry: What first programming languages are being taught? There have been regular national-scale surveys in Australia and New Zealand, with the only US survey reporting on a small subset of universities. This is the first such national survey of universities in the UK. Approach: We report the results of the first survey of introductory programming courses (N=80) taught at UK universities as part of their first-year computer science (or related) degree programmes, conducted in the first half of 2016. We report on student numbers, programming paradigm, programming languages and environment/tools used, as well as the underpinning rationale for these choices. Knowledge: The results in this first UK survey indicate a dominance of Java at a time when universities are still generally teaching students who are new to programming (and computer science), despite the fact that Python is perceived, by the same respondents, to be easier both to teach and to learn. Grounding: We compare the results of this survey with a related survey conducted since 2010 (as well as earlier surveys from 2001 and 2003) in Australia and New Zealand. Importance: This survey provides a starting point for valuable pedagogic baseline data for the analysis of the art, science and engineering of programming, in the context of substantial computer science curriculum reform in UK schools, as well as increasing scrutiny of teaching excellence and graduate employability for UK universities.
Submitted 31 March, 2017; v1 submitted 21 September, 2016;
originally announced September 2016.
-
Incorporating Emotion and Personality-Based Analysis in User-Centered Modelling
Authors:
Mohamed Mostafa,
Tom Crick,
Ana C. Calderon,
Giles Oatley
Abstract:
Understanding complex user behaviour under various conditions, scenarios and journeys can be fundamental to the improvement of the user experience for a given system. Predictive models of user reactions, responses -- and in particular, emotions -- can aid in the design of more intuitive and usable systems. Building on this theme, the preliminary research presented in this paper correlates events and interactions in an online social network against user behaviour, focusing on personality traits. Emotional context and tone are analysed and modelled based on varying types of sentiments that users express in their language using the IBM Watson Developer Cloud tools. The data collected in this study thus provides further evidence towards supporting the hypothesis that analysing and modelling emotions, sentiments and personality traits provides valuable insight into improving the user experience of complex social computer systems.
Submitted 10 August, 2016;
originally announced August 2016.
-
Lymphangiogenesis and carcinoma in the uterine cervix: Joint and hierarchical models for random cluster sizes and continuous outcomes
Authors:
T. R. Fanshawe,
C. M. Chapman,
T. Crick
Abstract:
Although the lymphatic system is clearly linked to the metastasis of most human carcinomas, the mechanisms by which lymphangiogenesis occurs in response to the presence of carcinoma remain unclear. Hierarchical models are presented to investigate the properties of lymphatic vessel production in 2997 fields taken from 20 individuals with invasive carcinoma, 21 individuals with cervical intraepithelial neoplasia and 21 controls. Such data demonstrate a high degree of correlation within tumour samples from the same individual. Joint hierarchical models utilising shared random effects are discussed and fitted in a Bayesian framework to allow for the correlation between two key outcome measures: a random cluster size (the number of lymphatic vessels in a tissue sample) and a continuous outcome (vessel size). Results show that invasive carcinoma samples are associated with increased production of smaller and more irregularly-shaped lymphatic vessels and suggest a mechanistic link between carcinoma of the cervix and lymphangiogenesis.
Submitted 29 January, 2016;
originally announced January 2016.
-
Reproducibility as a Technical Specification
Authors:
Tom Crick,
Benjamin A. Hall,
Samin Ishtiaq
Abstract:
Reproducibility of computationally-derived scientific discoveries should be a certainty. As the product of several person-years' worth of effort, results -- whether disseminated through academic journals, conferences or exploited through commercial ventures -- should at some level be expected to be repeatable by other researchers. While this stance may appear to be obvious and trivial, a variety of factors often stand in the way of making it commonplace. Whilst there have been detailed cross-disciplinary discussions of the various social, cultural and ideological drivers and (potential) solutions, one factor which has had less focus is the concept of reproducibility as a technical challenge. Specifically, the definition of an unambiguous and measurable standard of reproducibility would offer a significant benefit to the wider computational science community.
In this paper, we propose a high-level technical specification for a service for reproducibility, presenting cyberinfrastructure and associated workflow for a service which would enable such a specification to be verified and validated. In addition to addressing a pressing need for the scientific community, we further speculate on the potential contribution to the wider software development community of services which automate de novo compilation and testing of code from source. We illustrate our proposed specification and workflow by using the BioModelAnalyzer tool as a running example.
Submitted 15 June, 2015; v1 submitted 6 April, 2015;
originally announced April 2015.
-
Top Tips to Make Your Research Irreproducible
Authors:
Neil P. Chue Hong,
Tom Crick,
Ian P. Gent,
Lars Kotthoff,
Kenji Takeda
Abstract:
It is an unfortunate convention of science that research should pretend to be reproducible; our top tips will help you mitigate this fussy conventionality, enabling you to enthusiastically showcase your irreproducible work.
Submitted 8 April, 2015; v1 submitted 31 March, 2015;
originally announced April 2015.
-
Reproducibility in Research: Systems, Infrastructure, Culture
Authors:
Tom Crick,
Benjamin A. Hall,
Samin Ishtiaq
Abstract:
The reproduction and replication of research results has become a major issue for a number of scientific disciplines. In computer science and related computational disciplines such as systems biology, the challenges closely revolve around the ability to implement (and exploit) novel algorithms and models. Taking a new approach from the literature and applying it to a new codebase frequently requires local knowledge missing from the published manuscripts and transient project websites. Alongside this issue, benchmarking, and the lack of open, transparent and fair benchmark sets present another barrier to the verification and validation of claimed results.
In this paper, we outline several recommendations to address these issues, driven by specific examples from a range of scientific domains. Based on these recommendations, we propose a high-level prototype open automated platform for scientific software development which effectively abstracts specific dependencies from the individual researcher and their workstation, allowing easy sharing and reproduction of results. This new e-infrastructure for reproducible computational science offers the potential to incentivise a culture change and drive the adoption of new techniques to improve the quality and efficiency -- and thus reproducibility -- of scientific exploration.
Submitted 28 July, 2017; v1 submitted 9 March, 2015;
originally announced March 2015.
-
Dear CAV, We Need to Talk About Reproducibility
Authors:
Tom Crick,
Benjamin A. Hall,
Samin Ishtiaq
Abstract:
How many times have you tried to re-implement a past CAV tool paper, and failed?
Reliably reproducing published scientific discoveries has been acknowledged as a barrier to scientific progress for some time but there remains only a small subset of software available to support the specific needs of the research community (i.e. beyond generic tools such as source code repositories). In this paper we propose an infrastructure for enabling reproducibility in our community, by automating the build, unit testing and benchmarking of research software.
Submitted 9 February, 2015;
originally announced February 2015.
-
"Share and Enjoy": Publishing Useful and Usable Scientific Models
Authors:
Tom Crick,
Benjamin A. Hall,
Samin Ishtiaq,
Kenji Takeda
Abstract:
The reproduction and replication of reported scientific results is a hot topic within the academic community. The retraction of numerous studies from a wide range of disciplines, from climate science to bioscience, has drawn the focus of many commentators, but there exists a wider socio-cultural problem that pervades the scientific community. Sharing code, data and models often requires extra effort; this is currently seen as a significant overhead that may not be worth the time investment.
Automated systems, which allow easy reproduction of results, offer the potential to incentivise a culture change and drive the adoption of new techniques to improve the efficiency of scientific exploration. In this paper, we discuss the value of improved access and sharing of the two key types of results arising from work done in the computational sciences: models and algorithms. We propose the development of an integrated cloud-based system underpinning computational science, linking together software and data repositories, toolchains, workflows and outputs, providing a seamless automated infrastructure for the verification and validation of scientific models and in particular, performance benchmarks.
Submitted 14 October, 2014; v1 submitted 1 September, 2014;
originally announced September 2014.
-
"Can I Implement Your Algorithm?": A Model for Reproducible Research Software
Authors:
Tom Crick,
Benjamin A. Hall,
Samin Ishtiaq
Abstract:
The reproduction and replication of novel results has become a major issue for a number of scientific disciplines. In computer science and related computational disciplines such as systems biology, the issues closely revolve around the ability to implement novel algorithms and approaches. Taking an approach from the literature and applying it to a new codebase frequently requires local knowledge missing from the published manuscripts and project websites. Alongside this issue, benchmarking, and the development of fair (and widely available) benchmark sets present another barrier.
In this paper, we outline several suggestions to address these issues, driven by specific examples from a range of scientific domains. Finally, based on these suggestions, we propose a new open platform for scientific software development which effectively isolates specific dependencies from the individual researcher and their workstation and allows faster, more powerful sharing of the results of scientific software engineering.
Submitted 16 September, 2014; v1 submitted 22 July, 2014;
originally announced July 2014.