-
Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
Authors:
Elizaveta Artser,
Anastasiia Birillo,
Yaroslav Golubev,
Maria Tigina,
Hieke Keuning,
Nikolay Vyahhi,
Timofey Bryksin
Abstract:
In many MOOCs, whenever a student completes a programming task, they can see previous solutions of other students to find potentially different ways of solving the problem and learn new coding constructs. However, a lot of MOOCs simply show the most recent solutions, disregarding their diversity or quality.
To solve this novel problem, we adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform. However, due to the tool's inner algorithm, it fully processed only 46 out of 867 studied tasks. Therefore, we developed our own tool called Rhubarb. This tool first standardizes solutions that are algorithmically the same, then calculates the structure-aware edit distance between them, and then applies clustering. Finally, it selects one example from each of the largest clusters, taking into account their code quality. Rhubarb was able to handle all 867 tasks successfully.
We compared the approaches on a set of 59 tasks that both tools could process. Eight experts rated the selected solutions based on diversity, code quality, and usefulness. The default platform approach of selecting recent submissions received on average 3.12 out of 5, JPlag received 3.77, and Rhubarb received 3.50. Since a real MOOC must process every task, we created a system that uses JPlag on the 5.3% of tasks it fully processes and Rhubarb on the remaining 94.7%.
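The pipeline described in this abstract (standardize, measure distance, cluster, pick one example per large cluster) can be illustrated with a minimal sketch. This is not the authors' Rhubarb implementation: the function names, the threshold, and the quality function are hypothetical, a plain Levenshtein distance over AST node-type sequences stands in for the structure-aware edit distance, and simple threshold-based single-linkage grouping stands in for the clustering step.

```python
import ast
from itertools import combinations


def normalize(source):
    """Reduce a solution to its sequence of AST node type names.

    Assumption: this is a crude stand-in for standardization. It ignores
    identifiers, literal values, and formatting.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def cluster(solutions, threshold):
    """Group solution indices whose distance is within the threshold.

    Uses union-find, so groups are formed by single linkage.
    Returns clusters sorted from largest to smallest.
    """
    seqs = [normalize(s) for s in solutions]
    parent = list(range(len(solutions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(solutions)), 2):
        if edit_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(solutions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


def pick_examples(solutions, threshold=2, top_k=2, quality=lambda s: -len(s)):
    """Pick the best-scoring solution from each of the top_k largest clusters.

    Assumption: `quality` is a placeholder (shorter is better); a real system
    would plug in a code quality score here.
    """
    clusters = cluster(solutions, threshold)
    return [max((solutions[i] for i in c), key=quality) for c in clusters[:top_k]]
```

With this sketch, two loop-based solutions that differ only in variable names fall into one cluster, while a one-line `sum(range(...))` solution forms its own cluster, so a student would be shown one example of each approach.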
Submitted 28 March, 2024;
originally announced March 2024.
-
Next-Step Hint Generation for Introductory Programming Using Large Language Models
Authors:
Lianne Roest,
Hieke Keuning,
Johan Jeuring
Abstract:
Large Language Models possess skills such as answering questions, writing essays or solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students, and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student's code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential for LLM-generated feedback, but further research is required to explore its practical implementation.
Submitted 3 December, 2023;
originally announced December 2023.
-
What Skills Do You Need When Developing Software Using ChatGPT? (Discussion Paper)
Authors:
Johan Jeuring,
Roel Groot,
Hieke Keuning
Abstract:
Since the release of LLM-based tools such as GitHub Copilot and ChatGPT, the media and popular scientific literature, but also journals such as the Communications of the ACM, have been flooded with opinions on how these tools will change programming. The opinions range from "machines will program themselves" to "AI does not help programmers". Of course, these statements are meant to stir up a discussion, and should be taken with a grain of salt, but we argue that such unfounded statements are potentially harmful. Instead, we propose to investigate which skills are required to develop software using LLM-based tools.
In this paper we report on an experiment in which we explore if Computational Thinking (CT) skills predict the ability to develop software using LLM-based tools. Our results show that the ability to develop software using LLM-based tools can indeed be predicted by the score on a CT assessment. There are many limitations to our experiment, and this paper is also a call to discuss how to approach, preferably experimentally, the question of which skills are required to develop software using LLM-based tools. We propose to rephrase this question to include by what kind of people/programmers, to develop what kind of software using what kind of LLM-based tools.
Submitted 9 October, 2023;
originally announced October 2023.
-
The Robots are Here: Navigating the Generative AI Revolution in Computing Education
Authors:
James Prather,
Paul Denny,
Juho Leinonen,
Brett A. Becker,
Ibrahim Albluwi,
Michelle Craig,
Hieke Keuning,
Natalie Kiesler,
Tobias Kohn,
Andrew Luxton-Reilly,
Stephen MacNeil,
Andrew Peterson,
Raymond Pettit,
Brent N. Reeves,
Jaromir Savelka
Abstract:
Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms.
Submitted 1 October, 2023;
originally announced October 2023.
-
Exploring the Potential of Large Language Models to Generate Formative Programming Feedback
Authors:
Natalie Kiesler,
Dominic Lohr,
Hieke Keuning
Abstract:
Ever since the emergence of large language models (LLMs) and related applications, such as ChatGPT, its performance and error analysis for programming tasks have been subject to research. In this work-in-progress paper, we explore the potential of such LLMs for computing educators and learners, as we analyze the feedback it generates to a given input containing program code. In particular, we aim at (1) exploring how an LLM like ChatGPT responds to students seeking help with their introductory programming tasks, and (2) identifying feedback types in its responses. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT along with questions required to elicit feedback and correct solutions. The results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors, which means that students can potentially benefit. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
Submitted 31 August, 2023;
originally announced September 2023.
-
A Systematic Mapping Study of Code Quality in Education -- with Complete Bibliography
Authors:
Hieke Keuning,
Johan Jeuring,
Bastiaan Heeren
Abstract:
While functionality and correctness of code has traditionally been the main focus of computing educators, quality aspects of code are getting increasingly more attention. High-quality code contributes to the maintainability of software systems, and should therefore be a central aspect of computing education. We have conducted a systematic mapping study to give a broad overview of the research conducted in the field of code quality in an educational context. The study investigates paper characteristics, topics, research methods, and the targeted programming languages. We found 195 publications (1976-2022) on the topic in multiple databases, which we systematically coded to answer the research questions. This paper reports on the results and identifies developments, trends, and new opportunities for research in the field of code quality in computing education.
Submitted 26 April, 2023;
originally announced April 2023.
-
Detecting Code Quality Issues in Pre-written Templates of Programming Tasks in Online Courses
Authors:
Anastasiia Birillo,
Elizaveta Artser,
Yaroslav Golubev,
Maria Tigina,
Hieke Keuning,
Nikolay Vyahhi,
Timofey Bryksin
Abstract:
In this work, we developed an algorithm for detecting code quality issues in the templates of online programming tasks, validated it, and conducted an empirical study on the dataset of student solutions. The algorithm consists of analyzing recurring unfixed issues in solutions of different students, matching them with the code of the template, and then filtering the results. Our manual validation on a subset of tasks demonstrated a precision of 80.8% and a recall of 73.3%. We used the algorithm on 415 Java tasks from the JetBrains Academy platform and discovered that as much as 14.7% of tasks have at least one issue in their template, thus making it harder for students to learn good code quality practices. We describe our results in detail, provide several motivating examples and specific cases, and share the feedback of the developers of the platform, who fixed 51 issues based on the output of our approach.
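The three steps named in this abstract (collect recurring unfixed issues across students, match them against the template code, filter the results) can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the input shape (one list of `(issue_name, source_line)` pairs per student), the exact-line matching, the `min_share` filter, and the issue labels used in any example are all assumptions.

```python
from collections import Counter


def template_issues(template, unfixed_by_student, min_share=0.5):
    """Return (issue, line) pairs that likely originate from the template.

    template: the task's pre-written code as a string.
    unfixed_by_student: one list per student of (issue_name, source_line)
        pairs still present in that student's final submission (hypothetical
        input format).
    min_share: minimum fraction of students that must share the issue
        (hypothetical filtering rule).
    """
    template_lines = {line.strip() for line in template.splitlines() if line.strip()}
    counts = Counter()
    for issues in unfixed_by_student:
        # Count each (issue, line) pair at most once per student.
        for issue, line in set(issues):
            if line.strip() in template_lines:
                counts[(issue, line.strip())] += 1
    total = len(unfixed_by_student)
    return sorted(key for key, n in counts.items() if total and n / total >= min_share)
```

An issue that many students leave unfixed on a line they did not write themselves is a candidate template issue; issues on student-written lines, or issues seen in only a few submissions, are filtered out.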
Submitted 24 April, 2023;
originally announced April 2023.
-
Analyzing the Quality of Submissions in Online Programming Courses
Authors:
Maria Tigina,
Anastasiia Birillo,
Yaroslav Golubev,
Hieke Keuning,
Nikolay Vyahhi,
Timofey Bryksin
Abstract:
Programming education should aim to provide students with a broad range of skills that they will later use while developing software. An important aspect in this is their ability to write code that is not only correct but also of high quality. Unfortunately, this is difficult to control in the setting of a massive open online course. In this paper, we carry out an analysis of the code quality of submissions from JetBrains Academy - a platform for studying programming in an industry-like project-based setting with an embedded code quality assessment tool called Hyperstyle. We analyzed more than a million Java submissions and more than 1.3 million Python submissions, studied the most prevalent types of code quality issues and the dynamics of how students fix them. We provide several case studies of different issues, as well as an analysis of why certain issues remain unfixed even after several attempts. Also, we studied abnormally long sequences of submissions, in which students attempted to fix code quality issues after passing the task. Our results point the way towards the improvement of online courses, such as making sure that the task itself does not incentivize students to write code poorly.
Submitted 26 January, 2023;
originally announced January 2023.