Search | arXiv e-print repository

Prompts First, Finally

Authors: Brent N. Reeves, James Prather, Paul Denny, Juho Leinonen, Stephen MacNeil, Brett A. Becker, Andrew Luxton-Reilly

Abstract: Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain unsolved, it may be useful in the present moment t… ▽ More Generative AI (GenAI) and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain unsolved, it may be useful in the present moment to step back and examine the overall trajectory of Computer Science writ large. Since the very beginning, our discipline has sought to increase the level of abstraction in each new representation. We have progressed from hardware dip switches, through special purpose languages and visual representations like flow charts, all the way now to ``natural language.'' With the advent of GenAI, students can finally change the abstraction level of a problem to the ``language'' they've been ``problem solving'' with all their lives. In this paper, we argue that our programming abstractions were always headed here -- to natural language. Now is the time to adopt a ``Prompts First'' approach to Computer Science Education. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 4 pages

arXiv:2405.17739 [pdf, other]

The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers

Authors: James Prather, Brent Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S. Randrianasolo, Brett Becker, Bailey Kimmel, Jared Wright, Ben Briggs

Abstract: Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are now programming with generative AI (GenAI), which… ▽ More Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are now programming with generative AI (GenAI), which can provide complete solutions to most introductory programming problems, code suggestions, hints for next steps when stuck, and explain cryptic error messages. Its impact on novice metacognition has only started to be explored. Here we replicate a previous study that examined novice programming problem solving behavior and extend it by incorporating GenAI tools. Through 21 lab sessions consisting of participant observation, interview, and eye tracking, we explore how novices are coding with GenAI tools. Although 20 of 21 students completed the assigned programming problem, our findings show an unfortunate divide in the use of GenAI tools between students who accelerated and students who struggled. Students who accelerated were able to use GenAI to create code they already intended to make and were able to ignore unhelpful or incorrect inline code suggestions. But for students who struggled, our findings indicate that previously known metacognitive difficulties persist, and that GenAI unfortunately can compound them and even introduce new metacognitive difficulties. Furthermore, struggling students often expressed cognitive dissonance about their problem solving ability, thought they performed better than they did, and finished with an illusion of competence. Based on our observations from both groups, we propose ways to scaffold the novice GenAI experience and make suggestions for future work. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted to ICER 2024

arXiv:2405.14178 [pdf, other]

doi 10.1145/3649217.3653574

Desirable Characteristics for AI Teaching Assistants in Programming Education

Authors: Paul Denny, Stephen MacNeil, Jaromir Savelka, Leo Porter, Andrew Luxton-Reilly

Abstract: Providing timely and personalized feedback to large numbers of students is a long-standing challenge in programming courses. Relying on human teaching assistants (TAs) has been extensively studied, revealing a number of potential shortcomings. These include inequitable access for students with low confidence when needing support, as well as situations where TAs provide direct solutions without hel… ▽ More Providing timely and personalized feedback to large numbers of students is a long-standing challenge in programming courses. Relying on human teaching assistants (TAs) has been extensively studied, revealing a number of potential shortcomings. These include inequitable access for students with low confidence when needing support, as well as situations where TAs provide direct solutions without hel** students to develop their own problem-solving skills. With the advent of powerful large language models (LLMs), digital teaching assistants configured for programming contexts have emerged as an appealing and scalable way to provide instant, equitable, round-the-clock support. Although digital TAs can provide a variety of help for programming tasks, from high-level problem solving advice to direct solution generation, the effectiveness of such tools depends on their ability to promote meaningful learning experiences. If students find the guardrails implemented in digital TAs too constraining, or if other expectations are not met, they may seek assistance in ways that do not help them learn. Thus, it is essential to identify the features that students believe make digital teaching assistants valuable. We deployed an LLM-powered digital assistant in an introductory programming course and collected student feedback ($n=813$) on the characteristics of the tool they perceived to be most important. Our results highlight that students value such tools for their ability to provide instant, engaging support, particularly during peak times such as before assessment deadlines. They also expressed a strong preference for features that enable them to retain autonomy in their learning journey, such as scaffolding that helps to guide them through problem-solving steps rather than simply being shown direct solutions. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted to ITiCSE 2024

arXiv:2405.04412 [pdf, other]

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Authors: Lena Armstrong, Abbey Liu, Stephen MacNeil, Danaë Metaxa

Abstract: Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in on… ▽ More Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, in particular when used in workplace contexts. △ Less

Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2403.09409 [pdf, ps, other]

"Like a Nesting Doll": Analyzing Recursion Analogies Generated by CS Students using Large Language Models

Authors: Seth Bernstein, Paul Denny, Juho Leinonen, Lauren Kan, Arto Hellas, Matt Littlefield, Sami Sarsa, Stephen MacNeil

Abstract: Gras** complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analogies is difficult even for experienced instructors.… ▽ More Gras** complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between unfamiliar concepts and familiar ones, providing an engaging way to aid understanding. However, creating effective educational analogies is difficult even for experienced instructors. We investigate to what extent large language models (LLMs), specifically ChatGPT, can provide access to personally relevant analogies on demand. Focusing on recursion, a challenging threshold concept, we conducted an investigation analyzing the analogies generated by more than 350 first-year computing students. They were provided with a code snippet and tasked to generate their own recursion-based analogies using ChatGPT, optionally including personally relevant topics in their prompts. We observed a great deal of diversity in the analogies produced with student-prescribed topics, in contrast to the otherwise generic analogies, highlighting the value of student creativity when working with LLMs. Not only did students enjoy the activity and report an improved understanding of recursion, but they described more easily remembering analogies that were personally and culturally relevant. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 7 pages, 2 figures, ITiCSE 2024 preprint

arXiv:2401.10759 [pdf, other]

Interactions with Prompt Problems: A New Way to Teach Programming with Large Language Models

Authors: James Prather, Paul Denny, Juho Leinonen, David H. Smith IV, Brent N. Reeves, Stephen MacNeil, Brett A. Becker, Andrew Luxton-Reilly, Thezyrie Amarouche, Bailey Kimmel

Abstract: Large Language Models (LLMs) have upended decades of pedagogy in computing education. Students previously learned to code through \textit{writing} many small problems with less emphasis on code reading and comprehension. Recent research has shown that free code generation tools powered by LLMs can solve introductory programming problems presented in natural language with ease. In this paper, we pr… ▽ More Large Language Models (LLMs) have upended decades of pedagogy in computing education. Students previously learned to code through \textit{writing} many small problems with less emphasis on code reading and comprehension. Recent research has shown that free code generation tools powered by LLMs can solve introductory programming problems presented in natural language with ease. In this paper, we propose a new way to teach programming with Prompt Problems. Students receive a problem visually, indicating how input should be transformed to output, and must translate that to a prompt for an LLM to decipher. The problem is considered correct when the code that is generated by the student prompt can pass all test cases. In this paper we present the design of this tool, discuss student interactions with it as they learn, and provide insights into this new class of programming problems as well as the design tools that integrate LLMs. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: accepted for CHI 2024

arXiv:2401.10184 [pdf, other]

doi 10.1145/3627508.3638305

Comparing Traditional and LLM-based Search for Image Geolocation

Authors: Albatool Wazzan, Stephen MacNeil, Richard Souvenir

Abstract: Web search engines have long served as indispensable tools for information retrieval; user behavior and query formulation strategies have been well studied. The introduction of search engines powered by large language models (LLMs) suggested more conversational search and new types of query strategies. In this paper, we compare traditional and LLM-based search for the task of image geolocation, i.… ▽ More Web search engines have long served as indispensable tools for information retrieval; user behavior and query formulation strategies have been well studied. The introduction of search engines powered by large language models (LLMs) suggested more conversational search and new types of query strategies. In this paper, we compare traditional and LLM-based search for the task of image geolocation, i.e., determining the location where an image was captured. Our work examines user interactions, with a particular focus on query formulation strategies. In our study, 60 participants were assigned either traditional or LLM-based search engines as assistants for geolocation. Participants using traditional search more accurately predicted the location of the image compared to those using the LLM-based search. Distinct strategies emerged between users depending on the type of assistant. Participants using the LLM-based search issued longer, more natural language queries, but had shorter search sessions. When reformulating their search queries, traditional search participants tended to add more terms to their initial queries, whereas participants using the LLM-based search consistently rephrased their initial queries. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.04601 [pdf, ps, other]

Imagining Computing Education Assessment after Generative AI

Authors: Stephen MacNeil, Scott Spurlock, Ian Applebaum

Abstract: In the contemporary landscape of computing education, the ubiquity of Generative Artificial Intelligence has significantly disrupted traditional assessment methods, rendering them obsolete and prompting educators to seek innovative alternatives. This research paper explores the challenges posed by Generative AI in the assessment domain and the persistent attempts to circumvent its impact. Despite… ▽ More In the contemporary landscape of computing education, the ubiquity of Generative Artificial Intelligence has significantly disrupted traditional assessment methods, rendering them obsolete and prompting educators to seek innovative alternatives. This research paper explores the challenges posed by Generative AI in the assessment domain and the persistent attempts to circumvent its impact. Despite various efforts to devise workarounds, the academic community is yet to find a comprehensive solution. Amidst this struggle, ungrading emerges as a potential yet under-appreciated solution to the assessment dilemma. Ungrading, a pedagogical approach that involves moving away from traditional grading systems, has faced resistance due to its perceived complexity and the reluctance of educators to depart from conventional assessment practices. However, as the inadequacies of current assessment methods become increasingly evident in the face of Generative AI, the time is ripe to reconsider and embrace ungrading. △ Less

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.02262 [pdf, other]

The Effects of Generative AI on Computing Students' Help-Seeking Preferences

Authors: Irene Hou, Sophia Metille, Zhuo Li, Owen Man, Cynthia Zastudil, Stephen MacNeil

Abstract: Help-seeking is a critical way for students to learn new concepts, acquire new skills, and get unstuck when problem-solving in their computing courses. The recent proliferation of generative AI tools, such as ChatGPT, offers students a new source of help that is always available on-demand. However, it is unclear how this new resource compares to existing help-seeking resources along dimensions of… ▽ More Help-seeking is a critical way for students to learn new concepts, acquire new skills, and get unstuck when problem-solving in their computing courses. The recent proliferation of generative AI tools, such as ChatGPT, offers students a new source of help that is always available on-demand. However, it is unclear how this new resource compares to existing help-seeking resources along dimensions of perceived quality, latency, and trustworthiness. In this paper, we investigate the help-seeking preferences and experiences of computing students now that generative AI tools are available to them. We collected survey data (n=47) and conducted interviews (n=8) with computing students. Our results suggest that although these models are being rapidly adopted, they have not yet fully eclipsed traditional help resources. The help-seeking resources that students rely on continue to vary depending on the task and other factors. Finally, we observed preliminary evidence about how help-seeking with generative AI is a skill that needs to be developed, with disproportionate benefits for those who are better able to harness the capabilities of LLMs. We discuss potential implications for integrating generative AI into computing classrooms and the future of help-seeking in the era of generative AI. △ Less

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2311.16017 [pdf, other]

Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models

Authors: Stephen MacNeil, Paul Denny, Andrew Tran, Juho Leinonen, Seth Bernstein, Arto Hellas, Sami Sarsa, Joanne Kim

Abstract: Identifying and resolving logic errors can be one of the most frustrating challenges for novices programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior -- in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard t… ▽ More Identifying and resolving logic errors can be one of the most frustrating challenges for novices programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior -- in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard to spot when reading the code, and they can also at times be missed by automated tests. There is great educational potential in automatically detecting logic errors, especially when paired with suitable feedback for novices. Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks, including generating and explaining code. These capabilities are closely linked to code syntax, which aligns with the next token prediction behavior of LLMs. On the other hand, logic errors relate to the runtime performance of code and thus may not be as well suited to analysis by LLMs. To explore this, we investigate the performance of two popular LLMs, GPT-3 and GPT-4, for detecting and providing a novice-friendly explanation of logic errors. We compare LLM performance with a large cohort of introductory computing students $(n=964)$ solving the same error detection task. Through a mixed-methods analysis of student and model responses, we observe significant improvement in logic error identification between the previous and current generation of LLMs, and find that both LLM generations significantly outperform students. We outline how such models could be integrated into computing education tools, and discuss their potential for supporting students when learning programming. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.04926 [pdf, other]

More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems

Authors: Irene Hou, Owen Man, Sophie Mettille, Sebastian Gutierrez, Kenneth Angelikas, Stephen MacNeil

Abstract: The advent of large language models is resha** computing education. Recent research has demonstrated that these models can produce better explanations than students, answer multiple-choice questions at or above the class average, and generate code that can pass automated tests in introductory courses. These capabilities have prompted instructors to rapidly adapt their courses and assessment meth… ▽ More The advent of large language models is resha** computing education. Recent research has demonstrated that these models can produce better explanations than students, answer multiple-choice questions at or above the class average, and generate code that can pass automated tests in introductory courses. These capabilities have prompted instructors to rapidly adapt their courses and assessment methods to accommodate changes in learning objectives and the potential for academic integrity violations. While some scholars have advocated for the integration of visual problems as a safeguard against the capabilities of language models, new multimodal language models now have vision and language capabilities that may allow them to analyze and solve visual problems. In this paper, we evaluate the performance of two large multimodal models on visual assignments, with a specific focus on Parsons problems presented across diverse visual representations. Our results show that GPT-4V solved 96.7\% of these visual problems, struggling minimally with a single Parsons problem. Conversely, Bard performed poorly by only solving 69.2\% of problems, struggling with common issues like hallucinations and refusals. These findings suggest that merely transitioning to visual programming problems might not be a panacea to issues of academic integrity in the generative AI era. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.00658 [pdf, other]

doi 10.1145/3623762.3633499

The Robots are Here: Navigating the Generative AI Revolution in Computing Education

Authors: James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton-Reilly, Stephen MacNeil, Andrew Peterson, Raymond Pettit, Brent N. Reeves, Jaromir Savelka

Abstract: Recent advancements in artificial intelligence (AI) are fundamentally resha** computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and t… ▽ More Recent advancements in artificial intelligence (AI) are fundamentally resha** computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms. △ Less

Submitted 1 October, 2023; originally announced October 2023.

Comments: 39 pages of content + 12 pages of references and appendices

arXiv:2308.04309 [pdf, other]

Generative AI in Computing Education: Perspectives of Students and Instructors

Authors: Cynthia Zastudil, Magdalena Rogalska, Christine Kapp, Jennifer Vaughn, Stephen MacNeil

Abstract: Generative models are now capable of producing natural language text that is, in some cases, comparable in quality to the text produced by people. In the computing education context, these models are being used to generate code, code explanations, and programming exercises. The rapid adoption of these models has prompted multiple position papers and workshops which discuss the implications of thes… ▽ More Generative models are now capable of producing natural language text that is, in some cases, comparable in quality to the text produced by people. In the computing education context, these models are being used to generate code, code explanations, and programming exercises. The rapid adoption of these models has prompted multiple position papers and workshops which discuss the implications of these models for computing education, both positive and negative. This paper presents results from a series of semi-structured interviews with 12 students and 6 instructors about their awareness, experiences, and preferences regarding the use of tools powered by generative AI in computing classrooms. The results suggest that Generative AI (GAI) tools will play an increasingly significant role in computing education. However, students and instructors also raised numerous concerns about how these models should be integrated to best support the needs and learning goals of students. We also identified interesting tensions and alignments that emerged between how instructors and students prefer to engage with these models. We discuss these results and provide recommendations related to curriculum development, assessment methods, and pedagogical practice. As GAI tools become increasingly prevalent, it's important to understand educational stakeholders' preferences and values to ensure that these tools can be used for good and that potential harms can be mitigated. △ Less

Submitted 8 August, 2023; originally announced August 2023.

Comments: 10 pages, 1 figure

arXiv:2308.01542 [pdf, other]

Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents

Authors: Ziheng Huang, Sebastian Gutierrez, Hemanth Kamana, Stephen MacNeil

Abstract: The recent advent of large language models (LLM) has resulted in high-performing conversational agents such as chatGPT. These agents must remember key information from an ongoing conversation to provide responses that are contextually relevant to the user. However, these agents have limited memory and can be distracted by irrelevant parts of the conversation. While many strategies exist to manage… ▽ More The recent advent of large language models (LLM) has resulted in high-performing conversational agents such as chatGPT. These agents must remember key information from an ongoing conversation to provide responses that are contextually relevant to the user. However, these agents have limited memory and can be distracted by irrelevant parts of the conversation. While many strategies exist to manage conversational memory, users currently lack affordances for viewing and controlling what the agent remembers, resulting in a poor mental model and conversational breakdowns. In this paper, we present Memory Sandbox, an interactive system and design probe that allows users to manage the conversational memory of LLM-powered agents. By treating memories as data objects that can be viewed, manipulated, recorded, summarized, and shared across conversations, Memory Sandbox provides interaction affordances for users to manage how the agent should `see' the conversation. △ Less

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2307.01142 [pdf, other]

Prompt Middleware: Map** Prompts for Large Language Models to UI Affordances

Authors: Stephen MacNeil, Andrew Tran, Joanne Kim, Ziheng Huang, Seth Bernstein, Dan Mogil

Abstract: To help users do complex work, researchers have developed techniques to integrate AI and human intelligence into user interfaces (UIs). With the recent introduction of large language models (LLMs), which can generate text in response to a natural language prompt, there are new opportunities to consider how to integrate LLMs into UIs. We present Prompt Middleware, a framework for generating prompts… ▽ More To help users do complex work, researchers have developed techniques to integrate AI and human intelligence into user interfaces (UIs). With the recent introduction of large language models (LLMs), which can generate text in response to a natural language prompt, there are new opportunities to consider how to integrate LLMs into UIs. We present Prompt Middleware, a framework for generating prompts for LLMs based on UI affordances. These include prompts that are predefined by experts (static prompts), generated from templates with fill-in options in the UI (template-based prompts), or created from scratch (free-form prompts). We demonstrate this framework with FeedbackBuffet, a writing assistant that automatically generates feedback based on a user's text input. Inspired by prior research showing how templates can help non-experts perform more like experts, FeedbackBuffet leverages template-based prompt middleware to enable feedback seekers to specify the types of feedback they want to receive as options in a UI. These options are composed using a template to form a feedback request prompt to GPT-3. We conclude with a discussion about how Prompt Middleware can help developers integrate LLMs into UIs. △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2305.00937 [pdf, other]

doi 10.1145/3591196.3593337

Freeform Templates: Combining Freeform Curation with Structured Templates

Authors: Stephen MacNeil, Ziheng Huang, Kenneth Chen, Zijian Ding, Alex Yu, Kendall Nakai, Steven P. Dow

Abstract: Online whiteboards are becoming a popular way to facilitate collaborative design work, providing a free-form environment to curate ideas. However, as templates are increasingly being used to scaffold contributions from non-experts designers, it is crucial to understand their impact on the creative process. In this paper, we present the results from a study with 114 students in a large introductory… ▽ More Online whiteboards are becoming a popular way to facilitate collaborative design work, providing a free-form environment to curate ideas. However, as templates are increasingly being used to scaffold contributions from non-experts designers, it is crucial to understand their impact on the creative process. In this paper, we present the results from a study with 114 students in a large introductory design course. Our results confirm prior findings that templates benefit students by providing a starting point, a shared process, and the ability to access their own work from previous steps. While prior research has criticized templates for being too rigid, we discovered that using templates within a free-form environment resulted in visual patterns of free-form curation where concepts were spatially organized, clustered, color-coded, and connected using arrows and lines. We introduce the concept of "Free-form Templates" to illustrate how templates and free-form curation can be synergistic. △ Less

Submitted 1 May, 2023; originally announced May 2023.

ACM Class: H.5.0

arXiv:2304.03938 [pdf, other]

doi 10.1145/3587102.3588785

Comparing Code Explanations Created by Students and Large Language Models

Authors: Juho Leinonen, Paul Denny, Stephen MacNeil, Sami Sarsa, Seth Bernstein, Joanne Kim, Andrew Tran, Arto Hellas

Abstract: Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs cor… ▽ More Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, develo** the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course ($n \approx 1000$) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education. △ Less

Submitted 8 April, 2023; originally announced April 2023.

Comments: 8 pages, 3 figures. To be published in Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1

arXiv:2302.12832 [pdf, other]

Fluid Transformers and Creative Analogies: Exploring Large Language Models' Capacity for Augmenting Cross-Domain Analogical Creativity

Authors: Zijian Ding, Arvind Srinivasan, Stephen MacNeil, Joel Chan

Abstract: Cross-domain analogical reasoning is a core creative ability that can be challenging for humans. Recent work has shown some proofs-of concept of Large language Models' (LLMs) ability to generate cross-domain analogies. However, the reliability and potential usefulness of this capacity for augmenting human creative work has received little systematic exploration. In this paper, we systematically ex… ▽ More Cross-domain analogical reasoning is a core creative ability that can be challenging for humans. Recent work has shown some proofs-of concept of Large language Models' (LLMs) ability to generate cross-domain analogies. However, the reliability and potential usefulness of this capacity for augmenting human creative work has received little systematic exploration. In this paper, we systematically explore LLMs capacity to augment cross-domain analogical reasoning. Across three studies, we found: 1) LLM-generated cross-domain analogies were frequently judged as helpful in the context of a problem reformulation task (median 4 out of 5 helpfulness rating), and frequently (~80% of cases) led to observable changes in problem formulations, and 2) there was an upper bound of 25% of outputs bring rated as potentially harmful, with a majority due to potentially upsetting content, rather than biased or toxic content. These results demonstrate the potential utility -- and risks -- of LLMs for augmenting cross-domain analogical creativity. △ Less

Submitted 1 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2212.05113 [pdf, other]

doi 10.1145/3545947.3569630

Automatically Generating CS Learning Materials with Large Language Models

Authors: Stephen MacNeil, Andrew Tran, Juho Leinonen, Paul Denny, Joanne Kim, Arto Hellas, Seth Bernstein, Sami Sarsa

Abstract: Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advances may enable students to interact with code in ne… ▽ More Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential for LLMs to generate code explanations and programming assignments using carefully crafted prompts. These advances may enable students to interact with code in new ways while hel** instructors scale their learning materials. However, LLMs also introduce new implications for academic integrity, curriculum design, and software engineering careers. This workshop will demonstrate the capabilities of LLMs to help attendees evaluate whether and how LLMs might be integrated into their pedagogy and research. We will also engage attendees in brainstorming to consider how LLMs will impact our field. △ Less

Submitted 9 December, 2022; originally announced December 2022.

Comments: In Proceedings of the 54th ACM Technical Symposium on Computing Science Education

arXiv:2211.02265 [pdf, other]

Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book

Authors: Stephen MacNeil, Andrew Tran, Arto Hellas, Joanne Kim, Sami Sarsa, Paul Denny, Seth Bernstein, Juho Leinonen

Abstract: Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations. In this paper, we report on our experiences genera… ▽ More Advances in natural language processing have resulted in large language models (LLMs) that are capable of generating understandable and sensible written text. Recent versions of these models, such as OpenAI Codex and GPT-3, can generate code and code explanations. However, it is unclear whether and how students might engage with such explanations. In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. We modified the e-book to make LLM-generated code explanations accessible through buttons next to code snippets in the materials, which allowed us to track the use of the explanations as well as to ask for feedback on their utility. Three different types of explanations were available for students for each explainable code snippet; a line-by-line explanation, a list of important concepts, and a high-level summary of the code. Our preliminary results show that all varieties of explanations were viewed by students and that the majority of students perceived the code explanations as helpful to them. However, student engagement appeared to vary by code snippet complexity, explanation type, and code snippet length. Drawing on our experiences, we discuss future directions for integrating explanations generated by LLMs into existing computer science classrooms. △ Less

Submitted 4 November, 2022; originally announced November 2022.

Showing 1–20 of 20 results for author: MacNeil, S