Search | arXiv e-print repository

doi 10.1145/3657604.3662028

Gaining Insights into Group-Level Course Difficulty via Differential Course Functioning

Authors: Frederik Baucks, Robin Schmucker, Conrad Borchers, Zachary A. Pardos, Laurenz Wiskott

Abstract: Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations ar… ▽ More Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations are scarce, and existing approaches do not adequately decouple course-specific difficulty from students' general performance levels. The present study introduces Differential Course Functioning (DCF) as an Item Response Theory (IRT)-based CA methodology. DCF controls for student performance levels and examines whether significant differences exist in how distinct student groups succeed in a given course. Leveraging data from over 20,000 students at a large public university, we demonstrate DCF's ability to detect inequities in undergraduate course difficulty across student groups described by grade achievement. We compare major pairs with high co-enrollment and transfer students to their non-transfer peers. For the former, our findings suggest a link between DCF effect sizes and the alignment of course content to student home department motivating interventions targeted towards improving course preparedness. For the latter, results suggest minor variations in course-specific difficulty between transfer and non-transfer students. While this is desirable, it also suggests that interventions targeted toward mitigating grade achievement gaps in transfer students should encompass comprehensive support beyond enhancing preparedness for individual courses. By providing more nuanced and equitable assessments of academic performance and difficulties experienced by diverse student populations, DCF could support policymakers, course articulation officers, and student advisors. △ Less

Submitted 7 May, 2024; originally announced June 2024.

arXiv:2406.04302 [pdf, other]

Representational Alignment Supports Effective Machine Teaching

Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

Abstract: A good teacher should not only be knowledgeable; but should be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representatio… ▽ More A good teacher should not only be knowledgeable; but should be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2404.00712 [pdf, other]

Survey of Computerized Adaptive Testing: A Machine Learning Perspective

Authors: Qi Liu, Yan Zhuang, Haoyang Bi, Zhenya Huang, Weizhe Huang, Jiatong Li, Junhao Yu, Zirui Liu, Zirui Hu, Yuting Hong, Zachary A. Pardos, Hai** Ma, Mengxiao Zhu, Shi** Wang, Enhong Chen

Abstract: Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees, by dynamically adjusting test questions based on their performance. Widely adopted across diverse fields like education, healthcare, sports, and sociology, CAT has revolutionized testing practices. While traditional methods rely on psychometrics and statistics, the increasing c… ▽ More Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees, by dynamically adjusting test questions based on their performance. Widely adopted across diverse fields like education, healthcare, sports, and sociology, CAT has revolutionized testing practices. While traditional methods rely on psychometrics and statistics, the increasing complexity of large-scale testing has spurred the integration of machine learning techniques. This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method. By examining the test question selection algorithm at the heart of CAT's adaptivity, we shed light on its functionality. Furthermore, we delve into cognitive diagnosis models, question bank construction, and test control within CAT, exploring how machine learning can optimize these components. Through an analysis of current methods, strengths, limitations, and challenges, we strive to develop robust, fair, and efficient CAT systems. By bridging psychometric-driven CAT research with machine learning, this survey advocates for a more inclusive and interdisciplinary approach to the future of adaptive testing. △ Less

Submitted 4 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2302.06871 [pdf, other]

Learning gain differences between ChatGPT and human tutor generated algebra hints

Authors: Zachary A. Pardos, Shreya Bhandari

Abstract: Large Language Models (LLMs), such as ChatGPT, are quickly advancing AI to the frontiers of practical consumer use and leading industries to re-evaluate how they allocate resources for content production. Authoring of open educational resources and hint content within adaptive tutoring systems is labor intensive. Should LLMs like ChatGPT produce educational content on par with human-authored conte… ▽ More Large Language Models (LLMs), such as ChatGPT, are quickly advancing AI to the frontiers of practical consumer use and leading industries to re-evaluate how they allocate resources for content production. Authoring of open educational resources and hint content within adaptive tutoring systems is labor intensive. Should LLMs like ChatGPT produce educational content on par with human-authored content, the implications would be significant for further scaling of computer tutoring system approaches. In this paper, we conduct the first learning gain evaluation of ChatGPT by comparing the efficacy of its hints with hints authored by human tutors with 77 participants across two algebra topic areas, Elementary Algebra and Intermediate Algebra. We find that 70% of hints produced by ChatGPT passed our manual quality checks and that both human and ChatGPT conditions produced positive learning gains. However, gains were only statistically significant for human tutor created hints. Learning gains from human-created hints were substantially and statistically significantly higher than ChatGPT hints in both topic areas, though ChatGPT participants in the Intermediate Algebra experiment were near ceiling and not even with the control at pre-test. We discuss the limitations of our study and suggest several future directions for the field. Problem and hint content used in the experiment is provided for replicability. △ Less

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2212.09974 [pdf, other]

doi 10.1145/3576050.3576081

Insights into undergraduate pathways using course load analytics

Authors: Conrad Borchers, Zachary A. Pardos

Abstract: Course load analytics (CLA) inferred from LMS and enrollment features can offer a more accurate representation of course workload to students than credit hours and potentially aid in their course selection decisions. In this study, we produce and evaluate the first machine-learned predictions of student course load ratings and generalize our model to the full 10,000 course catalog of a large publi… ▽ More Course load analytics (CLA) inferred from LMS and enrollment features can offer a more accurate representation of course workload to students than credit hours and potentially aid in their course selection decisions. In this study, we produce and evaluate the first machine-learned predictions of student course load ratings and generalize our model to the full 10,000 course catalog of a large public university. We then retrospectively analyze longitudinal differences in the semester load of student course selections throughout their degree. CLA by semester shows that a student's first semester at the university is among their highest load semesters, as opposed to a credit hour-based analysis, which would indicate it is among their lowest. Investigating what role predicted course load may play in program retention, we find that students who maintain a semester load that is low as measured by credit hours but high as measured by CLA are more likely to leave their program of study. This discrepancy in course load is particularly pertinent in STEM and associated with high prerequisite courses. Our findings have implications for academic advising, institutional handling of the freshman experience, and student-facing analytics to help students better plan, anticipate, and prepare for their selected courses. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted to Learning Analytics and Knowledge (LAK 2023)

arXiv:2105.06604 [pdf, other]

doi 10.1145/3461702.3462623

Towards Equity and Algorithmic Fairness in Student Grade Prediction

Authors: Weijie Jiang, Zachary A. Pardos

Abstract: Equity of educational outcome and fairness of AI with respect to race have been topics of increasing importance in education. In this work, we address both with empirical evaluations of grade prediction in higher education, an important task to improve curriculum design, plan interventions for academic support, and offer course guidance to students. With fairness as the aim, we trial several strat… ▽ More Equity of educational outcome and fairness of AI with respect to race have been topics of increasing importance in education. In this work, we address both with empirical evaluations of grade prediction in higher education, an important task to improve curriculum design, plan interventions for academic support, and offer course guidance to students. With fairness as the aim, we trial several strategies for both label and instance balancing to attempt to minimize differences in algorithm performance with respect to race. We find that an adversarial learning approach, combined with grade label balancing, achieved by far the fairest results. With equity of educational outcome as the aim, we trial strategies for boosting predictive performance on historically underserved groups and find success in sampling those groups in inverse proportion to their historic outcomes. With AI-infused technology supports increasingly prevalent on campuses, our methodologies fill a need for frameworks to consider performance trade-offs with respect to sensitive student attributes and allow institutions to instrument their AI resources in ways that are attentive to equity and fairness. △ Less

Submitted 13 May, 2021; originally announced May 2021.

Comments: Accepted to the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES '21)

arXiv:2103.00163 [pdf]

Unsupervised Representations Predict Popularity of Peer-Shared Artifacts in an Online Learning Environment

Authors: Renzhe Yu, John Scott, Zachary A. Pardos

Abstract: In online collaborative learning environments, students create content and construct their own knowledge through complex interactions over time. To facilitate effective social learning and inclusive participation in this context, insights are needed into the correspondence between student-contributed artifacts and their subsequent popularity among peers. In this study, we represent student artifac… ▽ More In online collaborative learning environments, students create content and construct their own knowledge through complex interactions over time. To facilitate effective social learning and inclusive participation in this context, insights are needed into the correspondence between student-contributed artifacts and their subsequent popularity among peers. In this study, we represent student artifacts by their (a) contextual action logs (b) textual content, and (c) set of instructor-specified features, and use these representations to predict artifact popularity measures. Through a mixture of predictive analysis and visual exploration, we find that the neural embedding representation, learned from contextual action logs, has the strongest predictions of popularity, ahead of instructor's knowledge, which includes academic value and creativity ratings. Because this representation can be learnt without extensive human labeling effort, it opens up possibilities for sha** more inclusive student interactions on the fly in collaboration with instructors and students alike. △ Less

Submitted 27 February, 2021; originally announced March 2021.

arXiv:2102.09377 [pdf, other]

doi 10.1145/3448139.3448173

Learning Skill Equivalencies Across Platform Taxonomies

Authors: Zhi Li, Cheng Ren, Xianyou Li, Zachary A. Pardos

Abstract: Assessment and reporting of skills is a central feature of many digital learning platforms. With students often using multiple platforms, cross-platform assessment has emerged as a new challenge. While technologies such as Learning Tools Interoperability (LTI) have enabled communication between platforms, reconciling the different skill taxonomies they employ has not been solved at scale. In this… ▽ More Assessment and reporting of skills is a central feature of many digital learning platforms. With students often using multiple platforms, cross-platform assessment has emerged as a new challenge. While technologies such as Learning Tools Interoperability (LTI) have enabled communication between platforms, reconciling the different skill taxonomies they employ has not been solved at scale. In this paper, we introduce and evaluate a methodology for finding and linking equivalent skills between platforms by utilizing problem content as well as the platform's clickstream data. We propose six models to represent skills as continuous real-valued vectors and leverage machine translation to map between skill spaces. The methods are tested on three digital learning platforms: ASSISTments, Khan Academy, and Cognitive Tutor. Our results demonstrate reasonable accuracy in skill equivalency prediction from a fine-grained taxonomy to a coarse-grained one, achieving an average recall@5 of 0.8 between the three platforms. Our skill translation approach has implications for aiding in the tedious, manual process of taxonomy to taxonomy map** work, also called crosswalks, within the tutoring as well as standardized testing worlds. △ Less

Submitted 24 February, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: Accepted to Learning Analytics and Knowledge (LAK-2021)

arXiv:1907.01591 [pdf, other]

Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System

Authors: Zachary A. Pardos, Weijie Jiang

Abstract: Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a "filter bubble." In this paper, we grapple with the issue of the filter bubble in the c… ▽ More Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a "filter bubble." In this paper, we grapple with the issue of the filter bubble in the context of a course recommendation system in production at a public university. Most universities in the United States encourage students to explore develo** interests while simultaneously advising them to adhere to course taking norms which progress them towards graduation. These competing objectives, and the stakes involved for students, make this context a particularly meaningful one for investigating real-world recommendation strategies. We introduce a novel modification to the skip-gram model applied to nine years of historic course enrollment sequences to learn course vector representations used to diversify recommendations based on similarity to a student's specified favorite course. This model, which we call multifactor2vec, is intended to improve the semantics of the primary token embedding by also learning embeddings of potentially conflated factors of the token (e.g., instructor). Our offline testing found this model improved accuracy and recall on our course similarity and analogy validation sets over a standard skip-gram. Incorporating course catalog description text resulted in further improvements. We compare the performance of these models to the system's existing RNN-based recommendations with a user study of undergraduates (N = 70) rating six characteristics of their course recommendations. Results of the user study show a dramatic lack of novelty in RNN recommendations and depict the characteristic trade-offs that make serendipity difficult to achieve. △ Less

Submitted 2 July, 2019; originally announced July 2019.

arXiv:1812.10078 [pdf, other]

Goal-based Course Recommendation

Authors: Weijie Jiang, Zachary A. Pardos, Qiang Wei

Abstract: With cross-disciplinary academic interests increasing and academic advising resources over capacity, the importance of exploring data-assisted methods to support student decision making has never been higher. We build on the findings and methodologies of a quickly develo** literature around prediction and recommendation in higher education and develop a novel recurrent neural network-based recom… ▽ More With cross-disciplinary academic interests increasing and academic advising resources over capacity, the importance of exploring data-assisted methods to support student decision making has never been higher. We build on the findings and methodologies of a quickly develo** literature around prediction and recommendation in higher education and develop a novel recurrent neural network-based recommendation system for suggesting courses to help students prepare for target courses of interest, personalized to their estimated prior knowledge background and zone of proximal development. We validate the model using tests of grade prediction and the ability to recover prerequisite relationships articulated by the university. In the third validation, we run the fully personalized recommendation for students the semester before taking a historically difficult course and observe differential overlap with our would-be suggestions. While not proof of causal effectiveness, these three evaluation perspectives on the performance of the goal-based model build confidence and bring us one step closer to deployment of this personalized course preparation affordance in the wild. △ Less

Submitted 25 December, 2018; originally announced December 2018.

arXiv:1811.07974 [pdf, other]

A Map of Knowledge

Authors: Zachary A. Pardos, Andrew Joo Hun Nam

Abstract: Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational str… ▽ More Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure, comprised of elements from digital records of our interactions, can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel map** of the literal descriptions of elements onto this behaviorally informed representation. We use the course enrollment behaviors of 124,000 students at a public university to learn vector representations of its courses. From these behaviorally informed representations, a notable 88% of course attribute information were recovered (e.g., department and division), as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Math H1B as Physics 7B is to Physics H7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions. We find that the representations learned from enrollments resolved course vectors to a level of semantic fidelity exceeding that of their catalog descriptions, depicting a vector space of high conceptual rationality. We end with a discussion of the possible mechanisms by which this knowledge structure may be informed and its implications for data science. △ Less

Submitted 19 November, 2018; originally announced November 2018.

arXiv:1803.09535 [pdf, other]

Connectionist Recommendation in the Wild: On the utility and scrutability of neural networks for personalized course guidance

Authors: Zachary A. Pardos, Zihao Fan, Weijie Jiang

Abstract: The aggregate behaviors of users can collectively encode deep semantic information about the objects with which they interact. In this paper, we demonstrate novel ways in which the synthesis of these data can illuminate the terrain of users' environment and support them in their decision making and wayfinding. A novel application of Recurrent Neural Networks and skip-gram models, approaches popula… ▽ More The aggregate behaviors of users can collectively encode deep semantic information about the objects with which they interact. In this paper, we demonstrate novel ways in which the synthesis of these data can illuminate the terrain of users' environment and support them in their decision making and wayfinding. A novel application of Recurrent Neural Networks and skip-gram models, approaches popularized by their application to modeling language, are brought to bear on student university enrollment sequences to create vector representations of courses and map out traversals across them. We present demonstrations of how scrutability from these neural networks can be gained and how the combination of these techniques can be seen as an evolution of content tagging and a means for a recommender to balance user preferences inferred from data with those explicitly specified. From validation of the models to the development of a UI, we discuss additional requisite functionality informed by the results of a usability study leading to the ultimate deployment of the system at a university. △ Less

Submitted 31 December, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

arXiv:1710.06654 [pdf]

Analysis of Student Behaviour in Habitable Worlds Using Continuous Representation Visualization

Authors: Zachary A. Pardos, Lev Horodyskyj

Abstract: We introduce a novel approach to visualizing temporal clickstream behaviour in the context of a degree-satisfying online course, Habitable Worlds, offered through Arizona State University. The current practice for visualizing behaviour within a digital learning environment has been to generate plots based on hand engineered or coded features using domain knowledge. While this approach has been eff… ▽ More We introduce a novel approach to visualizing temporal clickstream behaviour in the context of a degree-satisfying online course, Habitable Worlds, offered through Arizona State University. The current practice for visualizing behaviour within a digital learning environment has been to generate plots based on hand engineered or coded features using domain knowledge. While this approach has been effective in relating behaviour to known phenomena, features crafted from domain knowledge are not likely well suited to make unfamiliar phenomena salient and thus can preclude discovery. We introduce a methodology for organically surfacing behavioural regularities from clickstream data, conducting an expert in-the-loop hyperparameter search, and identifying anticipated as well as newly discovered patterns of behaviour. While these visualization techniques have been used before in the broader machine learning community to better understand neural networks and relationships between word vectors, we apply them to online behavioural learner data and go a step further; exploring the impact of the parameters of the model on producing tangible, non-trivial observations of behaviour that are suggestive of pedagogical improvement to the course designers and instructors. The methodology introduced in this paper led to an improved understanding of passing and non-passing student behaviour in the course and is widely applicable to other datasets of clickstream activity where investigators and stakeholders wish to organically surface principal patterns of behaviour. △ Less

Submitted 25 December, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

arXiv:1608.04789 [pdf, other]

Modelling Student Behavior using Granular Large Scale Action Data from a MOOC

Authors: Steven Tang, Joshua C. Peterson, Zachary A. Pardos

Abstract: Digital learning environments generate a precise record of the actions learners take as they interact with learning materials and complete exercises towards comprehension. With this high quantity of sequential data comes the potential to apply time series models to learn about underlying behavioral patterns and trends that characterize successful learning based on the granular record of student ac… ▽ More Digital learning environments generate a precise record of the actions learners take as they interact with learning materials and complete exercises towards comprehension. With this high quantity of sequential data comes the potential to apply time series models to learn about underlying behavioral patterns and trends that characterize successful learning based on the granular record of student actions. There exist several methods for looking at longitudinal, sequential data like those recorded from learning environments. In the field of language modelling, traditional n-gram techniques and modern recurrent neural network (RNN) approaches have been applied to algorithmically find structure in language and predict the next word given the previous words in the sentence or paragraph as input. In this paper, we draw an analogy to this work by treating student sequences of resource views and interactions in a MOOC as the inputs and predicting students' next interaction as outputs. In this study, we train only on students who received a certificate of completion. In doing so, the model could potentially be used for recommendation of sequences eventually leading to success, as opposed to perpetuating unproductive behavior. Given that the MOOC used in our study had over 3,500 unique resources, predicting the exact resource that a student will interact with next might appear to be a difficult classification problem. We find that simply following the syllabus (built-in structure of the course) gives on average 23% accuracy in making this prediction, followed by the n-gram method with 70.4%, and RNN based methods with 72.2%. This research lays the ground work for recommendation in a MOOC and other digital learning environments where high volumes of sequential data exist. △ Less

Submitted 16 August, 2016; originally announced August 2016.

Comments: 15 pages, 7 tables, 3 figures

arXiv:1509.06163 [pdf]

The Utility of Clustering in Prediction Tasks

Authors: Shubhendu Trivedi, Zachary A. Pardos, Neil T. Heffernan

Abstract: We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more deeply investigate the direct utility of using clustering to improve prediction accuracy and provide explanations for why this may be so. We look at a number o… ▽ More We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more deeply investigate the direct utility of using clustering to improve prediction accuracy and provide explanations for why this may be so. We look at a number of datasets, run k-means at different scales and for each scale we train predictors. This produces k sets of predictions. These predictions are then combined by a naïve ensemble. We observed that this use of a predictor in conjunction with clustering improved the prediction accuracy in most datasets. We believe this indicates the predictive utility of exploiting structure in the data and the data compression handed over by clustering. We also found that using this method improves upon the prediction of even a Random Forests predictor which suggests this method is providing a novel, and useful source of variance in the prediction process. △ Less

Submitted 21 September, 2015; originally announced September 2015.

Comments: An experimental research report, dated 11 September 2011

Showing 1–15 of 15 results for author: Pardos, Z A