Investigating Automatic Scoring and Feedback using Large Language Models

Authors: Gloria Ashiya Katuka, Alexander Gain, Yen-Yun Yu

Abstract: Automatic grading and feedback have long been studied using traditional machine learning and deep learning techniques with language models. With the recent accessibility of high-performing large language models (LLMs) like LLaMA-2, there is an opportunity to investigate the use of these LLMs for automatic grading and feedback generation. Despite the increase in performance, LLMs require significant computational resources for fine-tuning and additional specific adjustments to enhance their performance for such tasks. To address these issues, Parameter Efficient Fine-tuning (PEFT) methods, such as LoRA and QLoRA, have been adopted to decrease memory and computational requirements in model fine-tuning. This paper explores the efficacy of PEFT-based quantized models, employing a classification or regression head, to fine-tune LLMs for automatically assigning continuous numerical grades to short answers and essays, as well as generating corresponding feedback. We conducted experiments on both proprietary and open-source datasets for our tasks. The results show that grade predictions from fine-tuned LLMs are highly accurate, achieving less than 3% error in grade percentage on average. For providing graded feedback, fine-tuned 4-bit quantized LLaMA-2 13B models outperform competitive base models and achieve high similarity with subject matter expert feedback, both in terms of high BLEU and ROUGE scores and qualitatively. The findings from this study provide important insights into the impacts of the emerging capabilities of using quantization approaches to fine-tune LLMs for various downstream tasks, such as automatic short answer scoring and feedback generation, at comparatively lower costs and latency.
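To illustrate why low-rank adapters such as LoRA cut fine-tuning cost, the sketch below counts trainable parameters for a single weight matrix and applies a low-rank update on top of a frozen weight. The layer sizes, rank, and function names are assumptions chosen for illustration; they are not taken from the paper, and quantization (as in QLoRA) is not modeled here.

```python
import numpy as np

def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA adapter params) for one matrix."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is (d_out, rank), A is (rank, d_in)
    return full, lora

def lora_forward(x, W, A, B, scale=1.0):
    """Frozen weight W plus the low-rank update B @ A, applied to inputs x."""
    return x @ W.T + scale * (x @ A.T) @ B.T

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 4
W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(rank, d_in))   # trainable
B = np.zeros((d_out, rank))         # trainable, zero-initialized

x = rng.normal(size=(5, d_in))
y = lora_forward(x, W, A, B)
full, lora = lora_param_counts(d_out, d_in, rank)
```

With `B` initialized to zero, the adapted layer starts out identical to the frozen one, and only `rank * (d_out + d_in)` values need gradients instead of `d_out * d_in`.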

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2102.11343 [pdf, other]

Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping

Authors: Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille

Abstract: Catastrophic forgetting in neural networks is a significant problem for continual learning. A majority of the current methods replay previous data during training, which violates the constraints of an ideal continual learning system. Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e. the worsening ability to discriminate between data from different tasks. In our work, we introduce Relevance Mapping Networks (RMNs), which are inspired by the Optimal Overlap Hypothesis. The mappings reflect the relevance of the weights for the task at hand by assigning large weights to essential parameters. We show that RMNs learn an optimized representational overlap that overcomes the twin problem of catastrophic forgetting and remembering. Our approach achieves state-of-the-art performance across all common continual learning datasets, even significantly outperforming data replay methods while not violating the constraints for an ideal continual learning system. Moreover, RMNs retain the ability to detect data from new tasks in an unsupervised manner, thus proving their resilience against catastrophic remembering.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:1910.05585 [pdf, other]

doi 10.1016/j.cma.2020.112930

Adaptive Mesh Refinement for Topology Optimization with Discrete Geometric Components

Authors: Shanglong Zhang, Arun L. Gain, Julian A. Norato

Abstract: This work introduces an Adaptive Mesh Refinement (AMR) strategy for the topology optimization of structures made of discrete geometric components using the geometry projection method. Practical structures made of geometric shapes such as bars and plates typically exhibit low volume fractions with respect to the volume of the design region they occupy. To maintain an accurate analysis and to ensure well-defined sensitivities in the geometry projection, it is required that the element size is smaller than the smallest dimension of each component. For low-volume-fraction structures, this leads to finite element meshes with very large numbers of elements. To improve the efficiency of the analysis and optimization, we propose a strategy to adaptively refine the mesh and reduce the number of elements by having a finer mesh on the geometric components, and a coarser mesh away from them. The refinement indicator stems very naturally from the geometry projection and is thus straightforward to implement. We demonstrate the effectiveness of the proposed AMR method by performing topology optimization for the design of minimum-compliance and stress-constrained structures made of bars and plates.

Submitted 12 October, 2019; originally announced October 2019.

Comments: 21 pages, 21 figures

MSC Class: 74P05; 49Q10; 74S05 ACM Class: J.2; J.6

arXiv:1905.11515 [pdf, other]

Abstraction Mechanisms Predict Generalization in Deep Neural Networks

Authors: Alex Gain, Hava Siegelmann

Abstract: A longstanding problem for Deep Neural Networks (DNNs) is understanding their puzzling ability to generalize well. We approach this problem through the unconventional angle of cognitive abstraction mechanisms, drawing inspiration from recent neuroscience work, allowing us to define the Cognitive Neural Activation metric (CNA) for DNNs, which is the correlation between information complexity (entropy) of given input and the concentration of higher activation values in deeper layers of the network. The CNA is highly predictive of generalization ability, outperforming norm-and-margin-based generalization metrics on an extensive evaluation of over 100 dataset-and-network-architecture combinations, especially in cases where additive noise is present and/or training labels are corrupted. These strong empirical results show the usefulness of CNA as a generalization metric, and encourage further research on the connection between information complexity and representations in the deeper layers of networks in order to better understand the generalization capabilities of DNNs.

Submitted 16 April, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

arXiv:1902.00159 [pdf, other]

Compressing GANs using Knowledge Distillation

Authors: Angeline Aguinaldo, Ping-Yeh Chiang, Alex Gain, Ameya Patil, Kolten Pearson, Soheil Feizi

Abstract: Generative Adversarial Networks (GANs) have been used in several machine learning tasks such as domain transfer, super resolution, and synthetic data generation. State-of-the-art GANs often use tens of millions of parameters, making them expensive to deploy for applications in low SWAP (size, weight, and power) hardware, such as mobile devices, and for applications with real time capabilities. There has been no work found to reduce the number of parameters used in GANs. Therefore, we propose a method to compress GANs using knowledge distillation techniques, in which a smaller "student" GAN learns to mimic a larger "teacher" GAN. We show that the distillation methods used on MNIST, CIFAR-10, and Celeb-A datasets can compress teacher GANs at ratios of 1669:1, 58:1, and 87:1, respectively, while retaining the quality of the generated image. From our experiments, we observe a qualitative limit for GAN's compression. Moreover, we observe that, with a fixed parameter budget, compressed GANs outperform GANs trained using standard training methods. We conjecture that this is partially owing to the optimization landscape of over-parameterized GANs which allows efficient training using alternating gradient descent. Thus, training an over-parameterized GAN followed by our proposed compression scheme provides a high quality generative model with a small number of parameters.
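The student-mimics-teacher idea and the notion of a compression ratio can be sketched as follows. This is a minimal illustration that assumes a mean squared error between teacher and student outputs generated from the same latent input; the abstract does not state the paper's actual objective, and the function names and parameter counts below are hypothetical.

```python
import numpy as np

def distillation_loss(teacher_out, student_out):
    """Mean squared error between teacher and student outputs.

    Assumed objective for illustration; the paper's loss may differ.
    """
    return float(np.mean((teacher_out - student_out) ** 2))

def compression_ratio(teacher_params, student_params):
    """Teacher-to-student parameter ratio, e.g. 50.0 means 50:1."""
    return teacher_params / student_params

# Hypothetical generator outputs for the same batch of latent vectors.
rng = np.random.default_rng(0)
teacher_images = rng.uniform(size=(8, 28, 28))
student_images = teacher_images + 0.1  # a student that is uniformly off by 0.1

loss = distillation_loss(teacher_images, student_images)

# Hypothetical parameter counts, not figures from the paper.
ratio = compression_ratio(1_000_000, 20_000)
```

Minimizing such a loss pushes the student's outputs toward the teacher's for the same inputs, while the ratio expresses how much smaller the student is.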

Submitted 31 January, 2019; originally announced February 2019.

arXiv:1312.7016 [pdf]

doi 10.1016/j.cma.2015.05.007

Topology Optimization Using Polytopes

Authors: Arun L. Gain, Glaucio H. Paulino, Leonardo Duarte, Ivan F. M. Menezes

Abstract: Meshing complex engineering domains is a challenging task. Arbitrary polyhedral meshes can provide the much needed flexibility in automated discretization of such domains. The geometric properties of polyhedral meshes, such as their unstructured nature and the facial connectivity between elements, make them especially attractive for topology optimization applications. Numerical anomalies in designs such as single node connections and checkerboard patterns, which are difficult to manufacture physically, are naturally alleviated with polyhedrons. Special interpolants such as Wachspress, mean value coordinates, and maximum entropy shape functions are available to handle arbitrarily shaped elements. But the finite element approaches based on these shape functions face some challenges, such as accurate and efficient computation of the shape functions and their derivatives for the numerical evaluation of the weak form integrals. In the current work, we solve the governing three-dimensional elasticity state equation using a Virtual Element Method (VEM) approach. The main characteristic difference between VEM and standard finite element methods (FEM) is that in VEM the canonical basis functions are not constructed explicitly. Rather, the stiffness matrix is computed directly utilizing a projection map which extracts the linear component of the deformation. Such a construction guarantees the satisfaction of the patch test (used by engineers as an indicator of optimal convergence of numerical solutions under mesh refinement). Finally, the computations reduce to the evaluation of matrices which contain purely geometric surface facet quantities. The present work focuses on the first-order VEM, in which the degrees of freedom are associated with the vertices. Utilizing polyhedral elements for topology optimization, we show that the mesh bias in the member orientation is alleviated.

Submitted 25 December, 2013; originally announced December 2013.

Journal ref: Comput Methods Appl Mech Eng 293 (2015) 411-430

arXiv:1311.0932 [pdf]

doi 10.1016/j.cma.2014.05.005

On the Virtual Element Method for Three-Dimensional Elasticity Problems on Arbitrary Polyhedral Meshes

Authors: Arun L. Gain, Cameron Talischi, Glaucio H. Paulino

Abstract: We explore the recently proposed Virtual Element Method (VEM) for numerical solution of boundary value problems on arbitrary polyhedral meshes. More specifically, we focus on the elasticity equations in three dimensions and elaborate upon the key concepts underlying the first-order VEM. While the point of departure is a conforming Galerkin framework, the distinguishing feature of VEM is that it does not require an explicit computation of the trial and test spaces, thereby circumventing a barrier to standard finite element discretizations on arbitrary grids. At the heart of the method is a particular kinematic decomposition of element deformation states which, in turn, leads to a corresponding decomposition of strain energy. By capturing the energy of linear deformations exactly, one can guarantee satisfaction of the engineering patch test and optimal convergence of numerical solutions. The decomposition itself is enabled by local projection maps that appropriately extract the rigid body motion and constant strain components of the deformation. As we show, computing these projection maps and subsequently the local stiffness matrices, in practice, reduces to the computation of purely geometric quantities. In addition to discussing aspects of implementation of the method, we present several numerical studies in order to verify convergence of the VEM and evaluate its performance for various types of meshes.

Submitted 4 November, 2013; originally announced November 2013.

Journal ref: Comput Methods Appl Mech Eng 282 (2014) 132-160

arXiv:0709.3085 [pdf, ps, other]

doi 10.1086/523656

The Period Changes of the Cepheid RT Aurigae

Authors: David G. Turner, Ivan S. Bryukhanov, Igor I. Balyuk, Alexey M. Gain, Roman A. Grabovsky, Valery D. Grigorenko, Igor V. Klochko, Attila Kosa-Kiss, Alexey S. Kosinsky, Ivan J. Kushmar, Vyacheslav T. Mamedov, Natalya A. Narkevich, Andrey J. Pogosyants, Andrey S. Semenyuta, Ivan M. Sergey, Vladimir V. Schukin, Jury B. Strigelsky, Valentina G. Tamello, David J. Lane, Daniel J. Majaess

Abstract: Observations of the light curve for the 3.7-day Cepheid RT Aur both before and since 1980 indicate that the variable is undergoing an overall period increase, amounting to +0.082 ± 0.012 s/yr, rather than a period decrease, as implied by all observations prior to 1980. Superposed on the star's O-C variations is a sinusoidal trend that cannot be attributed to random fluctuations in pulsation period. Rather, it appears to arise from light travel time effects in a binary system. The derived orbital period for the system is P = 26,429 ± 89 days (72.36 ± 0.24 years). The inferred orbital parameters from the O-C residuals differ from those indicated by existing radial velocity data. The latter imply the most reasonable results, namely a1 sin i = 9.09 (± 1.81) x 10^8 km and a minimum secondary mass of M2 = 1.15 ± 0.25 Msun. Continued monitoring of the brightness and radial velocity changes in the Cepheid is necessary to confirm the long-term trend and to provide data for a proper spectroscopic solution to the orbit.
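The day-to-year conversion quoted for the orbital period can be checked with a few lines of arithmetic. The sketch below assumes a Julian year of 365.25 days, which is an assumption rather than something the abstract states; the function name is illustrative.

```python
DAYS_PER_JULIAN_YEAR = 365.25  # assumed year length

def days_to_years(days):
    """Convert a duration in days to Julian years."""
    return days / DAYS_PER_JULIAN_YEAR

# Values quoted in the abstract: P = 26,429 days with an uncertainty of 89 days.
period_years = days_to_years(26429)
uncertainty_years = days_to_years(89)

print(f"P = {period_years:.2f} +/- {uncertainty_years:.2f} years")
```

Under that assumption the result rounds to the 72.36 and 0.24 years given in the abstract.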

Submitted 19 September, 2007; originally announced September 2007.

Comments: Accepted for publication in PASP (November 2007)

Showing 1–8 of 8 results for author: Gain, A