Search | arXiv e-print repository

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Authors: Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

Abstract: Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 13%, with several families of models (e.g., Phi and Mistral) showing evidence of systematic overfitting across almost all model sizes. At the same time, many models, especially those on the frontier (e.g., Gemini/GPT/Claude), show minimal signs of overfitting. Further analysis suggests a positive relationship (Spearman's r^2=0.32) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that many models may have partially memorized GSM8k.

Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2303.12748 [pdf, other]

Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models

Authors: Will LeVine, Benjamin Pikus, Pranav Raja, Fernando Amat Gil

Abstract: Calibration of deep learning models is crucial to their trustworthiness and safe usage, and as such, has been extensively studied in supervised classification models, with methods crafted to decrease miscalibration. However, there has yet to be a comprehensive study of the calibration of vision-language models that are used for zero-shot inference, like CLIP. We measure calibration across relevant variables like prompt, dataset, and architecture, and find that zero-shot inference with CLIP is miscalibrated. Furthermore, we propose a modified version of temperature scaling that is aligned with the common use cases of CLIP as a zero-shot inference model, and show that a single learned temperature generalizes for each specific CLIP model (defined by a chosen pre-training dataset and architecture) across inference dataset and prompt choice.

Submitted 18 April, 2023; v1 submitted 11 March, 2023; originally announced March 2023.
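
For readers unfamiliar with the term used in the abstract above, the following is a minimal sketch of standard temperature scaling, not the modified version the paper proposes. The function name `softmax`, the example logits, and the temperature values are illustrative assumptions; in practice the temperature would be fitted on held-out data.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature T > 0.

    T > 1 flattens the distribution (less confident),
    T < 1 sharpens it (more confident). The argmax is unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-class zero-shot prediction.
logits = [2.0, 1.0, 0.1]
uncalibrated = softmax(logits, temperature=1.0)
calibrated = softmax(logits, temperature=2.0)  # assumed temperature, not a fitted value
```

Because dividing by a positive temperature preserves the ordering of the logits, this kind of scaling changes the reported confidence without changing the predicted class.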

arXiv:2208.02394 [pdf, other]

doi: 10.1016/j.compag.2022.107081

End-to-end deep learning for directly estimating grape yield from ground-based imagery

Authors: Alexander G. Olenskyj, Brent S. Sams, Zhenghao Fei, Vishal Singh, Pranav V. Raja, Gail M. Bornhorst, J. Mason Earles

Abstract: Yield estimation is a powerful tool in vineyard management, as it allows growers to fine-tune practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Continuous data collection using a vehicle-mounted sensing kit combined with collection of ground truth yield data at harvest using a commercial yield monitor allowed for the generation of a large dataset of 23,581 yield points and 107,933 images. Moreover, this study was conducted in a mechanically managed commercial vineyard, representing a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection model was trained on hand-labeled images to localize grape bunches, and either bunch count or pixel area was summed to correlate with grape yield. Conversely, regression models were trained end-to-end to predict grape yield from image data without the need for hand labeling. Results demonstrated that both a transformer as well as the object detection model with pixel area processing performed comparably, with a mean absolute percent error of 18% and 18.5%, respectively, on a representative holdout dataset. Saliency mapping was used to demonstrate that the attention of the CNN model was localized near the predicted location of grape bunches, as well as on the top of the grapevine canopy. Overall, the study showed the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale. Additionally, the end-to-end modeling approach was able to perform comparably to the object detection approach while eliminating the need for hand-labeling.

Submitted 3 August, 2022; originally announced August 2022.

Journal ref: Comput. Electron. Agric. 198 (2022)
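
The abstract above reports results as mean absolute percent error. As a reminder of how that metric is usually computed, here is a minimal sketch; the function name and the yield values are made up for illustration and are not taken from the paper.

```python
def mean_absolute_percent_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical ground-truth and predicted yields (arbitrary units).
actual_yield = [10.0, 20.0, 40.0]
predicted_yield = [12.0, 18.0, 40.0]
mape = mean_absolute_percent_error(actual_yield, predicted_yield)
print(f"MAPE: {mape:.1f}%")
```

The metric is undefined when a ground-truth value is zero, which is why the sketch rejects such inputs rather than silently skipping them.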

arXiv:2203.09674 [pdf]

A workflow for segmenting soil and plant X-ray CT images with deep learning in Googles Colaboratory

Authors: Devin A. Rippner, Pranav Raja, J. Mason Earles, Alexander Buchko, Mina Momayyezi, Fiona Duong, Dilworth Parkinson, Elizabeth Forrestel, Ken Shackel, Jeffrey Neyhart, Andrew J. McElrone

Abstract: X-ray micro-computed tomography (X-ray microCT) has enabled the characterization of the properties and processes that take place in plants and soils at the micron scale. Despite the widespread use of this advanced technique, major limitations in both hardware and software limit the speed and accuracy of image processing and data analysis. Recent advances in machine learning, specifically the application of convolutional neural networks to image analysis, have enabled rapid and accurate segmentation of image data. Yet, challenges remain in applying convolutional neural networks to the analysis of environmentally and agriculturally relevant images. Specifically, there is a disconnect between the computer scientists and engineers, who build these AI/ML tools, and the potential end users in agricultural research, who may be unsure of how to apply these tools in their work. Additionally, the computing resources required for training and applying deep learning models are unique, more common to computer gaming systems or graphics design work than to traditional computational systems. To navigate these challenges, we developed a modular workflow for applying convolutional neural networks to X-ray microCT images, using low-cost resources in Google's Colaboratory web application. Here we present the results of the workflow, illustrating how parameters can be optimized to achieve the best results using example scans from walnut leaves, almond flower buds, and a soil aggregate. We expect that this framework will accelerate the adoption and use of emerging deep learning techniques within the plant and soil sciences.

Submitted 21 July, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: 58 pages, 9 figures, 2 Tables

arXiv:2112.03205 [pdf, other]

Simultaneously Predicting Multiple Plant Traits from Multiple Sensors via Deformable CNN Regression

Authors: Pranav Raja, Alex Olenskyj, Hamid Kamangir, Mason Earles

Abstract: Trait measurement is critical for the plant breeding and agricultural production pipeline. Typically, a suite of plant traits is measured using laborious manual measurements and then used to train and/or validate higher throughput trait estimation techniques. Here, we introduce a relatively simple convolutional neural network (CNN) model that accepts multiple sensor inputs and predicts multiple continuous trait outputs - i.e. a multi-input, multi-output CNN (MIMO-CNN). Further, we introduce deformable convolutional layers into this network architecture (MIMO-DCNN) to enable the model to adaptively adjust its receptive field, model complex variable geometric transformations in the data, and fine-tune the continuous trait outputs. We examine how the MIMO-CNN and MIMO-DCNN models perform on a multi-input (i.e. RGB and depth images), multi-trait output lettuce dataset from the 2021 Autonomous Greenhouse Challenge. Ablation studies were conducted to examine the effect of using single versus multiple inputs, and single versus multiple outputs. The MIMO-DCNN model resulted in a normalized mean squared error (NMSE) of 0.068 - a substantial improvement over the top 2021 leaderboard score of 0.081. Open-source code is provided.

Submitted 6 December, 2021; originally announced December 2021.

arXiv:1112.2265 [pdf]

A Novel Approach for Password Authentication Using Bidirectional Associative Memory

Authors: A. S. N. Chakravarthy, Penmetsa V. Krishna Raja, Prof. P. S. Avadhani

Abstract: Password authentication is a very important system security procedure for gaining access to user resources. In traditional password authentication methods, a server has to check the authenticity of the users. In our proposed method, users can freely select their passwords from a predefined character set. They can also use a graphical image as a password. Whether the password consists of characters or an image, it is converted into binary form and the binary values are normalized. Associative memories have been used recently for password authentication in order to overcome the drawbacks of traditional password authentication methods. In this paper we propose a method using the Bidirectional Associative Memory (BAM) algorithm for both alphanumeric (text) and graphical passwords. By doing so, the security provided to the user can be enhanced. This paper, along with its test results, shows that converting the user password into probabilistic values and giving them as input to the BAM improves the security of the system.

Submitted 10 December, 2011; originally announced December 2011.

Comments: 13 pages

Journal ref: Advanced Computing: An International Journal ( ACIJ ), Vol.2, No.6, November 2011

arXiv:1110.1502 [pdf]

Hilbert Matrix Based Cryptosystem using a Session Key

Authors: Penmetsa V. Krishna Raja, A. S. N. Chakravarthy, P. S. Avadhani

Abstract: Cryptography protects users by providing functionality for the encryption of data and authentication of other users. This technology lets the receiver of an electronic message verify the sender, ensures that a message can be read only by the intended person, and assures the recipient that a message has not been altered in transit. Classical cryptanalysis involves an interesting combination of analytical reasoning, application of mathematical tools and pattern finding. The objectives of the proposed work are to propose a new cryptographic method based on the special matrix called the Hilbert matrix for authentication and confidentiality and to propose a model for confidentiality and authentication using shared key cryptosystems with the concept of digital enveloping using a session key. In the present work various algorithms are presented for encryption and authentication based on the Hilbert matrix using a session key.

Submitted 7 October, 2011; originally announced October 2011.

Comments: five pages

Journal ref: International Journal of Engineering Research and Applications (IJERA) Vol. 1, Issue 3, 2011, pp.711-715
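
The Hilbert matrix named in the abstract above has a simple closed form: with 1-based indices, entry (i, j) equals 1 / (i + j - 1). The sketch below only constructs that matrix with exact rational arithmetic; it does not reproduce the paper's encryption or authentication algorithms, and the function name is an assumption made for illustration.

```python
from fractions import Fraction


def hilbert_matrix(n):
    """Return the n x n Hilbert matrix with exact rational entries.

    With 1-based indices the entry is 1 / (i + j - 1); the loops here
    are 0-based, so the denominator becomes i + j + 1.
    """
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


H = hilbert_matrix(3)
for row in H:
    print([str(entry) for entry in row])
```

Exact fractions are used here because Hilbert matrices are notoriously ill-conditioned, so floating-point arithmetic loses precision quickly as the size grows.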

arXiv:1110.1498 [pdf]

A Cryptosystem Based on Hilbert Matrix using Cipher Block Chaining Mode

Authors: Penmetsa V. Krishna Raja, A. S. N. Chakravarthy, P. S. Avadhani

Abstract: Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography enables you to store sensitive information or transmit it across insecure networks so that it cannot be read by anyone except the intended recipient. While cryptography is the science of securing data, cryptanalysis is the science of analyzing and breaking secure communication. Classical cryptanalysis involves an interesting combination of analytical reasoning, application of mathematical tools and pattern finding. The objectives of the proposed work are to propose a new cryptographic method based on the special matrix called the Hilbert matrix for authentication and confidentiality and to propose a model for confidentiality and authentication using a combination of symmetric and public cryptosystems. Further, it is extended to shared key cryptosystems with the concept of digital enveloping using a session key. In the present work an algorithm for shared key encryption is developed using the Hilbert matrix cryptosystem. Here, block chaining modes of operation are used to tackle the issues of confusion and diffusion.

Submitted 7 October, 2011; originally announced October 2011.

Comments: six pages; International Journal of Mathematics Trends and Technology- July to Aug Issue 2011

arXiv:1110.1490 [pdf]

A Novel Approach for Pass Word Authentication using Brain -State -In -A Box (BSB) Model

Authors: A. S. N. Chakravarthy, Penmetsa V. Krishna Raja, P. S Avadhani

Abstract: Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claims to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Password authentication using Brain-State-in-a-Box is presented in this paper. We discuss a Brain-State-in-a-Box scheme for textual and graphical passwords, which are converted into probabilistic values. We show how to obtain probabilistic values for password authentication from text and graphical images. This study proposes the use of a Brain-State-in-a-Box technique for password authentication. In comparison to existing layered neural network techniques, the proposed method provides better accuracy and quicker response time to registration and password changes.

Submitted 7 October, 2011; originally announced October 2011.

Comments: five pages

Journal ref: International Journal of Computer Science and Information Technologies (IJCSIT), Volume 2 Issue 5 2011,2127-2131

arXiv:1110.1488 [pdf]

Handwritten Text Image Authentication using Back Propagation

Authors: A. S. N. Chakravarthy, Penmetsa V. Krishna Raja, P. S. Avadhani

Abstract: Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artefact, ensuring that a product is what its packaging and labelling claims to be, or assuring that a computer program is a trusted one. The authentication of information can pose special problems (especially man-in-the-middle attacks), and is often wrapped up with authenticating identity. Literary forgery can involve imitating the style of a famous author. If an original manuscript, typewritten text, or recording is available, then the medium itself (or its packaging - anything from a box to e-mail headers) can help prove or disprove the authenticity of the document. The use of digital images of handwritten historical documents has become more popular in recent years. Volunteers around the world now read thousands of these images as part of their indexing process. Handwritten text images of old documents are sometimes difficult to read or noisy due to the preservation of the document and quality of the image [1]. Handwritten text offers challenges that are rarely encountered in machine-printed text. In addition, most problems faced in reading machine-printed text (e.g., character recognition, word segmentation, letter segmentation, etc.) are more severe in handwritten text. In this paper we propose a method for authenticating handwritten text images using the back propagation algorithm.

Submitted 7 October, 2011; originally announced October 2011.

Comments: 10 pages pdf file

Journal ref: International Journal of Network Security & Its Applications (IJNSA), Vol.3, No.5, Sep 2011

Showing 1–10 of 10 results for author: Raja, P