Search | arXiv e-print repository

Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

Abstract: Recent years have seen significant progress in the general-purpose problem-solving abilities of large vision-and-language models (LVLMs), such as ChatGPT and Gemini; some of these breakthroughs even seem to enable AI models to outperform humans in varied tasks that demand higher-order cognitive skills. Are current large AI models indeed capable of generalized problem solving as humans are? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing from the current scientific literature. In this paper, we make an effort towards filling this gap by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, a popular international competition for children in grades 1–12 that tests deeper mathematical abilities using puzzles appropriately gauged to their age and skills. Using the puzzles from MK, we created a dataset, dubbed SMART-840, consisting of 840 problems from the years 2020–2024. With our dataset, we analyze the mathematical reasoning power of LVLMs; their responses on our puzzles offer a direct way to compare against those of children. Our results show that modern LVLMs do demonstrate increasingly powerful reasoning skills in solving problems for higher grades, but lack the foundations to correctly answer problems designed for younger children.
Further analysis shows that there is no significant correlation between the reasoning capabilities of AI models and those of young children, and that their capabilities appear to be based on a different type of reasoning than the cumulative knowledge that underlies children's mathematics and logic skills.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2402.18182

Handling Open Research Data within the Max Planck Society -- Looking Closer at the Year 2020

Authors: Martin Boosen, Michael Franke, Yves Vincent Grossmann, Sy Dat Ho, Larissa Leiminger, Jan Matthiesen

Abstract: This paper analyses the practice of publishing research data within the Max Planck Society in the year 2020. The central finding of the study is that up to 40% of the empirical text publications had research data available. The available data are predominantly analysed in aggregate. There are differences between the sections of the Max Planck Society, but they are not as great as one might expect. In the case of journals, it is also apparent that a data policy can increase the availability of data related to textual publications. Finally, we found that the data availability statement "upon (reasonable) request" does not work.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:0708.2010

doi 10.1051/0004-6361:20078267

A search for solar-like oscillations in K giants in the globular cluster M4

Authors: S. Frandsen, H. Bruntt, F. Grundahl, G. Kopacki, H. Kjeldsen, T. Arentoft, D. Stello, T. R. Bedding, A. P. Jacob, R. L. Gilliland, P. D. Edmonds, E. Michel, J. Matthiesen

Abstract: To expand the range in the colour-magnitude diagram where asteroseismology can be applied, we organized a photometry campaign to find evidence for solar-like oscillations in giant stars in the globular cluster M4. The aim was to detect in the amplitude spectra the comb-like p-mode structure characteristic of solar-like oscillations. The two dozen main target stars lie in the region of the bump stars and have luminosities in the range 50–140 L☉. We collected 6160 CCD frames and extracted light curves for about 14000 stars. We obtained high-quality light curves for the K giants, but detected no clear oscillation signal. High-precision differential photometry is possible even in very crowded regions like the core of M4. Solar-like oscillations are probably present in K giants, but their amplitudes are lower than classical scaling laws predict.

Submitted 15 August, 2007; originally announced August 2007.

Comments: 14 pages, 16 figures, accepted for publication in A&A

Showing 1–3 of 3 results for author: Matthiesen, J