ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints

Authors: Divij Handa, Pavel Dolin, Shrinidhi Kumbhar, Chitta Baral, Tran Cao Son

Abstract: Reasoning about actions and change (RAC) has historically driven the development of many early AI challenges, such as the frame problem, and many AI disciplines, including non-monotonic and commonsense reasoning. The role of RAC remains important even now, particularly for tasks involving dynamic environments, interactive scenarios, and commonsense reasoning. Despite the progress of Large Language Models (LLMs) in various AI domains, their performance on RAC is underexplored. To address this gap, we introduce a new benchmark, ActionReasoningBench, encompassing 13 domains and rigorously evaluating LLMs across eight different areas of RAC. These include Object Tracking, Fluent Tracking, State Tracking, Action Executability, Effects of Actions, Numerical RAC, Hallucination Detection, and Composite Questions. Furthermore, we also investigate the indirect effect of actions due to ramification constraints for every domain. Finally, we evaluate our benchmark using open-sourced and commercial state-of-the-art LLMs, including GPT-4o, Gemini-1.0-Pro, Llama2-7b-chat, Llama2-13b-chat, Llama3-8b-instruct, Gemma-2b-instruct, and Gemma-7b-instruct. Our findings indicate that these models face significant challenges across all categories included in our benchmark.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 54 pages, 11 figures

arXiv:2310.00836

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

Authors: Man Luo, Shrinidhi Kumbhar, Ming Shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral

Abstract: Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non-trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. Consequently, there's a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. Utilizing LogiGLUE as a foundation, we have trained an instruction fine-tuned language model, resulting in LogiT5. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess the performance of the model across the different logical reasoning categories. We also assess various LLMs using LogiGLUE, and the findings indicate that LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.

Submitted 30 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Work in progress

arXiv:2003.12476

doi 10.1038/s41597-020-00638-4

AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance

Authors: Sebastiaan P. Huber, Spyros Zoupanos, Martin Uhrin, Leopold Talirz, Leonid Kahle, Rico Häuselmann, Dominik Gresch, Tiziano Müller, Aliaksandr V. Yakutovich, Casper W. Andersen, Francisco F. Ramirez, Carl S. Adorf, Fernando Gargiulo, Snehal Kumbhar, Elsa Passaro, Conrad Johnston, Andrius Merkys, Andrea Cepellotti, Nicolas Mounet, Nicola Marzari, Boris Kozinsky, Giovanni Pizzi

Abstract: The ever-growing availability of computing power and the sustained development of advanced computational methods have contributed much to recent scientific progress. These developments present new challenges driven by the sheer amount of calculations and data to manage. Next-generation exascale supercomputers will harden these challenges, such that automated and scalable solutions become crucial. In recent years, we have been developing AiiDA (http://www.aiida.net), a robust open-source high-throughput infrastructure addressing the challenges arising from the needs of automated workflow management and data provenance recording. Here, we introduce developments and capabilities required to reach sustained performance, with AiiDA supporting throughputs of tens of thousands processes/hour, while automatically preserving and storing the full data provenance in a relational database making it queryable and traversable, thus enabling high-performance data analytics. AiiDA's workflow language provides advanced automation, error handling features and a flexible plugin model to allow interfacing with any simulation software. The associated plugin registry enables seamless sharing of extensions, empowering a vibrant user community dedicated to making simulations more robust, user-friendly and reproducible.

Submitted 24 March, 2020; originally announced March 2020.

Journal ref: Scientific Data 7, 300 (2020)

arXiv:1601.06503

TiO2 based Nanostructured Memristor for RRAM and Neuromorphic Applications: A Simulation Approach

Authors: T. D. Dongale, P. J. Patil, N. K. Desai, P. P. Chougule, S. M. Kumbhar, P. P. Waifalkar, P. B. Patil, R. S. Vhatkar, M. V. Takale, P. K. Gaikwad, R. K. Kamat

Abstract: We report the simulation of a nanostructured memristor device using piecewise linear and nonlinear window functions for RRAM and neuromorphic applications. The linear drift model of the memristor has been used for the simulation, with the linear and nonlinear window functions as the mathematical and scripting basis. The results show that the piecewise linear window function can aptly simulate the memristor characteristics pertaining to RRAM applications. However, the nonlinear window function could exhibit the nonlinear phenomenon in simulation only at lower magnitudes of the control parameter. This has motivated us to propose a new nonlinear window function for emulating the simulation model of the memristor. Interestingly, the proposed window function is scalable up to f(x)=1 and exhibits nonlinear behavior at higher magnitudes of the control parameter. Moreover, the simulation results of the proposed nonlinear window function are encouraging and reveal a smooth nonlinear change from LRS to HRS and vice versa, and the function is therefore useful for neuromorphic applications.

Submitted 25 January, 2016; originally announced January 2016.

Comments: 11 pages, 8 figures

MSC Class: 65Zxx; 74K35; 82Dxx

Showing 1–4 of 4 results for author: Kumbhar, S