LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

Authors: Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

Abstract: We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficiency in logical reasoning tasks, which are essential for activities like navigation and puzzle-solving. Thus, we evaluate general logical cognition abilities across 5 logical reasoning tasks encompassing 9 different capabilities, using a sample of 448 multiple-choice questions. Each question is annotated with the correct answer and the human-written reasoning behind the selection, enabling both open-ended and multiple-choice evaluation. A total of 8 MLLMs are comprehensively evaluated using LogicVista. Code and data are available at https://github.com/Yijia-Xiao/LogicVista.

Submitted 6 July, 2024; originally announced July 2024.

Comments: LogicVista benchmarks the logical reasoning of multimodal large language models in visual tasks

arXiv:2405.02243

Towards Improving Learning from Demonstration Algorithms via MCMC Methods

Authors: Carl Qi, Edward Sun, Harry Zhang

Abstract: Behavioral cloning, or more broadly, learning from demonstrations (LfD), is a promising direction for robot policy learning in complex scenarios. Although straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used neural network-based explicit models, especially in the cases of approximating potentially discontinuous and multimodal functions.

Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2207.04638, arXiv:2204.03597 by other authors

arXiv:2312.17673

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Authors: Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

Abstract: Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks: a class of attacks that hijack the model's instruction-following abilities, changing responses to prompts to undesired, possibly malicious ones. In this work, we introduce Jatmo, a method for generating task-specific models resilient to prompt-injection attacks. Jatmo leverages the fact that LLMs can only follow instructions once they have undergone instruction tuning. It harnesses a teacher instruction-tuned model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs. For situations with no pre-existing datasets, Jatmo can use a single example, or in some cases none at all, to produce a fully synthetic dataset. Our experiments on seven tasks show that Jatmo models provide similar quality of outputs on their specific task as standard LLMs, while being resilient to prompt injections. The best attacks succeeded in less than 0.5% of cases against our models, versus 87% success rate against GPT-3.5-Turbo. We release Jatmo at https://github.com/wagner-group/prompt-injection-defense.

Submitted 8 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 24 pages, 6 figures

arXiv:2311.05623

The 4m International Liquid Mirror Telescope: a brief history and some preliminary scientific results

Authors: Jean Surdej, Bhavya Ailawadhi, Talat Akhunov, Ermanno Borra, Monalisa Dubey, Naveen Dukiya, Jiuyang Fu, Baldeep Grewal, Paul Hickson, Brajesh Kumar, Kuntal Misra, Vibhore Negi, Anna Pospieszalska-Surdej, Kumar Pranshu, Ethen Sun

Abstract: The present article is based upon an invited talk delivered on the occasion of the inauguration of the 4m International Liquid Mirror Telescope (ILMT), which took place in Devasthal (ARIES, Uttarakhand, India) on 21 March 2023. We present hereafter a short history of liquid mirror telescopes, and in particular of the 4m ILMT, which is the first liquid mirror telescope entirely dedicated to astrophysical observations. We discuss a few preliminary scientific results and show some direct CCD images taken during the first commissioning phase of the telescope. We invite the reader to refer to the series of ILMT poster papers published in these same proceedings of the BINA3 workshop for more details about the instrument, operation, first observations, performance and scientific results.