Search | arXiv e-print repository

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

Authors: Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara

Abstract: While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to the Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a spe… ▽ More While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to the Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a specific task configuration (e.g., matrix). However, existing AVR benchmarks only considered a limited set of patterns (addition, conjunction), input shapes (rectangle, square), and task configurations (3 by 3 matrices). To evaluate MLLMs' reasoning abilities comprehensively, we introduce MARVEL, a multidimensional AVR benchmark with 770 puzzles composed of six core knowledge patterns, geometric and abstract shapes, and five different task configurations. To inspect whether the model accuracy is grounded in perception and reasoning, MARVEL complements the general AVR question with perception questions in a hierarchical evaluation framework. We conduct comprehensive experiments on MARVEL with nine representative MLLMs in zero-shot and few-shot settings. Our experiments reveal that all models show near-random performance on the AVR question, with significant performance gaps (40%) compared to humans across all patterns and task configurations. Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even count the panels in the puzzle ( <45%), hindering their ability for abstract reasoning. We release our entire code and dataset. △ Less

Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2401.12117 [pdf, other]

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

Authors: Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara

Abstract: While large language models (LLMs) are still being adopted to new domains and utilized in novel applications, we are experiencing an influx of the new generation of foundation models, namely multi-modal large language models (MLLMs). These models integrate verbal and visual information, opening new possibilities to demonstrate more complex reasoning abilities at the intersection of the two modalit… ▽ More While large language models (LLMs) are still being adopted to new domains and utilized in novel applications, we are experiencing an influx of the new generation of foundation models, namely multi-modal large language models (MLLMs). These models integrate verbal and visual information, opening new possibilities to demonstrate more complex reasoning abilities at the intersection of the two modalities. However, despite the revolutionizing prospect of MLLMs, our understanding of their reasoning abilities is limited. In this study, we assess the nonverbal abstract reasoning abilities of open-source and closed-source MLLMs using variations of Raven's Progressive Matrices. Our experiments expose the difficulty of solving such problems while showcasing the immense gap between open-source and closed-source models. We also reveal critical shortcomings with individual visual and textual modules, subjecting the models to low-performance ceilings. Finally, to improve MLLMs' performance, we experiment with various methods, such as Chain-of-Thought prompting, resulting in a significant (up to 100%) boost in performance. △ Less

Submitted 13 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: Code and datasets are available at https://github.com/kahrabian/mllm-nvar

arXiv:2305.10613 [pdf, other]

Temporal Knowledge Graph Forecasting Without Knowledge Using In-Context Learning

Authors: Dong-Ho Lee, Kian Ahrabian, Woojeong **, Fred Morstatter, Jay Pujara

Abstract: Temporal knowledge graph (TKG) forecasting benchmarks challenge models to predict future facts using knowledge of past facts. In this paper, we apply large language models (LLMs) to these benchmarks using in-context learning (ICL). We investigate whether and to what extent LLMs can be used for TKG forecasting, especially without any fine-tuning or explicit modules for capturing structural and temp… ▽ More Temporal knowledge graph (TKG) forecasting benchmarks challenge models to predict future facts using knowledge of past facts. In this paper, we apply large language models (LLMs) to these benchmarks using in-context learning (ICL). We investigate whether and to what extent LLMs can be used for TKG forecasting, especially without any fine-tuning or explicit modules for capturing structural and temporal information. For our experiments, we present a framework that converts relevant historical facts into prompts and generates ranked predictions using token probabilities. Surprisingly, we observe that LLMs, out-of-the-box, perform on par with state-of-the-art TKG models carefully designed and trained for TKG forecasting. Our extensive evaluation presents performances across several models and datasets with different characteristics, compares alternative heuristics for preparing contextual information, and contrasts to prominent TKG methods and simple frequency and recency baselines. We also discover that using numerical indices instead of entity/relation names, i.e., hiding semantic information, does not significantly affect the performance ($\pm$0.4\% Hit@1). This shows that prior semantic knowledge is unnecessary; instead, LLMs can leverage the existing patterns in the context to achieve such performance. Our analysis also reveals that ICL enables LLMs to learn irregular patterns from the historical context, going beyond simple predictions based on common or recent information. △ Less

Submitted 20 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 main conference. 14 pages, 4 figures, 10 tables

arXiv:2302.02231 [pdf, other]

PubGraph: A Large-Scale Scientific Knowledge Graph

Authors: Kian Ahrabian, Xinwei Du, Richard Delwin Myloth, Arun Baalaaji Sankar Ananthan, Jay Pujara

Abstract: Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Unfortunately, the lack of a large-scale, comprehensive, and easy-to-use resource capturing the myriad relationships between publications, their authors, and venues presents a barrier to applications for gaining a deeper understanding of science. In this… ▽ More Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Unfortunately, the lack of a large-scale, comprehensive, and easy-to-use resource capturing the myriad relationships between publications, their authors, and venues presents a barrier to applications for gaining a deeper understanding of science. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale knowledge graph (KG) with more than 385M entities, 13B main edges, and 1.5B qualifier edges. PubGraph is comprehensive and unifies data from various sources, including Wikidata, OpenAlex, and Semantic Scholar, using the Wikidata ontology. Beyond the metadata available from these sources, PubGraph includes outputs from auxiliary community detection algorithms and large language models. To further support studies on reasoning over scientific networks, we create several large-scale benchmarks extracted from PubGraph for the core task of knowledge graph completion (KGC). These benchmarks present many challenges for knowledge graph embedding models, including an adversarial community-based KGC evaluation setting, zero-shot inductive learning, and large-scale learning. All of the aforementioned resources are accessible at https://pubgraph.isi.edu/ and released under the CC-BY-SA license. We plan to update PubGraph quarterly to accommodate the release of new publications. △ Less

Submitted 19 May, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

Comments: 17 Pages, 6 Figures, 9 Tables

arXiv:2009.11355 [pdf, other]

Structure Aware Negative Sampling in Knowledge Graphs

Authors: Kian Ahrabian, Aarash Feizi, Yasmin Salehi, William L. Hamilton, Avishek Joey Bose

Abstract: Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations… ▽ More Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations and find critical characteristics of observed data. While earlier methods either employ too simple corruption distributions, i.e. uniform, yielding easy uninformative negatives or sophisticated adversarial distributions with challenging optimization schemes, they do not explicitly incorporate known graph structure resulting in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that utilizes the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds semantically meaningful negatives and is competitive with SOTA approaches while requires no additional parameters nor difficult adversarial optimization. △ Less

Submitted 6 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: Accepted to EMNLP 2020. Camera-ready submission

arXiv:2007.01231 [pdf, other]

Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs

Authors: Kian Ahrabian, Daniel Tarlow, Hehuimin Cheng, ** L. C. Guo

Abstract: We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link pred… ▽ More We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, it also reveals the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regards to past events. △ Less

Submitted 12 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: 11 pages, 1 figure. 37th International Conference on Machine Learning (ICML 2020) - Workshop on Graph Representation Learning and Beyond

arXiv:1712.02781 [pdf, other]

doi 10.1007/s00521-018-3844-z

On Usage of Autoencoders and Siamese Networks for Online Handwritten Signature Verification

Authors: Kian Ahrabian, Bagher Babaali

Abstract: In this paper, we propose a novel writer-independent global feature extraction framework for the task of automatic signature verification which aims to make robust systems for automatically distinguishing negative and positive samples. Our method consists of an autoencoder for modeling the sample space into a fixed length latent space and a Siamese Network for classifying the fixed-length samples… ▽ More In this paper, we propose a novel writer-independent global feature extraction framework for the task of automatic signature verification which aims to make robust systems for automatically distinguishing negative and positive samples. Our method consists of an autoencoder for modeling the sample space into a fixed length latent space and a Siamese Network for classifying the fixed-length samples obtained from the autoencoder based on the reference samples of a subject as being "Genuine" or "Forged." During our experiments, usage of Attention Mechanism and applying Downsampling significantly improved the accuracy of the proposed framework. We evaluated our proposed framework using SigWiComp2013 Japanese and GPDSsyntheticOnLineOffLineSignature datasets. On the SigWiComp2013 Japanese dataset, we achieved 8.65% EER that means 1.2% relative improvement compared to the best-reported result. Furthermore, on the GPDSsyntheticOnLineOffLineSignature dataset, we achieved average EERs of 0.13%, 0.12%, 0.21% and 0.25% respectively for 150, 300, 1000 and 2000 test subjects which indicates improvement of relative EER on the best-reported result by 95.67%, 95.26%, 92.9% and 91.52% respectively. Apart from the accuracy gain, because of the nature of our proposed framework which is based on neural networks and consequently is as simple as some consecutive matrix multiplications, it has less computational cost than conventional methods such as DTW and could be used concurrently on devices such as GPU, TPU, etc. △ Less

Submitted 29 December, 2017; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: 13 pages, 10 figures, Submitted to Neural Computing and Applications journal

Showing 1–7 of 7 results for author: Ahrabian, K