-
A Cognitively-Inspired Neural Architecture for Visual Abstract Reasoning Using Contrastive Perceptual and Conceptual Processing
Authors:
Yuan Yang,
Deepayan Sanyal,
James Ainooson,
Joel Michelson,
Effat Farhana,
Maithilee Kunda
Abstract:
We introduce a new neural architecture for solving visual abstract reasoning tasks inspired by human cognition, specifically by observations that human abstract reasoning often interleaves perceptual and conceptual processing as part of a flexible, iterative, and dynamic cognitive process. Inspired by this principle, our architecture models visual abstract reasoning as an iterative, self-contrasti…
▽ More
We introduce a new neural architecture for solving visual abstract reasoning tasks inspired by human cognition, specifically by observations that human abstract reasoning often interleaves perceptual and conceptual processing as part of a flexible, iterative, and dynamic cognitive process. Inspired by this principle, our architecture models visual abstract reasoning as an iterative, self-contrasting learning process that pursues consistency between perceptual and conceptual processing of visual stimuli. We explain how this new Contrastive Perceptual-Conceptual Network (CPCNet) works using matrix reasoning problems in the style of the well-known Raven's Progressive Matrices intelligence test. Experiments on the machine learning dataset RAVEN show that CPCNet achieves higher accuracy than all previously published models while also using the weakest inductive bias. We also point out a substantial and previously unremarked class imbalance in the original RAVEN dataset, and we propose a new variant of RAVEN -- AB-RAVEN -- that is more balanced in terms of abstract concepts.
△ Less
Submitted 20 October, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
-
A Computational Account Of Self-Supervised Visual Learning From Egocentric Object Play
Authors:
Deepayan Sanyal,
Joel Michelson,
Yuan Yang,
James Ainooson,
Maithilee Kunda
Abstract:
Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object from several different viewpoints. In this paper, we study how learning signals that equate different viewpoints -- e.g., assigning similar representations to dif…
▽ More
Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object from several different viewpoints. In this paper, we study how learning signals that equate different viewpoints -- e.g., assigning similar representations to different views of a single object -- can support robust visual learning. We use the Toybox dataset, which contains egocentric videos of humans manipulating different objects, and conduct experiments using a computer vision framework for self-supervised contrastive learning. We find that representations learned by equating different physical viewpoints of an object benefit downstream image classification accuracy. Further experiments show that this performance improvement is robust to variations in the gaps between viewpoints, and that the benefits transfer to several different image classification tasks.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
A Neurodiversity-Inspired Solver for the Abstraction \& Reasoning Corpus (ARC) Using Visual Imagery and Program Synthesis
Authors:
James Ainooson,
Deepayan Sanyal,
Joel P. Michelson,
Yuan Yang,
Maithilee Kunda
Abstract:
Core knowledge about physical objects -- e.g., their permanency, spatial transformations, and interactions -- is one of the most fundamental building blocks of biological intelligence across humans and non-human animals. While AI techniques in certain domains (e.g. vision, NLP) have advanced dramatically in recent years, no current AI systems can yet match human abilities in flexibly applying core…
▽ More
Core knowledge about physical objects -- e.g., their permanency, spatial transformations, and interactions -- is one of the most fundamental building blocks of biological intelligence across humans and non-human animals. While AI techniques in certain domains (e.g. vision, NLP) have advanced dramatically in recent years, no current AI systems can yet match human abilities in flexibly applying core knowledge to solve novel tasks. We propose a new AI approach to core knowledge that combines 1) visual representations of core knowledge inspired by human mental imagery abilities, especially as observed in studies of neurodivergent individuals; with 2) tree-search-based program synthesis for flexibly combining core knowledge to form new reasoning strategies on the fly. We demonstrate our system's performance on the very difficult Abstraction \& Reasoning Corpus (ARC) challenge, and we share experimental results from publicly available ARC items as well as from our 4th-place finish on the private test set during the 2022 global ARCathon challenge.
△ Less
Submitted 31 October, 2023; v1 submitted 18 February, 2023;
originally announced February 2023.
-
Deep Non-Monotonic Reasoning for Visual Abstract Reasoning Tasks
Authors:
Yuan Yang,
Deepayan Sanyal,
Joel Michelson,
James Ainooson,
Maithilee Kunda
Abstract:
While achieving unmatched performance on many well-defined tasks, deep learning models have also been used to solve visual abstract reasoning tasks, which are relatively less well-defined, and have been widely used to measure human intelligence. However, current deep models struggle to match human abilities to solve such tasks with minimum data but maximum generalization. One limitation is that cu…
▽ More
While achieving unmatched performance on many well-defined tasks, deep learning models have also been used to solve visual abstract reasoning tasks, which are relatively less well-defined, and have been widely used to measure human intelligence. However, current deep models struggle to match human abilities to solve such tasks with minimum data but maximum generalization. One limitation is that current deep learning models work in a monotonic way, i.e., treating different parts of the input in essentially fixed orderings, whereas people repeatedly observe and reason about the different parts of the visual stimuli until the reasoning process converges to a consistent conclusion, i.e., non-monotonic reasoning. This paper proposes a non-monotonic computational approach to solve visual abstract reasoning tasks. In particular, we implemented a deep learning model using this approach and tested it on the RAVEN dataset -- a dataset inspired by the Raven's Progressive Matrices test. Results show that the proposed approach is more effective than existing monotonic deep learning models, under strict experimental settings that represent a difficult variant of the RAVEN dataset problem.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Computational Models of Solving Raven's Progressive Matrices: A Comprehensive Introduction
Authors:
Yuan Yang,
Mathilee Kunda
Abstract:
As being widely used to measure human intelligence, Raven's Progressive Matrices (RPM) tests also pose a great challenge for AI systems. There is a long line of computational models for solving RPM, starting from 1960s, either to understand the involved cognitive processes or solely for problem-solving purposes. Due to the dramatic paradigm shifts in AI researches, especially the advent of deep le…
▽ More
As being widely used to measure human intelligence, Raven's Progressive Matrices (RPM) tests also pose a great challenge for AI systems. There is a long line of computational models for solving RPM, starting from 1960s, either to understand the involved cognitive processes or solely for problem-solving purposes. Due to the dramatic paradigm shifts in AI researches, especially the advent of deep learning models in the last decade, the computational studies on RPM have also changed a lot. Therefore, now is a good time to look back at this long line of research. As the title -- ``a comprehensive introduction'' -- indicates, this paper provides an all-in-one presentation of computational models for solving RPM, including the history of RPM, intelligence testing theories behind RPM, item design and automatic item generation of RPM-like tasks, a conceptual chronicle of computational models for solving RPM, which reveals the philosophy behind the technology evolution of these models, and suggestions for transferring human intelligence testing and AI testing.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Visual-Imagery-Based Analogical Construction in Geometric Matrix Reasoning Task
Authors:
Yuan Yang,
Keith McGreggor,
Maithilee Kunda
Abstract:
Raven's Progressive Matrices is a family of classical intelligence tests that have been widely used in both research and clinical settings. There have been many exciting efforts in AI communities to computationally model various aspects of problem solving such figural analogical reasoning problems. In this paper, we present a series of computational models for solving Raven's Progressive Matrices…
▽ More
Raven's Progressive Matrices is a family of classical intelligence tests that have been widely used in both research and clinical settings. There have been many exciting efforts in AI communities to computationally model various aspects of problem solving such figural analogical reasoning problems. In this paper, we present a series of computational models for solving Raven's Progressive Matrices using analogies and image transformations. We run our models following three different strategies usually adopted by human testees. These models are tested on the standard version of Raven's Progressive Matrices, in which we can solve 57 out 60 problems in it. Therefore, analogy and image transformation are proved to be effective in solving RPM problems.
△ Less
Submitted 29 August, 2022;
originally announced August 2022.
-
Automatic Item Generation of Figural Analogy Problems: A Review and Outlook
Authors:
Yuan Yang,
Deepayan Sanyal,
Joel Michelson,
James Ainooson,
Maithilee Kunda
Abstract:
Figural analogy problems have long been a widely used format in human intelligence tests. In the past four decades, more and more research has investigated automatic item generation for figural analogy problems, i.e., algorithmic approaches for systematically and automatically creating such problems. In cognitive science and psychometrics, this research can deepen our understandings of human analo…
▽ More
Figural analogy problems have long been a widely used format in human intelligence tests. In the past four decades, more and more research has investigated automatic item generation for figural analogy problems, i.e., algorithmic approaches for systematically and automatically creating such problems. In cognitive science and psychometrics, this research can deepen our understandings of human analogical ability and psychometric properties of figural analogies. With the recent development of data-driven AI models for reasoning about figural analogies, the territory of automatic item generation of figural analogies has further expanded. This expansion brings new challenges as well as opportunities, which demand reflection on previous item generation research and planning future studies. This paper reviews the important works of automatic item generation of figural analogies for both human intelligence tests and data-driven AI models. From an interdisciplinary perspective, the principles and technical details of these works are analyzed and compared, and desiderata for future research are suggested.
△ Less
Submitted 20 January, 2022;
originally announced January 2022.
-
The AI Triplet: Computational, Conceptual, and Mathematical Knowledge in AI Education
Authors:
Maithilee Kunda
Abstract:
Efforts to enhance education and broaden participation in AI will benefit from a systematic understanding of the competencies underlying AI expertise. In this paper, we observe that AI expertise requires integrating computational, conceptual, and mathematical knowledge and representations. We call this the ``AI triplet,'' similar in spirit to the ``chemistry triplet'' that has heavily influenced t…
▽ More
Efforts to enhance education and broaden participation in AI will benefit from a systematic understanding of the competencies underlying AI expertise. In this paper, we observe that AI expertise requires integrating computational, conceptual, and mathematical knowledge and representations. We call this the ``AI triplet,'' similar in spirit to the ``chemistry triplet'' that has heavily influenced the past four decades of chemistry education research. We describe a theoretical foundation for this triplet and show how it maps onto two sample AI topics: tree search and gradient descent. Finally, just as the chemistry triplet has impacted chemistry education in concrete ways, we suggest two initial hypotheses for how the AI triplet might impact AI education: 1) how we can help AI students gain proficiency in moving between the corners of the triplet; and 2) how all corners of the AI triplet highlight the need for supporting students' spatial cognitive skills.
△ Less
Submitted 29 September, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Do Time Constraints Re-Prioritize Attention to Shapes During Visual Photo Inspection?
Authors:
Yiyuan Yang,
Kenneth Li,
Fernanda Eliott,
Maithilee Kunda
Abstract:
People's visual experiences of the world are easy to carve up and examine along natural language boundaries, e.g., by category labels, attribute labels, etc. However, it is more difficult to elicit detailed visuospatial information about what a person attends to, e.g., the specific shape of a tree. Paying attention to the shapes of things not only feeds into well defined tasks like visual category…
▽ More
People's visual experiences of the world are easy to carve up and examine along natural language boundaries, e.g., by category labels, attribute labels, etc. However, it is more difficult to elicit detailed visuospatial information about what a person attends to, e.g., the specific shape of a tree. Paying attention to the shapes of things not only feeds into well defined tasks like visual category learning, but it is also what enables us to differentiate similarly named objects and to take on creative visual pursuits, like poetically describing the shape of a thing, or finding shapes in the clouds or stars. We use a new data collection method that elicits people's prioritized attention to shapes during visual photo inspection by asking them to trace important parts of the image under varying time constraints. Using data collected via crowdsourcing over a set of 187 photographs, we examine changes in patterns of visual attention across individuals, across image types, and across time constraints.
△ Less
Submitted 14 April, 2021;
originally announced April 2021.
-
Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset
Authors:
Zhanwen Chen,
Shiyao Li,
Roxanne Rashedi,
Xiaoman Zi,
Morgan Elrod-Erickson,
Bryan Hollis,
Angela Maliakal,
Xinyu Shen,
Simeng Zhao,
Maithilee Kunda
Abstract:
Modern social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content, e.g., for a scene in Harry Potter, "Is the father really upset about the boys flying the car?" Social visual question answering (social VQA) is emerging as a valuable methodology for studying social reasoning in both humans (e.g., children with autism) and AI agents…
▽ More
Modern social intelligence includes the ability to watch videos and answer questions about social and theory-of-mind-related content, e.g., for a scene in Harry Potter, "Is the father really upset about the boys flying the car?" Social visual question answering (social VQA) is emerging as a valuable methodology for studying social reasoning in both humans (e.g., children with autism) and AI agents. However, this problem space spans enormous variations in both videos and questions. We discuss methods for creating and characterizing social VQA datasets, including 1) crowdsourcing versus in-house authoring, including sample comparisons of two new datasets that we created (TinySocial-Crowd and TinySocial-InHouse) and the previously existing Social-IQ dataset; 2) a new rubric for characterizing the difficulty and content of a given video; and 3) a new rubric for characterizing question types. We close by describing how having well-characterized social VQA datasets will enhance the explainability of AI agents and can also inform assessments and educational interventions for people.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
Creative Captioning: An AI Grand Challenge Based on the Dixit Board Game
Authors:
Maithilee Kunda,
Irina Rabkina
Abstract:
We propose a new class of "grand challenge" AI problems that we call creative captioning---generating clever, interesting, or abstract captions for images, as well as understanding such captions. Creative captioning draws on core AI research areas of vision, natural language processing, narrative reasoning, and social reasoning, and across all these areas, it requires sophisticated uses of common…
▽ More
We propose a new class of "grand challenge" AI problems that we call creative captioning---generating clever, interesting, or abstract captions for images, as well as understanding such captions. Creative captioning draws on core AI research areas of vision, natural language processing, narrative reasoning, and social reasoning, and across all these areas, it requires sophisticated uses of common sense and cultural knowledge. In this paper, we analyze several specific research problems that fall under creative captioning, using the popular board game Dixit as both inspiration and proposed testing ground. We expect that Dixit could serve as an engaging and motivating benchmark for creative captioning across numerous AI research communities for the coming 1-2 decades.
△ Less
Submitted 30 September, 2020;
originally announced October 2020.
-
Neuropsychiatric Disease Classification Using Functional Connectomics -- Results of the Connectomics in NeuroImaging Transfer Learning Challenge
Authors:
Markus D. Schirmer,
Archana Venkataraman,
Islem Rekik,
Minjeong Kim,
Stewart H. Mostofsky,
Mary Beth Nebel,
Keri Rosch,
Karen Seymour,
Deana Crocetti,
Hassna Irzan,
Michael Hütel,
Sebastien Ourselin,
Neil Marlow,
Andrew Melbourne,
Egor Levchenko,
Shuo Zhou,
Mwiza Kunda,
Hai** Lu,
Nicha C. Dvornek,
Juntang Zhuang,
Gideon Pinto,
Sandip Samal,
Jennings Zhang,
Jorge L. Bernal-Rusiel,
Rudolph Pienaar
, et al. (1 additional authors not shown)
Abstract:
Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging…
▽ More
Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging Transfer Learning Challenge (CNI-TLC), held in conjunction with MICCAI 2019. CNI-TLC included two classification tasks: (1) diagnosis of Attention-Deficit/Hyperactivity Disorder (ADHD) within a pre-adolescent cohort; and (2) transference of the ADHD model to a related cohort of Autism Spectrum Disorder (ASD) patients with an ADHD comorbidity. In total, 240 resting-state fMRI time series averaged according to three standard parcellation atlases, along with clinical diagnosis, were released for training and validation (120 neurotypical controls and 120 ADHD). We also provided demographic information of age, sex, IQ, and handedness. A second set of 100 subjects (50 neurotypical controls, 25 ADHD, and 25 ASD with ADHD comorbidity) was used for testing. Models were submitted in a standardized format as Docker images through ChRIS, an open-source image analysis platform. Utilizing an inclusive approach, we ranked the methods based on 16 different metrics. The final rank was calculated using the rank product for each participant across all measures. Furthermore, we assessed the calibration curves of each method. Five participants submitted their model for evaluation, with one outperforming all other methods in both ADHD and ASD classification. However, further improvements are needed to reach the clinical translation of functional connectomics. We are kee** the CNI-TLC open as a publicly available resource for develo** and validating new classification methodologies in the field of connectomics.
△ Less
Submitted 25 November, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
Variable-Viewpoint Representations for 3D Object Recognition
Authors:
Tengyu Ma,
Joel Michelson,
James Ainooson,
Deepayan Sanyal,
Xiaohan Wang,
Maithilee Kunda
Abstract:
For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions. These representations offer trade-offs in…
▽ More
For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions. These representations offer trade-offs in terms of what object information is captured and to what degree of detail it is captured, but it is not clear how to measure these information trade-offs since the two types of representations are so different. We demonstrate that both types of representations in fact exist at two extremes of a common representational continuum, essentially choosing to prioritize either the number of views of an object or the pixels (i.e., field of view) allotted per view. We identify interesting intermediate representations that lie at points in between these two extremes, and we show, through systematic empirical experiments, how accuracy varies along this continuum as a function of input information as well as the particular deep learning architecture that is used.
△ Less
Submitted 8 February, 2020;
originally announced February 2020.
-
Learning Spatially Structured Image Transformations Using Planar Neural Networks
Authors:
Joel Michelson,
Joshua H. Palmer,
Aneesha Dasari,
Maithilee Kunda
Abstract:
Learning image transformations is essential to the idea of mental simulation as a method of cognitive inference. We take a connectionist modeling approach, using planar neural networks to learn fundamental imagery transformations, like translation, rotation, and scaling, from perceptual experiences in the form of image sequences. We investigate how variations in network topology, training data, an…
▽ More
Learning image transformations is essential to the idea of mental simulation as a method of cognitive inference. We take a connectionist modeling approach, using planar neural networks to learn fundamental imagery transformations, like translation, rotation, and scaling, from perceptual experiences in the form of image sequences. We investigate how variations in network topology, training data, and image shape, among other factors, affect the efficiency and effectiveness of learning visual imagery transformations, including effectiveness of transfer to operating on new types of data.
△ Less
Submitted 9 August, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
Modeling Gestalt Visual Reasoning on the Raven's Progressive Matrices Intelligence Test Using Generative Image Inpainting Techniques
Authors:
Tianyu Hua,
Maithilee Kunda
Abstract:
Psychologists recognize Raven's Progressive Matrices as a very effective test of general human intelligence. While many computational models have been developed by the AI community to investigate different forms of top-down, deliberative reasoning on the test, there has been less research on bottom-up perceptual processes, like Gestalt image completion, that are also critical in human test perform…
▽ More
Psychologists recognize Raven's Progressive Matrices as a very effective test of general human intelligence. While many computational models have been developed by the AI community to investigate different forms of top-down, deliberative reasoning on the test, there has been less research on bottom-up perceptual processes, like Gestalt image completion, that are also critical in human test performance. In this work, we investigate how Gestalt visual reasoning on the Raven's test can be modeled using generative image inpainting techniques from computer vision. We demonstrate that a self-supervised inpainting model trained only on photorealistic images of objects achieves a score of 27/36 on the Colored Progressive Matrices, which corresponds to average performance for nine-year-old children. We also show that models trained on other datasets (faces, places, and textures) do not perform as well. Our results illustrate how learning visual regularities in real-world images can translate into successful reasoning about artificial test stimuli. On the flip side, our results also highlight the limitations of such transfer, which may explain why intelligence tests like the Raven's are often sensitive to people's individual sociocultural backgrounds.
△ Less
Submitted 26 November, 2019; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Quantifying Human Behavior on the Block Design Test Through Automated Multi-Level Analysis of Overhead Video
Authors:
Seunghwan Cha,
James Ainooson,
Maithilee Kunda
Abstract:
The block design test is a standardized, widely used neuropsychological assessment of visuospatial reasoning that involves a person recreating a series of given designs out of a set of colored blocks. In current testing procedures, an expert neuropsychologist observes a person's accuracy and completion time as well as overall impressions of the person's problem-solving procedures, errors, etc., th…
▽ More
The block design test is a standardized, widely used neuropsychological assessment of visuospatial reasoning that involves a person recreating a series of given designs out of a set of colored blocks. In current testing procedures, an expert neuropsychologist observes a person's accuracy and completion time as well as overall impressions of the person's problem-solving procedures, errors, etc., thus obtaining a holistic though subjective and often qualitative view of the person's cognitive processes. We propose a new framework that combines room sensors and AI techniques to augment the information available to neuropsychologists from block design and similar tabletop assessments. In particular, a ceiling-mounted camera captures an overhead view of the table surface. From this video, we demonstrate how automated classification using machine learning can produce a frame-level description of the state of the block task and the person's actions over the course of each test problem. We also show how a sequence-comparison algorithm can classify one individual's problem-solving strategy relative to a database of simulated strategies, and how these quantitative results can be visualized for use by neuropsychologists.
△ Less
Submitted 18 November, 2018;
originally announced November 2018.
-
The Toybox Dataset of Egocentric Visual Object Transformations
Authors:
Xiaohan Wang,
Tengyu Ma,
James Ainooson,
Seunghwan Cha,
Xiaotian Wang,
Azhar Molla,
Maithilee Kunda
Abstract:
In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles. These distributional properties constrain the types of computational experiments that are able t…
▽ More
In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles. These distributional properties constrain the types of computational experiments that are able to be conducted with such datasets, and also do not reflect naturalistic patterns of embodied visual experience. As a contribution to the small (but growing) number of multi-view object datasets that have been created to bridge this gap, we introduce a new video dataset called Toybox that contains egocentric (i.e., first-person perspective) videos of common household objects and toys being manually manipulated to undergo structured transformations, such as rotation, translation, and zooming. To illustrate potential uses of Toybox, we also present initial neural network experiments that examine 1) how training on different distributions of object instances and views affects recognition performance, and 2) how viewpoint-dependent object concepts are represented within the hidden layers of a trained network.
△ Less
Submitted 26 November, 2018; v1 submitted 15 June, 2018;
originally announced June 2018.