-
Approaching human 3D shape perception with neurally mappable models
Authors:
Thomas P. O'Connell,
Tyler Bonnen,
Yoni Friedman,
Ayush Tewari,
Josh B. Tenenbaum,
Vincent Sitzmann,
Nancy Kanwisher
Abstract:
Humans effortlessly infer the 3D shape of objects. What computations underlie this ability? Although various computational models have been proposed, none of them capture the human ability to match object shape across viewpoints. Here, we ask whether and how this gap might be closed. We begin with a relatively novel class of computational models, 3D neural fields, which encapsulate the basic princ…
▽ More
Humans effortlessly infer the 3D shape of objects. What computations underlie this ability? Although various computational models have been proposed, none of them capture the human ability to match object shape across viewpoints. Here, we ask whether and how this gap might be closed. We begin with a relatively novel class of computational models, 3D neural fields, which encapsulate the basic principles of classic analysis-by-synthesis in a deep neural network (DNN). First, we find that a 3D Light Field Network (3D-LFN) supports 3D matching judgments well aligned to humans for within-category comparisons, adversarially-defined comparisons that accentuate the 3D failure cases of standard DNN models, and adversarially-defined comparisons for algorithmically generated shapes with no category structure. We then investigate the source of the 3D-LFN's ability to achieve human-aligned performance through a series of computational experiments. Exposure to multiple viewpoints of objects during training and a multi-view learning objective are the primary factors behind model-human alignment; even conventional DNN architectures come much closer to human behavior when trained with multi-view objectives. Finally, we find that while the models trained with multi-view learning objectives are able to partially generalize to new object categories, they fall short of human alignment. This work provides a foundation for understanding human shape inferences within neurally mappable computational architectures.
△ Less
Submitted 7 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Unsupervised Segmentation in Real-World Images via Spelke Object Inference
Authors:
Honglin Chen,
Rahul Venkatesh,
Yoni Friedman,
Jiajun Wu,
Joshua B. Tenenbaum,
Daniel L. K. Yamins,
Daniel M. Bear
Abstract:
Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision. Here, we show how to learn static grou** priors from motion self-supervision by building on the cognitive science concept of a Spelke Object: a set of physical stuff that moves together. We introduce the Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns to e…
▽ More
Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision. Here, we show how to learn static grou** priors from motion self-supervision by building on the cognitive science concept of a Spelke Object: a set of physical stuff that moves together. We introduce the Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns to extract pairwise affinity graphs for static scenes from motion-based training signals. EISEN then produces segments from affinities using a novel graph propagation and competition network. During training, objects that undergo correlated motion (such as robot arms and the objects they move) are decoupled by a bootstrap** process: EISEN explains away the motion of objects it has already learned to segment. We show that EISEN achieves a substantial improvement in the state of the art for self-supervised image segmentation on challenging synthetic and real-world robotics datasets.
△ Less
Submitted 25 July, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Identifying concept libraries from language about object structure
Authors:
Catherine Wong,
William P. McCarthy,
Gabriel Grand,
Yoni Friedman,
Joshua B. Tenenbaum,
Jacob Andreas,
Robert D. Hawkins,
Judith E. Fan
Abstract:
Our understanding of the visual world goes beyond naming objects, encompassing our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use and the principles leading these parts to be favored over others. We formalize our problem as sear…
▽ More
Our understanding of the visual world goes beyond naming objects, encompassing our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use and the principles leading these parts to be favored over others. We formalize our problem as search over a space of program libraries that contain different part concepts, using tools from machine translation to evaluate how well programs expressed in each library align to human language. By combining naturalistic language at scale with structured program representations, we discover a fundamental information-theoretic tradeoff governing the part concepts people name: people favor a lexicon that allows concise descriptions of each object, while also minimizing the size of the lexicon itself.
△ Less
Submitted 11 May, 2022;
originally announced May 2022.
-
Accelerated Gradient Descent Learning over Multiple Access Fading Channels
Authors:
Raz Paul,
Yuval Friedman,
Kobi Cohen
Abstract:
We consider a distributed learning problem in a wireless network, consisting of N distributed edge devices and a parameter server (PS). The objective function is a sum of the edge devices' local loss functions, who aim to train a shared model by communicating with the PS over multiple access channels (MAC). This problem has attracted a growing interest in distributed sensing systems, and more rece…
▽ More
We consider a distributed learning problem in a wireless network, consisting of N distributed edge devices and a parameter server (PS). The objective function is a sum of the edge devices' local loss functions, who aim to train a shared model by communicating with the PS over multiple access channels (MAC). This problem has attracted a growing interest in distributed sensing systems, and more recently in federated learning, known as over-the-air computation. In this paper, we develop a novel Accelerated Gradient-descent Multiple Access (AGMA) algorithm that uses momentum-based gradient signals over noisy fading MAC to improve the convergence rate as compared to existing methods. Furthermore, AGMA does not require power control or beamforming to cancel the fading effect, which simplifies the implementation complexity. We analyze AGMA theoretically, and establish a finite-sample bound of the error for both convex and strongly convex loss functions with Lipschitz gradient. For the strongly convex case, we show that AGMA approaches the best-known linear convergence rate as the network increases. For the convex case, we show that AGMA significantly improves the sub-linear convergence rate as compared to existing methods. Finally, we present simulation results using real datasets that demonstrate better performance by AGMA.
△ Less
Submitted 8 October, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.