Showing 1–50 of 294 results for author: Tenenbaum, J

  1. arXiv:2406.19298

    cs.CV cs.LG

    Compositional Image Decomposition with Diffusion Models

    Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

    Abstract: Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a sce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2024, Webpage: https://energy-based-model.github.io/decomp-diffusion

  2. arXiv:2406.15736

    cs.LG cs.AI cs.CL cs.CV

    Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

    Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

    Abstract: Recent years have seen significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as h…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.11179

    cs.LG cs.AI

    Learning Iterative Reasoning through Energy Diffusion

    Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum

    Abstract: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference ba…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024, website: https://energy-based-model.github.io/ired/

  4. arXiv:2406.04302

    cs.LG

    Representational Alignment Supports Effective Machine Teaching

    Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

    Abstract: A good teacher should not only be knowledgeable but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representatio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint

  5. arXiv:2405.20510

    cs.CV

    Physically Compatible 3D Object Modeling from a Single Image

    Authors: Minghao Guo, Bohan Wang, Pingchuan Ma, Tianyuan Zhang, Crystal Elaine Owens, Chuang Gan, Joshua B. Tenenbaum, Kaiming He, Wojciech Matusik

    Abstract: We present a computational framework that transforms single images into 3D physical objects. The visual geometry of a physical object in an image is determined by three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Existing single-view 3D reconstruction methods often overlook this underlying composition, presuming rigidity or neglecting external forces. Co…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  6. arXiv:2405.09783

    cs.LG cs.AI cs.CE

    LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

    Authors: Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik

    Abstract: Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulati…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  7. arXiv:2405.09711

    cs.AI cs.CL cs.CV

    STAR: A Benchmark for Situated Reasoning in Real-World Videos

    Authors: Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B Tenenbaum, Chuang Gan

    Abstract: Reasoning in the real world is not divorced from situations. How to capture the present knowledge from surrounding situations and perform reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: NeurIPS

  8. arXiv:2405.09605

    cs.CL cs.AI cs.LG

    Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models

    Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

    Abstract: The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/i…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 21 pages (11 main), 7 figures. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally

  9. arXiv:2405.06906

    cs.CL

    Finding structure in logographic writing with library learning

    Authors: Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lionel Wong, Joshua B. Tenenbaum, Roger P. Levy

    Abstract: One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted at CogSci 2024 (Talk)

  10. arXiv:2405.06624

    cs.AI

    Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

    Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

    Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these appro…

    Submitted 17 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  11. arXiv:2403.11075

    cs.HC cs.AI cs.MA

    GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment

    Authors: Lance Ying, Kunal Jha, Shivam Aarya, Joshua B. Tenenbaum, Antonio Torralba, Tianmin Shu

    Abstract: Verbal communication plays a crucial role in human cooperation, particularly when the partners only have incomplete information about the task, environment, and each other's mental state. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

  12. arXiv:2403.10454

    cs.RO cs.AI

    Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness

    Authors: Aidan Curtis, George Matheos, Nishad Gothoskar, Vikash Mansinghka, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy fo…

    Submitted 15 March, 2024; originally announced March 2024.

  13. arXiv:2403.05334

    cs.PL cs.AI cs.HC

    WatChat: Explaining perplexing programs by debugging mental models

    Authors: Kartik Chandra, Tzu-Mao Li, Rachit Nigam, Joshua Tenenbaum, Jonathan Ragan-Kelley

    Abstract: Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code. But sometimes, an even better explanation is a bug in the programmer's mental model of the language they are using. Instead of merely debugging our current code ("giving the programmer a fish"), what if our tools could directly debug our mental models ("teaching the programmer to fish")? In this paper,…

    Submitted 8 March, 2024; originally announced March 2024.

  14. arXiv:2402.19471

    cs.CL cs.AI

    Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling

    Authors: Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum

    Abstract: Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (L…

    Submitted 1 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CogSci 2024

  15. arXiv:2402.17930

    cs.AI cs.CL cs.LG

    Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning

    Authors: Tan Zhi-Xuan, Lance Ying, Vikash Mansinghka, Joshua B. Tenenbaum

    Abstract: People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction fol…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to AAMAS 2024. 8 pages (excl. references), 5 figures/tables. (Appendix: 8 pages, 8 figures/tables). Code available at: https://github.com/probcomp/CLIPS.jl

  16. arXiv:2402.10416

    cs.AI cs.CL

    Grounding Language about Belief in a Bayesian Theory-of-Mind

    Authors: Lance Ying, Tan Zhi-Xuan, Lionel Wong, Vikash Mansinghka, Joshua Tenenbaum

    Abstract: Despite the fact that beliefs are mental states that cannot be directly observed, humans talk about each other's beliefs on a regular basis, often using rich compositional language to describe what others think and know. What explains this capacity to interpret the hidden epistemic content of other minds? In this paper, we take a step towards an answer by grounding the semantics of belief statemen…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Under Review, 7 pages

  17. arXiv:2402.06119

    cs.CV

    ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

    Authors: Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan

    Abstract: We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. ContPhy complements existing physical reasoning benchmarks by encompassing the inference of diverse physical properties, such as mass and density, across various scenarios and predicting corresponding dynamics. We evaluated a range of AI models and found that they still struggle to…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: The first three authors contributed equally to this work

  18. arXiv:2401.12975

    cs.CV cs.AI cs.CL

    HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

    Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICLR 2024. The first two authors contributed equally to this work

  19. arXiv:2401.08743

    cs.AI cs.CL cs.CV cs.LG

    MMToM-QA: Multimodal Theory of Mind Question Answering

    Authors: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets: either video or text. Human ToM, on the other hand, is more than v…

    Submitted 15 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 26 pages, 11 figures, 7 tables

  20. arXiv:2401.06005

    q-bio.NC cs.AI cs.CV cs.LG

    How does the primate brain combine generative and discriminative computations in vision?

    Authors: Benjamin Peters, James J. DiCarlo, Todd Gureckis, Ralf Haefner, Leyla Isik, Joshua Tenenbaum, Talia Konkle, Thomas Naselaris, Kimberly Stachenfeld, Zenna Tavares, Doris Tsao, Ilker Yildirim, Nikolaus Kriegeskorte

    Abstract: Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remo…

    Submitted 11 January, 2024; originally announced January 2024.

  21. arXiv:2312.08715

    cs.RO

    Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes

    Authors: Nishad Gothoskar, Matin Ghavami, Eric Li, Aidan Curtis, Michael Noseworthy, Karen Chung, Brian Patton, William T. Freeman, Joshua B. Tenenbaum, Mirko Klukas, Vikash K. Mansinghka

    Abstract: Robots cannot yet match humans' ability to rapidly learn the shapes of novel 3D objects and recognize them robustly despite clutter and occlusion. We present Bayes3D, an uncertainty-aware perception system for structured 3D scenes, that reports accurate posterior uncertainty over 3D object shape, pose, and scene composition in the presence of clutter and occlusion. Bayes3D delivers these capabilit…

    Submitted 14 December, 2023; originally announced December 2023.

  22. arXiv:2312.08566

    cs.AI cs.CL cs.RO

    Learning adaptive planning representations with natural language guidance

    Authors: Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas

    Abstract: Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into…

    Submitted 13 December, 2023; originally announced December 2023.

  23. arXiv:2312.04709

    cs.LG cs.NE

    How to guess a gradient

    Authors: Utkarsh Singhal, Brian Cheung, Kartik Chandra, Jonathan Ragan-Kelley, Joshua B. Tenenbaum, Tomaso A. Poggio, Stella X. Yu

    Abstract: How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Expl…

    Submitted 7 December, 2023; originally announced December 2023.

  24. arXiv:2312.03682

    cs.LG cs.AI cs.NE stat.ML

    What Planning Problems Can A Relational Neural Network Solve?

    Authors: Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling

    Abstract: Goal-conditioned policies are generally understood to be "feed-forward" circuits, in the form of neural networks that map from the current state and the goal specification to the next action to take. However, under what circumstances such a policy can be learned and how efficient the policy will be are not well understood. In this paper, we present a circuit complexity analysis for relational neur…

    Submitted 2 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 (Spotlight). Project page: https://concepts-ai.com/p/goal-regression-width/

  25. arXiv:2311.17053

    cs.RO cs.AI cs.CV cs.LG

    DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models

    Authors: Tsun-Hsuan Wang, Juntian Zheng, Pingchuan Ma, Yilun Du, Byungchul Kim, Andrew Spielberg, Joshua Tenenbaum, Chuang Gan, Daniela Rus

    Abstract: Nature evolves creatures with a high complexity of morphological and behavioral intelligence, while computational methods lag in approaching that diversity and efficacy. Co-optimization of artificial creatures' morphology and control in silico shows promise for applications in physical soft robotics and virtual character creation; such approaches, however, require developing new learning algor…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. Project page: https://diffusebot.github.io/

  26. arXiv:2311.03293

    cs.RO cs.AI cs.LG

    Learning Reusable Manipulation Strategies

    Authors: Jiayuan Mao, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Humans demonstrate an impressive ability to acquire and generalize manipulation "tricks." Even from a single demonstration, such as using soup ladles to reach for distant objects, we can apply this skill to new scenarios involving different object positions, sizes, and categories (e.g., forks and hammers). Additionally, we can flexibly combine various skills to devise long-term plans. In this pape…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: CoRL 2023. Project page: https://concepts-ai.com/p/mechanisms/

  27. arXiv:2310.19791

    cs.CL cs.AI cs.LG cs.PL

    LILO: Learning Interpretable Libraries by Compressing and Documenting Code

    Authors: Gabriel Grand, Lionel Wong, Maddy Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum, Jacob Andreas

    Abstract: While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guid…

    Submitted 15 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 camera-ready

  28. arXiv:2310.16035

    cs.CV cs.AI cs.CL cs.LG stat.ML

    What's Left? Concept Grounding with Logic-Enhanced Foundation Models

    Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning, using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, not fully exploiting the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal,…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023. First two authors contributed equally. Project page: https://web.stanford.edu/~joycj/projects/left_neurips_2023

  29. LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

    Authors: Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy

    Abstract: Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing…

    Submitted 14 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP Main 2023 (Outstanding Paper Award)

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5153-5176, Singapore. Association for Computational Linguistics

  30. arXiv:2310.14466

    cs.LG

    Inferring Relational Potentials in Interacting Systems

    Authors: Armand Comas-Massagué, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps

    Abstract: Systems consisting of interacting agents are prevalent in the world, ranging from dynamical systems in physics to complex biological networks. To build systems which can interact robustly in the real world, it is thus important to be able to infer the precise interactions governing such systems. Existing approaches typically discover such interactions by explicitly modeling the feed-forward dynami…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Published at ICML 2023 (Oral)

  31. arXiv:2310.13021

    q-bio.NC cs.AI

    AI for Mathematics: A Cognitive Science Perspective

    Authors: Cedegao E. Zhang, Katherine M. Collins, Adrian Weller, Joshua B. Tenenbaum

    Abstract: Mathematics is one of the most powerful conceptual systems developed and used by the human species. Dreams of automated mathematicians have a storied history in artificial intelligence (AI). Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems. In this work, we reflect on these goals from a…

    Submitted 18 October, 2023; originally announced October 2023.

  32. arXiv:2310.13018

    q-bio.NC cs.AI cs.LG cs.NE

    Getting aligned on representational alignment

    Authors: Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, et al. (5 additional authors not shown)

    Abstract: Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of an…

    Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Working paper, changes to be made in upcoming revisions

  33. arXiv:2310.11614

    cs.AI

    Learning a Hierarchical Planner from Humans in Multiple Generations

    Authors: Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Josh Tenenbaum, Armando Solar-Lezama

    Abstract: A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: First two authors contributed equally

  34. arXiv:2310.10625

    cs.CV cs.AI cs.LG cs.RO

    Video Language Planning

    Authors: Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

    Abstract: We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video language planning (VLP), an algorithm that consists of a tree search procedure, where we train (i) vision-language models to serve as both policies and value…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: https://video-language-planning.github.io/

  35. arXiv:2310.08576

    cs.RO cs.CV cs.LG stat.ML

    Learning to Act from Actionless Videos through Dense Correspondences

    Authors: Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

    Abstract: In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot g…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Project page: https://flow-diffusion.github.io/

  36. arXiv:2310.03779

    cs.AI cs.CL cs.LG cs.RO

    HandMeThat: Human-Robot Communication in Physical and Social Environments

    Authors: Yanming Wan, Jiayuan Mao, Joshua B. Tenenbaum

    Abstract: We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. Ha…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2022 (Dataset and Benchmark Track). First two authors contributed equally. Project page: http://handmethat.csail.mit.edu/

  37. arXiv:2309.16650

    cs.RO cs.CV

    ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

    Authors: Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull

    Abstract: For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, whi…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc

  38. arXiv:2309.14552

    cs.RO cs.AI cs.LG

    Tactile Estimation of Extrinsic Contact Patch for Stable Placement

    Authors: Kei Ota, Devesh K. Jha, Krishna Murthy Jatavallabhula, Asako Kanezaki, Joshua B. Tenenbaum

    Abstract: Precise perception of contact interactions is essential for fine-grained manipulation skills for robots. In this paper, we present the design of feedback skills for robots that must learn to stack complex-shaped objects on top of each other (see Fig.1). To design such a system, a robot should be able to reason about the stability of placement from very gentle contact interactions. Our results demo…

    Submitted 23 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ICRA 2024

  39. arXiv:2309.08587

    cs.LG cs.AI cs.RO

    Compositional Foundation Models for Hierarchical Planning

    Authors: Anurag Ajay, Seungwook Han, Yilun Du, Shuang Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal

    Abstract: To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarc…

    Submitted 21 September, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Website: https://hierarchical-planning-foundation-model.github.io/

  40. arXiv:2309.00966  [pdf, other]

    cs.RO cs.AI cs.LG

    Compositional Diffusion-Based Continuous Constraint Solvers

    Authors: Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning. Previous methods primarily rely on hand-engineering or learning generators for specific constraint types and then rejecting the value assignments when other constraints are violated. By contrast, our model, the compositional diffusion continuous constraint s…

    Submitted 2 September, 2023; originally announced September 2023.

    Journal ref: Proceedings of CoRL 2023

  41. arXiv:2308.11300  [pdf, other]

    cs.CV cs.GT

    Approaching human 3D shape perception with neurally mappable models

    Authors: Thomas P. O'Connell, Tyler Bonnen, Yoni Friedman, Ayush Tewari, Josh B. Tenenbaum, Vincent Sitzmann, Nancy Kanwisher

    Abstract: Humans effortlessly infer the 3D shape of objects. What computations underlie this ability? Although various computational models have been proposed, none of them capture the human ability to match object shape across viewpoints. Here, we ask whether and how this gap might be closed. We begin with a relatively novel class of computational models, 3D neural fields, which encapsulate the basic princ…

    Submitted 7 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  42. arXiv:2308.11071  [pdf, other]

    cs.AI cs.LG cs.MA cs.RO

    Neural Amortized Inference for Nested Multi-agent Reasoning

    Authors: Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself. Such intricate reasoning can be effectively modeled through nested multi-agent reasoning. Nonetheless, the computational complexity escalates exponentially with each level of reasoning, posing a significant challenge. However, humans ef…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 8 pages, 10 figures

  43. arXiv:2307.02485  [pdf, other]

    cs.AI cs.CL cs.CV

    Building Cooperative Embodied Agents Modularly with Large Language Models

    Authors: Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

    Abstract: In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. While previous research either presupposes a cost-free communication channel or relies on a centralized controller with shared observations, we harness the commonsense knowledge, re…

    Submitted 17 February, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: ICLR 2024. The first two authors contributed equally

  44. arXiv:2306.16207  [pdf, other]

    cs.AI cs.CL cs.RO

    Inferring the Goals of Communicating Agents from Actions and Instructions

    Authors: Lance Ying, Tan Zhi-Xuan, Vikash Mansinghka, Joshua B. Tenenbaum

    Abstract: When humans cooperate, they frequently coordinate their activity through both verbal communication and non-verbal actions, using this information to infer a shared goal and plan. How can we model this inferential ability? In this paper, we introduce a model of a cooperative team where one agent, the principal, may communicate natural language instructions about their shared plan to another agent,…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 8 pages, 5 figures. Accepted to the ICML 2023 Workshop on Theory of Mind in Communicating Agents. Supplementary Information: https://osf.io/gh758/

  45. arXiv:2306.15668  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties

    Authors: Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel Bear, Chuang Gan, Joshua B. Tenenbaum, Daniel LK Yamins, Judith E Fan, Kevin A. Smith

    Abstract: General physical scene understanding requires more than simply localizing and recognizing objects -- it requires knowledge that objects can have different latent properties (e.g., mass or elasticity), and that those properties affect the outcome of physical events. While there has been great progress in physical and video prediction models in recent years, benchmarks to test their performance typi…

    Submitted 1 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  46. arXiv:2306.14325  [pdf, other]

    cs.AI cs.LG

    The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs

    Authors: Lance Ying, Katherine M. Collins, Megan Wei, Cedegao E. Zhang, Tan Zhi-Xuan, Adrian Weller, Joshua B. Tenenbaum, Lionel Wong

    Abstract: Human beings are social creatures. We routinely reason about other agents, and a crucial component of this social reasoning is inferring people's goals as we learn about their actions. In many settings, we can perform intuitive but reliable goal inference from language descriptions of agents, actions, and the background environments. In this paper, we study this process of language driving and inf…

    Submitted 27 June, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: To appear at ICML Workshop on Theory of Mind in Communicating Agents

  47. arXiv:2306.12672  [pdf, other]

    cs.CL cs.AI cs.SC

    From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

    Authors: Lionel Wong, Gabriel Grand, Alexander K. Lew, Noah D. Goodman, Vikash K. Mansinghka, Jacob Andreas, Joshua B. Tenenbaum

    Abstract: How does language inform our downstream thinking? In particular, how do humans make meaning from language--and how can we leverage a theory of linguistic meaning to build machines that think in more human-like ways? In this paper, we propose rational meaning construction, a computational framework for language-informed thinking that combines neural language models with probabilistic models for rat…

    Submitted 23 June, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  48. arXiv:2306.11719  [pdf, other]

    cs.CV cs.GR cs.LG

    Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

    Authors: Ayush Tewari, Tianwei Yin, George Cazenavette, Semon Rezchikov, Joshua B. Tenenbaum, Frédo Durand, William T. Freeman, Vincent Sitzmann

    Abstract: Denoising diffusion models are a powerful type of generative models used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with…

    Submitted 16 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Project page: https://diffusion-with-forward-models.github.io/

  49. arXiv:2306.05357  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

    Authors: Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a c…

    Submitted 3 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: ICCV 2023. Project Webpage: https://energy-based-model.github.io/unsupervised-concept-discovery/

  50. arXiv:2306.01872  [pdf, other]

    cs.AI

    Probabilistic Adaptation of Text-to-Video Models

    Authors: Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

    Abstract: Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expens…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Project website: https://video-adapter.github.io/. The first two authors contributed equally