Showing 1–17 of 17 results for author: Tarr, M J

  1. arXiv:2406.14596

    cs.CV cs.AI cs.LG

    ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights

    Authors: Gabriel Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki

    Abstract: Large-scale generative language and vision-language models (LLMs and VLMs) excel in few-shot in-context learning for decision making and instruction following. However, they require high-quality exemplar demonstrations to be included in their context window. In this work, we ask: Can LLMs and VLMs generate their own prompt examples from generic, sub-optimal demonstrations? We propose In-Context Ab…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project website: http://ical-learning.github.io/

  2. arXiv:2406.13735

    cs.CV cs.LG

    StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

    Authors: Rushikesh Zawar, Shaurya Dewan, Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

    Abstract: Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statist…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Dataset website: https://stablesemantics.github.io/StableSemantics

  3. arXiv:2406.02659

    q-bio.NC cs.AI cs.CV

    Neural Representations of Dynamic Visual Stimuli

    Authors: Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr

    Abstract: Humans experience the world through constantly changing visual stimuli, where scenes can shift and move, change in appearance, and vary in distance. The dynamic nature of visual perception is a fundamental aspect of our daily lives, yet the large majority of research on object and scene processing, particularly using fMRI, has focused on static stimuli. While studies of static image perception are…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2404.19065

    cs.AI cs.CL cs.CV cs.LG

    HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

    Authors: Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki

    Abstract: Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners, a technique that retrieves language-program examples relevant to the input instruction and uses them as in-context examples in the LLM prompt to improve the performance of the LLM in inferring the correct action and task plans. In this technical report, we extend the capabilities of HELP…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Videos and code https://helper-agent-llm.github.io/

  5. arXiv:2311.09308

    cs.CL cs.AI cs.LG q-bio.NC

    Divergences between Language Models and Human Brains

    Authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe

    Abstract: Do machines and humans process language in similar ways? Recent research has hinted in the affirmative, finding that brain signals can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use…

    Submitted 4 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  6. arXiv:2310.15127

    cs.AI cs.CL cs.LG cs.RO

    Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

    Authors: Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki

    Abstract: Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. To parse open-domain natural language and adapt to a user's idiosyncratic procedures, not known during prompt engineering time, fixed prompts fall short. In this paper, we introduce HELPER, an…

    Submitted 20 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Project page with code & videos: https://helper-agent-llm.github.io

  7. arXiv:2310.04420

    cs.LG q-bio.NC

    BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

    Authors: Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

    Abstract: Understanding the functional organization of higher visual cortex is a central focus in neuroscience. Past studies have primarily mapped the visual and semantic selectivity of neural populations using hand-selected stimuli, which may potentially bias results towards pre-existing hypotheses of visual cortex functionality. Moving beyond conventional approaches, we introduce a data-driven method that…

    Submitted 3 May, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project page: https://www.cs.cmu.edu/~afluo/BrainSCUBA

  8. arXiv:2306.14035

    cs.CV

    Thinking Like an Annotator: Generation of Dataset Labeling Instructions

    Authors: Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan

    Abstract: Large-scale datasets are essential to modern day deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g. "dataset curation, motivation, composition, collection process, etc..."). However, almost no one has suggested the release of the detailed definitions and visual category examples provided to annotators - information critical to understanding the stru…

    Submitted 24 June, 2023; originally announced June 2023.

  9. arXiv:2306.03089

    cs.CV

    Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

    Authors: Andrew F. Luo, Margaret M. Henderson, Leila Wehbe, Michael J. Tarr

    Abstract: A long-standing goal in neuroscience has been to elucidate the functional organization of the brain. Within higher visual cortex, functional accounts have remained relatively coarse, focusing on regions of interest (ROIs) and taking the form of selectivity for broad categories such as faces, places, bodies, food, or words. Because the identification of such ROIs has typically relied on manually as…

    Submitted 28 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Oral). Project page: https://www.cs.cmu.edu/~afluo/BrainDiVE/

  10. arXiv:2304.02492

    cs.CL cs.AI cs.LG

    Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition

    Authors: Yuchen Zhou, Michael J. Tarr, Daniel Yurovsky

    Abstract: Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry is a result of complexity in the visual structure of categories in the world to which language refers, the structure of language itself, or the interplay between the two sources of information. We quantitatively test these three hypotheses regarding early verb learning b…

    Submitted 5 April, 2023; originally announced April 2023.

  11. arXiv:2207.10761

    cs.CV

    TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

    Authors: Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki

    Abstract: We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors. TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible object contexts for them, localizes such contexts in the current scene, and repositions the objects. Commonsense priors are encoded in three modules…

    Submitted 21 July, 2022; originally announced July 2022.

  12. arXiv:2204.00628

    cs.SD cs.CV cs.LG cs.RO eess.AS

    Learning Neural Acoustic Fields

    Authors: Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

    Abstract: Our environment is filled with rich and dynamic acoustic information. When we walk into a cathedral, the reverberations as much as appearance inform us of the sanctuary's wide open space. Similarly, as an object moves around us, we expect the sound emitted to also exhibit this movement. While recent advances in learned implicit functions have led to increasingly higher quality representations of t…

    Submitted 14 January, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022. Project page: https://www.andrew.cmu.edu/user/afluo/Neural_Acoustic_Fields/

  13. arXiv:2008.07073

    cs.CV

    AlphaNet: Improving Long-Tail Classification By Combining Classifiers

    Authors: Nadine Chang, Jayanth Koushik, Aarti Singh, Martial Hebert, Yu-Xiong Wang, Michael J. Tarr

    Abstract: Methods in long-tail learning focus on improving performance for data-poor (rare) classes; however, performance for such classes remains much lower than performance for more data-rich (frequent) classes. Analyzing the predictions of long-tail methods for rare classes reveals that a large number of errors are due to misclassification of rare items as visually similar frequent classes. To address th…

    Submitted 26 July, 2023; v1 submitted 16 August, 2020; originally announced August 2020.

  14. Learning Intermediate Features of Object Affordances with a Convolutional Neural Network

    Authors: Aria Yuan Wang, Michael J. Tarr

    Abstract: Our ability to interact with the world around us relies on being able to infer what actions objects afford -- often referred to as affordances. The neural mechanisms of object-action associations are realized in the visuomotor pathway where information about both visual properties and actions is integrated into common representations. However, explicating these mechanisms is particularly challengi…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: Published in the 2018 Conference on Cognitive Computational Neuroscience. See <https://ccneuro.org/2018/Papers/ViewPapers.asp?PaperNum=1134>

  15. BOLD5000: A public fMRI dataset of 5000 images

    Authors: Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff

    Abstract: Vision science, particularly machine vision, has been revolutionized by introducing large-scale image datasets and statistical learning approaches. Yet, human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches that integrate neuroscience, the number of images used in…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Currently in submission to Scientific Data

  16. arXiv:1606.05004

    q-bio.NC

    Temporal and Semantic Effects on Multisensory Integration

    Authors: Jean M. Vettel, Julia R. Green, Laurie Heller, Michael J. Tarr

    Abstract: How do we integrate modality-specific perceptual information arising from the same physical event into a coherent percept? One possibility is that observers rely on information across perceptual modalities that shares temporal structure and/or semantic associations. To explore the contributions of these two factors in multisensory integration, we manipulated the temporal and semantic relationships…

    Submitted 15 June, 2016; originally announced June 2016.

    Comments: In revision at JoN

  17. arXiv:1512.00899

    stat.AP

    Estimating Learning Effects: A Short-Time Fourier Transform Regression Model for MEG Source Localization

    Authors: Ying Yang, Michael J. Tarr, Robert E. Kass

    Abstract: Magnetoencephalography (MEG) has a high temporal resolution well-suited for studying perceptual learning. However, to identify where learning happens in the brain, one needs to apply source localization techniques to project MEG sensor data into brain space. Previous source localization methods, such as the short-time Fourier transform (STFT) method by Gramfort et al. (2013), pr…

    Submitted 2 December, 2015; originally announced December 2015.

    Comments: Author manuscript accepted to the 4th NIPS Workshop on Machine Learning and Interpretation in Neuroimaging (2014) (in press in Lecture Notes in Computer Science, Springer)