
Showing 1–19 of 19 results for author: Sun, J J

Searching in archive cs.
  1. arXiv:2402.13217  [pdf, other]

    cs.CV cs.AI

    VideoPrism: A Foundational Visual Encoder for Video Understanding

    Authors: Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

    Abstract: We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic…

    Submitted 15 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. v2: added retrieval results on MSRVTT (1K-A), more data analyses, and ablation studies

  2. arXiv:2310.12690  [pdf, other]

    cs.LG cs.AI stat.ML

    Neurosymbolic Grounding for Compositional World Models

    Authors: Atharva Sehgal, Arya Grayeli, Jennifer J. Sun, Swarat Chaudhuri

    Abstract: We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CompGen), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene e…

    Submitted 10 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Uploading ICLR 2024 camera-ready version

  3. arXiv:2212.07401  [pdf, other]

    cs.CV cs.AI

    BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos

    Authors: Jennifer J. Sun, Lili Karashchuk, Amil Dravid, Serim Ryou, Sonia Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona

    Abstract: Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a ne…

    Submitted 2 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Project page: https://sites.google.com/view/b-kind/3d Code: https://github.com/neuroethology/BKinD-3D

  4. arXiv:2210.05050  [pdf, other]

    cs.AI

    Neurosymbolic Programming for Science

    Authors: Jennifer J. Sun, Megan Tjandrasuwita, Atharva Sehgal, Armando Solar-Lezama, Swarat Chaudhuri, Yisong Yue, Omar Costilla-Reyes

    Abstract: Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpret…

    Submitted 7 November, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Neural Information Processing Systems 2022 - AI for science workshop

  5. arXiv:2207.10553  [pdf, other]

    cs.LG cs.AI cs.CV cs.MA

    MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior

    Authors: Jennifer J. Sun, Markus Marks, Andrew Ulmer, Dipam Chakraborty, Brian Geuther, Edward Hayes, Heng Jia, Vivek Kumar, Sebastian Oleszko, Zachary Partridge, Milan Peelman, Alice Robie, Catherine E. Schretter, Keith Sheppard, Chao Sun, Param Uttarwar, Julian M. Wagner, Eric Werner, Joseph Parker, Pietro Perona, Yisong Yue, Kristin Branson, Ann Kennedy

    Abstract: We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. This dataset is collected from a variety of biology experiments, and includes triplets of interacting mice (4.7 million frames video+pose tracking data, 10 million frames pose only), symbiotic beetle-ant interactions (10 million frames video data), and groups of…

    Submitted 30 June, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

    Comments: To appear in ICML 2023, Project website: https://sites.google.com/view/computational-behavior/our-datasets/mabe2022-dataset

  6. arXiv:2206.08094  [pdf, ps, other]

    cs.LG cs.NE q-bio.NC stat.ML

    Deep Neural Imputation: A Framework for Recovering Incomplete Brain Recordings

    Authors: Sabera Talukder, Jennifer J. Sun, Matthew Leonard, Bingni W. Brunton, Yisong Yue

    Abstract: Neuroscientists and neuroengineers have long relied on multielectrode neural recordings to study the brain. However, in a typical experiment, many factors corrupt neural recordings from individual electrodes, including electrical noise, movement artifacts, and faulty manufacturing. Currently, common practice is to discard these corrupted recordings, reducing already limited data that is difficult…

    Submitted 16 June, 2022; originally announced June 2022.

  7. arXiv:2204.02842  [pdf]

    q-bio.QM cs.CV q-bio.NC

    Open-Source Tools for Behavioral Video Analysis: Setup, Methods, and Development

    Authors: Kevin Luxem, Jennifer J. Sun, Sean P. Bradley, Keerthi Krishnan, Eric A. Yttri, Jan Zimmermann, Talmo D. Pereira, Mark Laubach

    Abstract: Recently developed methods for video analysis, especially models for pose estimation and behavior classification, are transforming behavioral quantification to be more precise, scalable, and reproducible in fields such as neuroscience and ethology. These tools overcome long-standing limitations of manual scoring of video frames and traditional "center of mass" tracking algorithms to enable video a…

    Submitted 9 March, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: 26 pages, 2 figures, 3 tables; this is a commentary on video methods for analyzing behavior in animals that emerged from a working group organized by the OpenBehavior project (openbehavior.com)

  8. arXiv:2112.05121  [pdf, other]

    cs.CV

    Self-Supervised Keypoint Discovery in Behavioral Videos

    Authors: Jennifer J. Sun, Serim Ryou, Roni Goldshmid, Brandon Weissbourd, John Dabiri, David J. Anderson, Ann Kennedy, Yisong Yue, Pietro Perona

    Abstract: We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video fram…

    Submitted 27 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Code: https://github.com/neuroethology/BKinD Project page: https://sites.google.com/view/b-kind

  9. arXiv:2111.15186  [pdf, other]

    cs.LG cs.CV

    Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis

    Authors: Albert Tseng, Jennifer J. Sun, Yisong Yue

    Abstract: Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting sca…

    Submitted 11 May, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: 8 pages, to appear at CVPR 2022

  10. arXiv:2107.13132  [pdf, other]

    cs.LG cs.AI

    Unsupervised Learning of Neurosymbolic Encoders

    Authors: Eric Zhan, Jennifer J. Sun, Ann Kennedy, Yisong Yue, Swarat Chaudhuri

    Abstract: We present a framework for the unsupervised learning of neurosymbolic encoders, which are encoders obtained by composing neural networks with symbolic programs from a domain-specific language. Our framework naturally incorporates symbolic expert knowledge into the learning process, which leads to more interpretable and factorized latent representations compared to fully neural encoders. We integra…

    Submitted 20 December, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

  11. arXiv:2106.06114  [pdf, other]

    cs.LG cs.AI

    Interpreting Expert Annotation Differences in Animal Behavior

    Authors: Megan Tjandrasuwita, Jennifer J. Sun, Ann Kennedy, Swarat Chaudhuri, Yisong Yue

    Abstract: Hand-annotated data can vary due to factors such as subjective differences, intra-rater variability, and differing annotator expertise. We study annotations from different experts who labelled the same behavior classes on a set of animal behavior videos, and observe a variation in annotation styles. We propose a new method using program synthesis to help interpret annotation differences for behavi…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 4 pages, 5 figures, presented as a poster at CV4Animals workshop @ CVPR21

  12. arXiv:2104.02710  [pdf, other]

    cs.LG cs.CV

    The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions

    Authors: Jennifer J. Sun, Tomomi Karigo, Dipam Chakraborty, Sharada P. Mohanty, Benjamin Wild, Quan Sun, Chen Chen, David J. Anderson, Pietro Perona, Yisong Yue, Ann Kennedy

    Abstract: Multi-agent behavior modeling aims to understand the interactions that occur between agents. We present a multi-agent dataset from behavioral neuroscience, the Caltech Mouse Social Interactions (CalMS21) Dataset. Our dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. To help accelerate behavioral studies,…

    Submitted 18 November, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: NeurIPS 2021 Datasets & Benchmarks. Dataset: https://data.caltech.edu/records/1991, Website: https://sites.google.com/view/computational-behavior/our-datasets/calms21-dataset

  13. arXiv:2012.01405  [pdf, other]

    cs.CV

    Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

    Authors: Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

    Abstract: We introduce a novel representation learning method to disentangle pose-dependent as well as view-dependent factors from 2D human poses. The method trains a network using cross-view mutual information maximization (CV-MIM) which maximizes mutual information of the same pose performed from different viewpoints in a contrastive learning manner. We further propose two regularization terms to ensure d…

    Submitted 26 March, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR 2021 (Oral presentation). Code is available at https://github.com/google-research/google-research/tree/master/poem

  14. arXiv:2011.13917  [pdf, other]

    cs.CV cs.LG

    Task Programming: Learning Data Efficient Behavior Representations

    Authors: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Yue, Pietro Perona

    Abstract: Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annot…

    Submitted 29 March, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: To appear as an Oral in CVPR 2021. Code: https://github.com/neuroethology/TREBA. Project page: https://sites.google.com/view/task-programming

  15. View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

    Authors: Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam

    Abstract: Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people. However, cameras generally capture human poses in 2D as images and videos, which can have significant appearance variations across viewpoints that make the recognition tasks challenging. To address this, we explore recognizing similarity in 3D human body poses from 2D information, which has n…

    Submitted 18 November, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted to International Journal of Computer Vision (IJCV). Code is available at https://github.com/google-research/google-research/tree/master/poem. Video synchronization results are available at https://drive.google.com/corp/drive/folders/1nhPuEcX4Lhe6iK3nv84cvSCov2eJ52Xy. arXiv admin note: text overlap with arXiv:1912.01001

  16. arXiv:2007.12101  [pdf, other]

    cs.LG cs.AI cs.PL stat.ML

    Learning Differentiable Programs with Admissible Neural Heuristics

    Authors: Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri

    Abstract: We study the problem of learning differentiable functions expressed as programs in a domain-specific language. Such programmatic models can offer benefits such as composability and interpretability; however, learning them requires optimizing over a combinatorial space of program "architectures". We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivati…

    Submitted 27 March, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: 9 pages, published in NeurIPS 2020

    ACM Class: D.3.2; I.2.6

  17. arXiv:2001.05488  [pdf, other]

    cs.CV

    EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video

    Authors: Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad

    Abstract: Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels,…

    Submitted 22 February, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Data subset at https://github.com/google-research-datasets/eev

  18. arXiv:1912.01001  [pdf, other]

    cs.CV

    View-Invariant Probabilistic Embedding for Human Pose

    Authors: Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu

    Abstract: Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding s…

    Submitted 22 October, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted to ECCV 2020 (Spotlight presentation). Code is available at https://github.com/google-research/google-research/tree/master/poem. Video synchronization results are available at https://drive.google.com/corp/drive/folders/1kTc_UT0Eq0H2ZBgfEoh8qEJMFBouC-Wv

  19. arXiv:1911.12361  [pdf, ps, other]

    cs.CV cs.SD eess.AS eess.IV

    GLA in MediaEval 2018 Emotional Impact of Movies Task

    Authors: Jennifer J. Sun, Ting Liu, Gautam Prasad

    Abstract: The visual and audio information from movies can evoke a variety of emotions in viewers. Towards a better understanding of viewer impact, we present our methods for the MediaEval 2018 Emotional Impact of Movies Task to predict the expected valence and arousal continuously in movies. This task, using the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer…

    Submitted 27 November, 2019; originally announced November 2019.

    Comments: MediaEval 2018, 29-31 October 2018, Sophia Antipolis, France. This work is presented at the workshop in MediaEval 2018 for the Emotional Impact of Movies Task

    Report number: urn:nbn:de:0074-2283-7