Showing 1–50 of 71 results for author: Niebles, J C

  1. arXiv:2406.18518  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2405.19522  [pdf]

    cs.AI

    Artificial Intelligence Index Report 2024

    Authors: Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Jack Clark

    Abstract: The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ev…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2402.15506  [pdf, other]

    cs.AI cs.CL cs.LG

    AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

    Authors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

    Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce AgentOhana as a comprehensive solution to address these challenges. …

    Submitted 20 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Add GitHub repo link at https://github.com/SalesforceAIResearch/xLAM and HuggingFace model link at https://huggingface.co/Salesforce/xLAM-v0.1-r

  4. arXiv:2401.10495  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Causal Layering via Conditional Entropy

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle, when distributions are discrete. Our algorithms work by repeatedly removing sources or s…

    Submitted 19 January, 2024; originally announced January 2024.

  5. arXiv:2401.07526  [pdf, other]

    cs.CL cs.AI cs.LG

    Editing Arbitrary Propositions in LLMs without Subject Labels

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositi…

    Submitted 15 January, 2024; originally announced January 2024.

  6. arXiv:2311.18799  [pdf, other]

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific custo…

    Submitted 30 November, 2023; originally announced November 2023.

  7. arXiv:2310.18615  [pdf, other]

    cs.LG stat.ML

    Temporally Disentangled Representation Learning under Unknown Nonstationarity

    Authors: Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, Kun Zhang

    Abstract: In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally-related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary settings, existing work has only partially addressed the problem by either utilizing observed aux…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  8. arXiv:2310.03715  [pdf]

    cs.AI cs.CY

    Artificial Intelligence Index Report 2023

    Authors: Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, Raymond Perrault

    Abstract: Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. Th…

    Submitted 5 October, 2023; originally announced October 2023.

  9. arXiv:2308.05960  [pdf, other]

    cs.AI

    BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

    Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to resolve complex tasks by conditioning on past interactions such as observations and actions. Since the investigation of LAA is still very recent, limi…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Preprint

  10. arXiv:2308.02151  [pdf, other]

    cs.CL cs.AI

    Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

    Authors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents ena…

    Submitted 5 May, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

  11. arXiv:2307.08962  [pdf, other]

    cs.AI cs.LG

    REX: Rapid Exploration and eXploitation for AI Agents

    Authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer…

    Submitted 26 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  12. arXiv:2306.01623  [pdf, other]

    cs.CV cs.AI cs.LG

    HomE: Homography-Equivariant Video Representation Learning

    Authors: Anirudh Sriram, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles, Li Fei-Fei, Ehsan Adeli

    Abstract: Recent advances in self-supervised representation learning have enabled more efficient and robust model performance without relying on extensive labeled data. However, most works are still focused on images, with few working on videos and even fewer on multi-view videos, where more powerful inductive biases can be leveraged for self-supervision. In this work, we propose a novel method for represen…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: 10 pages, 4 figures, 4 tables

  13. arXiv:2305.11147  [pdf, other]

    cs.CV cs.AI

    UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

    Authors: Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu

    Abstract: Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such…

    Submitted 2 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  14. arXiv:2305.08275  [pdf, other]

    cs.CV

    ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

    Authors: Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

    Abstract: Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frameworks to curate such multimodal data, in particular language descriptions for 3D shapes, are not scalable, and the collected language descriptions are…

    Submitted 25 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: CVPR2024

    Journal ref: CVPR2024

  15. arXiv:2303.18230  [pdf, other]

    cs.CV

    Procedure-Aware Pretraining for Instructional Video Understanding

    Authors: Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, Silvio Savarese, Juan Carlos Niebles

    Abstract: Our goal is to learn a video representation that is useful for downstream procedure understanding tasks in instructional videos. Due to the small amount of available annotations, a key challenge in procedure understanding is to be able to extract from unlabeled videos the procedural knowledge such as the identity of the task (e.g., 'make latte'), its steps (e.g., 'pour milk'), or the potential nex…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  16. arXiv:2303.16891  [pdf, other]

    cs.CV

    Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

    Authors: Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu

    Abstract: Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) categories. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023. Project site: https://vibashan.github.io/ovis-web/

  17. arXiv:2303.05628  [pdf, other]

    cs.LG cs.AI stat.ME

    On the Unlikelihood of D-Separation

    Authors: Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Devansh Arpit

    Abstract: Causal discovery aims to recover a causal graph from data generated by it; constraint based methods do so by searching for a d-separating conditioning set of nodes in the graph via an oracle. In this paper, we provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist, unless the graph is extremely sparse. We then provide an analytic average ca…

    Submitted 3 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  18. arXiv:2303.04991  [pdf, other]

    cs.CV

    Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

    Authors: Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, Kris M. Kitani

    Abstract: Accurately estimating 3D hand pose is crucial for understanding how humans interact with the world. Despite remarkable progress, existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred. In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame. To adaptively leverage the…

    Submitted 17 August, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: In ICCV 2023. Project: https://fuqichen1998.github.io/Deformer/

  19. arXiv:2301.10859  [pdf, other]

    cs.LG cs.AI

    Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data

    Authors: Devansh Arpit, Matthew Fernandez, Itai Feigenbaum, Weiran Yao, Chenghao Liu, Wenzhuo Yang, Paul Josel, Shelby Heinecke, Eric Hu, Huan Wang, Stephen Hoi, Caiming Xiong, Kun Zhang, Juan Carlos Niebles

    Abstract: We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneous types. This library includes algorithms that handle linear and non-linear causal relationships between variables, and uses multi-processing for speed-up. We al…

    Submitted 22 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  20. arXiv:2301.02650  [pdf, other]

    cs.CV

    Hierarchical Point Attention for Indoor 3D Object Detection

    Authors: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Caiming Xiong, Tom Goldstein, Juan Carlos Niebles, Ran Xu

    Abstract: 3D object detection is an essential vision technique for various robotic systems, such as augmented reality and domestic robots. Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer restrains its ability to learn features at different scales. Such limitation makes transformer detec…

    Submitted 8 May, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: ICRA 2024 camera-ready (7 pages, 5 figures)

  21. arXiv:2212.09877  [pdf, other]

    cs.CV

    LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

    Authors: Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu

    Abstract: Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground conte…

    Submitted 24 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

  22. arXiv:2212.05171  [pdf, other]

    cs.CV

    ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

    Authors: Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

    Abstract: The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language. Inspired by this, leveraging multimodal information for 3D modalit…

    Submitted 12 June, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted by CVPR 2023

  23. arXiv:2208.10077  [pdf, other]

    cs.CV cs.AI

    Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

    Authors: Stephen Su, Samuel Kwong, Qingyu Zhao, De-An Huang, Juan Carlos Niebles, Ehsan Adeli

    Abstract: There has been an increasing interest in multi-task learning for video understanding in recent years. In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on. We employ Necessary Condition Analysis (NCA) as a data-driven approach for deciding what…

    Submitted 22 August, 2022; originally announced August 2022.

  24. arXiv:2206.03891  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG eess.IV

    PrivHAR: Recognizing Human Actions From Privacy-preserving Lens

    Authors: Carlos Hinojosa, Miguel Marquez, Henry Arguello, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

    Abstract: The accelerated use of digital cameras prompts an increasing concern about privacy and security, particularly in applications such as action recognition. In this paper, we propose an optimizing framework to provide robust visual privacy protection along the human action recognition pipeline. Our framework parameterizes the camera lens to successfully degrade the quality of the videos to inhibit pr…

    Submitted 29 January, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Oral paper presented at European Conference on Computer Vision (ECCV) 2022, in Tel Aviv, Israel

    Journal ref: Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part IV

  25. arXiv:2206.01720  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Revisiting the "Video" in Video-Language Understanding

    Authors: Shyamal Buch, Cristóbal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles

    Abstract: What makes a video task uniquely suited for videos, beyond what can be understood from a single image? Building on recent progress in self-supervised image-language models, we revisit this question in the context of video and language tasks. We propose the atemporal probe (ATP), a new model for video-language analysis which provides a stronger bound on the baseline accuracy of multimodal models co…

    Submitted 3 June, 2022; originally announced June 2022.

    Comments: CVPR 2022 (Oral)

  26. arXiv:2205.03468  [pdf]

    cs.AI

    The AI Index 2022 Annual Report

    Authors: Daniel Zhang, Nestor Maslej, Erik Brynjolfsson, John Etchemendy, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Michael Sellitto, Ellie Sakhaee, Yoav Shoham, Jack Clark, Raymond Perrault

    Abstract: Welcome to the fifth edition of the AI Index Report! The latest edition includes data from a broad set of academic, private, and nonprofit organizations as well as more self-collected data and original analysis than any previous edition, including an expanded technical performance chapter, a new survey of robotics researchers around the world, data on global AI legislation records in 25 countries…

    Submitted 2 May, 2022; originally announced May 2022.

  27. arXiv:2112.09583  [pdf, other]

    cs.CV

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

    Authors: Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

    Abstract: Video-and-language pre-training has shown promising improvements on various downstream tasks. Most previous methods capture cross-modal interactions with a transformer-based multimodal encoder, not fully addressing the misalignment between unimodal video and text features. Besides, learning fine-grained visual-language alignment usually requires off-the-shelf object detectors to provide object inf…

    Submitted 23 December, 2021; v1 submitted 17 December, 2021; originally announced December 2021.

  28. arXiv:2112.00804  [pdf, other]

    cs.CV

    PreViTS: Contrastive Pretraining with Video Tracking Supervision

    Authors: Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik

    Abstract: Videos are a rich source for self-supervised learning (SSL) of visual representations due to the presence of natural temporal transformations of objects. However, current methods typically randomly sample video clips for learning, which results in an imperfect supervisory signal. In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips co…

    Submitted 27 September, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: To be presented at WACV 2023

  29. arXiv:2111.09452  [pdf, other]

    cs.CV

    Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

    Authors: Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

    Abstract: Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on…

    Submitted 13 July, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: ECCV 2022

  30. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  31. arXiv:2106.11173  [pdf, other]

    cs.CV

    TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

    Authors: Andrés Villa, Juan-Manuel Perez-Rua, Victor Escorcia, Vladimir Araujo, Juan Carlos Niebles, Alvaro Soto

    Abstract: Recently, few-shot video classification has received an increasing interest. Current approaches mostly focus on effectively exploiting the temporal dimension in videos to improve learning under low data regimes. However, most works have largely ignored that videos are often accompanied by rich textual descriptions that can also be an essential source of information to handle few-shot recognition c…

    Submitted 15 December, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: 18 pages including references, 6 figures, and 3 tables

  32. arXiv:2105.05226  [pdf, other]

    cs.CV

    Home Action Genome: Cooperative Compositional Action Understanding

    Authors: Nishant Rai, Haofeng Chen, Jingwei Ji, Rishi Desai, Kazuki Kozuka, Shun Ishizaka, Ehsan Adeli, Juan Carlos Niebles

    Abstract: Existing research on action recognition treats activities as monolithic events occurring in videos. Recently, the benefits of formulating actions as a combination of atomic-actions have shown promise in improving action understanding with the emergence of datasets containing such annotations, allowing us to learn representations capturing this information. However, there remains a lack of studies…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: CVPR '21

  33. arXiv:2104.14764  [pdf, other]

    cs.CV

    CoCon: Cooperative-Contrastive Learning

    Authors: Nishant Rai, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

    Abstract: Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge. However, when applied to real-world videos, contrastive learning may unknowingly lead to the separation of instances that contain s…

    Submitted 30 April, 2021; originally announced April 2021.

  34. arXiv:2104.09052  [pdf, other]

    cs.LG

    Metadata Normalization

    Authors: Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M. Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli

    Abstract: Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metad…

    Submitted 5 May, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021. Project page: https://mml.stanford.edu/MDN/

  35. TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

    Authors: Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi

    Abstract: Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in the humans' interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynam…

    Submitted 27 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Journal ref: IEEE/CVF International Conference on Computer Vision, pp. 13390-13400. 2021

  36. arXiv:2103.06312  [pdf]

    cs.AI cs.GL

    The AI Index 2021 Annual Report

    Authors: Daniel Zhang, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons, James Manyika, Juan Carlos Niebles, Michael Sellitto, Yoav Shoham, Jack Clark, Raymond Perrault

    Abstract: Welcome to the fourth edition of the AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index Report tracks, collates, distills, and visualizes data related to artif…

    Submitted 8 March, 2021; originally announced March 2021.

  37. arXiv:2007.08920  [pdf, other]

    cs.CV cs.LG eess.IV

    Vision-based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity

    Authors: Mandy Lu, Kathleen Poston, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Juan Carlos Niebles, Ehsan Adeli

    Abstract: Parkinson's disease (PD) is a progressive neurological disorder primarily affecting motor function resulting in tremor at rest, rigidity, bradykinesia, and postural instability. The physical severity of PD impairments can be quantified through the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a widely used clinical rating scale. Accurate and quantitative assessmen…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted as a conference paper at MICCAI (Medical Image Computing and Computer Assisted Intervention), Lima, Peru, October 2020. 11 pages, LaTeX

  38. arXiv:2007.06843  [pdf, other]

    cs.CV

    Socially and Contextually Aware Human Motion and Pose Forecasting

    Authors: Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

    Abstract: Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted in RA-L and IROS

  39. arXiv:2003.13942  [pdf, other]

    cs.CV

    Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

    Authors: Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

    Abstract: Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal grap…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  40. arXiv:2002.08945  [pdf, other]

    cs.CV

    Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction

    Authors: Bingbin Liu, Ehsan Adeli, Zhangjie Cao, Kuan-Hui Lee, Abhijeet Shenoi, Adrien Gaidon, Juan Carlos Niebles

    Abstract: Reasoning over visual data is a desirable capability for robotics and vision-based applications. Such reasoning enables forecasting of the next events or actions in videos. In recent years, various models have been developed based on convolution operations for prediction or forecasting, but they lack the ability to reason over spatiotemporal data and infer the relationships of different objects in…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: Accepted at ICRA 2020 and IEEE Robotics and Automation Letters

  41. arXiv:1912.10405  [pdf, other]

    cs.CV cs.LG eess.IV

    Adversarial Cross-Domain Action Recognition with Co-Attention

    Authors: Boxiao Pan, Zhangjie Cao, Ehsan Adeli, Juan Carlos Niebles

    Abstract: Action recognition has been a widely studied topic with a heavy focus on supervised learning involving sufficient labeled videos. However, the problem of cross-domain action recognition, where training and testing videos are drawn from different underlying distributions, remains largely under-explored. Previous methods directly employ techniques for cross-domain image recognition, which tend to su…

    Submitted 22 December, 2019; originally announced December 2019.

    Comments: AAAI 2020

  42. arXiv:1912.06992  [pdf, other]

    cs.CV

    Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

    Authors: Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

    Abstract: Action recognition has typically treated actions and activities as monolithic events that occur in videos. However, there is evidence from Cognitive Science and Neuroscience that people actively encode activities into consistent hierarchical part structures. However in Computer Vision, few explorations on representations encoding event partonomies have been made. Inspired by evidence that the prot…

    Submitted 15 December, 2019; originally announced December 2019.

  43. arXiv:1911.05864  [pdf, other]

    cs.RO cs.AI cs.CV

    Motion Reasoning for Goal-Based Imitation Learning

    Authors: De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

    Abstract: We address goal-based imitation learning, where the aim is to output the symbolic goal from a third-person video demonstration. This enables the robot to plan for execution and reproduce the same goal in a completely different environment. The key challenge is that the goal of a video demonstration is often ambiguous at the level of semantic actions. The human demonstrators might unintentionally a…

    Submitted 13 November, 2019; originally announced November 2019.

  44. arXiv:1911.01138  [pdf, other]

    cs.CV cs.LG cs.RO

    Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

    Authors: Karttikeya Mangalam, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

    Abstract: We tackle the problem of Human Locomotion Forecasting, a task for jointly predicting the spatial positions of several keypoints on the human body in the near future under an egocentric setting. In contrast to the previous work that aims to solve either the task of pose prediction or trajectory forecasting in isolation, we propose a framework to unify the two problems and address the practically us…

    Submitted 13 April, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted to WACV 2020 (Oral)

  45. arXiv:1910.03676  [pdf, other]

    cs.CV cs.LG

    Representation Learning with Statistical Independence to Mitigate Bias

    Authors: Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

    Abstract: Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to the bias of race in gender or face recognition systems. Controlling for all types of biases in the dataset curation stage is cumbersome…

    Submitted 20 November, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: WACV 2021

  46. arXiv:1910.01286  [pdf, other]

    cs.CV

    Learning Temporal Action Proposals With Fewer Labels

    Authors: Jingwei Ji, Kaidi Cao, Juan Carlos Niebles

    Abstract: Temporal action proposals are a common module in action detection pipelines today. Most current methods for training action proposal modules rely on fully supervised approaches that require large amounts of annotated temporal action intervals in long video sequences. The large cost and effort in annotation that this entails motivate us to study the problem of training proposal modules with less su…

    Submitted 2 October, 2019; originally announced October 2019.

  47. arXiv:1909.03449  [pdf, other]

    cs.CV

    Imitation Learning for Human Pose Prediction

    Authors: Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, Juan Carlos Niebles

    Abstract: Modeling and prediction of human motion dynamics has long been a challenging problem in computer vision, and most existing methods rely on the end-to-end supervised training of various architectures of recurrent neural networks. Inspired by the recent success of deep reinforcement learning methods, in this paper we propose a new reinforcement learning formulation for the problem of human pose pred…

    Submitted 8 September, 2019; originally announced September 2019.

    Comments: 10 pages, 7 figures, accepted to ICCV 2019

  48. arXiv:1908.06769  [pdf, other]

    cs.AI cs.LG cs.RO

    Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

    Authors: De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

    Abstract: We address one-shot imitation learning, where the goal is to execute a previously unseen task based on a single demonstration. While there has been exciting progress in this direction, most of the approaches still require a few hundred tasks for meta-training, which limits the scalability of the approaches. Our main contribution is to formulate one-shot imitation learning as a symbolic planning pr…

    Submitted 4 November, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: IROS 2019

  49. arXiv:1907.01172  [pdf, other]

    cs.CV

    Procedure Planning in Instructional Videos

    Authors: Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

    Abstract: In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking. Given the current visual observation of the world and a visual goal, we ask the question "What actions need to be taken in order to achieve the goal?". The key technical challenge is to lear…

    Submitted 13 April, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: 14 pages, 7 figures

  50. arXiv:1906.11415  [pdf, other]

    cs.CV

    Few-Shot Video Classification via Temporal Alignment

    Authors: Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles

    Abstract: There is a growing interest in learning a model which could recognize novel classes with only a few labeled examples. In this paper, we propose Temporal Alignment Module (TAM), a novel few-shot learning framework that can learn to classify a previous unseen video. While most previous works neglect long-term temporal ordering information, our proposed model explicitly leverages the temporal orderin…

    Submitted 26 June, 2019; originally announced June 2019.