Showing 1–50 of 134 results for author: Farhadi, A

Searching in archive cs.

  1. arXiv:2406.11775  [pdf, other]

    cs.CV cs.AI

    Task Me Anything

    Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

    Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their spec…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: website: https://www.task-me-anything.org

  2. arXiv:2405.18400  [pdf, other]

    cs.CL cs.LG

    Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

    Authors: Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

    Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To…

    Submitted 24 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages, 15 figures

  3. arXiv:2401.07100  [pdf, other]

    cs.IT eess.SP

    Meta-Learning for Resource Allocation in Uplink Multi STAR-RIS-aided NOMA System

    Authors: Sepideh Javadi, Armin Farhadi, Mohammad Robat Mili, Eduard Jorswieck, Naofal Al-Dhahir

    Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a novel technology which enables the full-space coverage. In this letter, a multi STAR-RIS-aided system using non-orthogonal multiple access in an uplink transmission is considered, where the multi-order reflections among multiple STAR-RISs assist the transmission from the single-antenna users to the multi-…

    Submitted 25 May, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

  4. arXiv:2312.04837  [pdf, other]

    cs.AI cs.CL cs.CV

    Localized Symbolic Knowledge Distillation for Visual Commonsense Models

    Authors: Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

    Abstract: Instruction following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also, for practical applica…

    Submitted 12 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  5. arXiv:2312.02332  [pdf, ps, other]

    cs.DS

    Connected Components in Linear Work and Near-Optimal Time

    Authors: Alireza Farhadi, S. Cliff Liu, Elaine Shi

    Abstract: Computing the connected components of a graph is a fundamental problem in algorithmic graph theory. A major question in this area is whether we can compute connected components in $o(\log n)$ parallel time. Recent works showed an affirmative answer in the Massively Parallel Computation (MPC) model for a wide class of graphs. Specifically, Behnezhad et al. (FOCS'19) showed that connected components…

    Submitted 20 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  6. arXiv:2311.05784  [pdf, other]

    cs.CV cs.AI cs.LG

    Are "Hierarchical" Visual Representations Hierarchical?

    Authors: Ethan Shen, Ali Farhadi, Aditya Kusupati

    Abstract: Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this wor…

    Submitted 23 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

  7. arXiv:2311.04193  [pdf, other]

    cs.CV cs.AI

    Selective Visual Representations Improve Convergence and Generalization for Embodied AI

    Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

    Abstract: Embodied AI models often employ off-the-shelf vision backbones like CLIP to encode their visual observations. Although such general-purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visu…

    Submitted 9 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: See project website: https://embodied-codebook.github.io

  8. arXiv:2310.14108  [pdf, other]

    cs.LG cs.AI cs.CV

    CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

    Authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

    Abstract: Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual represent…

    Submitted 21 October, 2023; originally announced October 2023.

  9. arXiv:2310.12126  [pdf, other]

    cs.LG cs.AI cs.CL

    SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

    Authors: Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms…

    Submitted 18 October, 2023; originally announced October 2023.

  10. arXiv:2310.07707  [pdf, other]

    cs.LG cs.CL cs.CV

    MatFormer: Nested Transformer for Elastic Inference

    Authors: Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain

    Abstract: Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in these scenarios necessitate practitioners to train foundation models such as PaLM 2, Llama, & ViTs as a series of models of varying sizes. Due to significant training costs, only a select few model sizes are trained and supported, limiting m…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 31 pages, 12 figures, first three authors contributed equally

  11. arXiv:2307.12532  [pdf, other]

    cs.CV cs.LG

    On the Connection between Pre-training Data Diversity and Fine-tuning Robustness

    Authors: Vivek Ramanujan, Thao Nguyen, Sewoong Oh, Ludwig Schmidt, Ali Farhadi

    Abstract: Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models. More specifically, we ask the following question: how do properties of the pre-training distribution affect the robustn…

    Submitted 24 July, 2023; originally announced July 2023.

  12. arXiv:2307.05663  [pdf, other]

    cs.CV cs.AI

    Objaverse-XL: A Universe of 10M+ 3D Objects

    Authors: Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi

    Abstract: Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects…

    Submitted 11 July, 2023; originally announced July 2023.

  13. arXiv:2306.10191  [pdf, other]

    cs.LG cs.AI cs.CV

    Neural Priming for Sample-Efficient Adaptation

    Authors: Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi

    Abstract: We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and condition its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be perfo…

    Submitted 4 December, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: 18 pages, 7 figures, 9 tables

  14. arXiv:2306.00238  [pdf, other]

    cs.CV

    Bytes Are All You Need: Transformers Operating Directly On File Bytes

    Authors: Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari

    Abstract: Modern deep learning approaches usually utilize modality-specific processing. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate modality-independent representation learning by performing classification directly on file bytes, without the need for decoding f…

    Submitted 1 July, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Journal ref: Transactions on Machine Learning Research 2835-8856 (2024)

  15. arXiv:2305.19435  [pdf, other]

    cs.LG cs.IR

    AdANNS: A Framework for Adaptive Semantic Search

    Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi

    Abstract: Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive…

    Submitted 18 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 25 pages, 15 figures. NeurIPS 2023 camera ready publication

  16. arXiv:2304.14108  [pdf, other]

    cs.CV cs.CL cs.LG

    DataComp: In search of the next generation of multimodal datasets

    Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, et al. (9 additional authors not shown)

    Abstract: Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Commo…

    Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  17. arXiv:2304.13013  [pdf, other]

    cs.LG cs.CV

    Stable and low-precision training for large-scale vision-language models

    Authors: Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt

    Abstract: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the performance of bfloat16 training within 0.1 percentage points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date. Our main focus…

    Submitted 16 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023

  18. arXiv:2304.12289  [pdf, other]

    cs.CV cs.AI cs.RO

    Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

    Authors: Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi

    Abstract: A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting; an agent may encounter settings that dramatically alter the impact of actions: a move ahead action on a wet f…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: 21 pages, 17 figures, ICLR 2023

  19. arXiv:2303.08983  [pdf, other]

    cs.CV cs.AI cs.LG

    Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

    Authors: Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel

    Abstract: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-base…

    Submitted 22 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted at International Conference on Computer Vision (ICCV) 2023. v2: Camera-ready version with new Tables 9 and 10. v3: Correction to Table 7-Avg. column

  20. arXiv:2303.04766  [pdf, other]

    cs.CV cs.IR cs.LG

    FastFill: Efficient Compatible Model Update

    Authors: Florian Jaeckle, Fartash Faghri, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

    Abstract: In many retrieval systems the original high dimensional data (e.g., images) is mapped to a lower dimensional feature through a learned embedding model. The task of retrieving the most similar data from a gallery set to a given query data is performed through a similarity comparison on features. When the embedding model is updated, it might produce features that are not comparable/compatible with f…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: To appear in The Eleventh International Conference on Learning Representations

  21. arXiv:2301.04101  [pdf, other]

    cs.CV cs.LG

    Neural Radiance Field Codebooks

    Authors: Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi

    Abstract: Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view recons…

    Submitted 30 April, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: 19 pages, 8 figures, 9 tables

    Journal ref: International Conference on Learning Representations 2023

  22. arXiv:2212.10553  [pdf, other]

    cs.CV cs.AI cs.LG

    RangeAugment: Efficient Online Augmentation with Range Learning

    Authors: Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari

    Abstract: State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Technical report (22 pages including references and appendix)

  23. arXiv:2212.08051  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Objaverse: A Universe of Annotated 3D Objects

    Authors: Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi

    Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Website: objaverse.allenai.org

  24. arXiv:2212.05923  [pdf, other]

    cs.RO cs.LG

    Self-Supervised Object Goal Navigation with In-Situ Finetuning

    Authors: So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang

    Abstract: A household robot should be able to navigate to target objects without requiring users to first annotate everything in their home. Most current approaches to object navigation do not test on real robots and rely solely on reconstructed scans of houses and their expensively labeled semantic 3D meshes. In this work, our goal is to build an agent that builds self-supervised models of the world via ex…

    Submitted 1 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  25. arXiv:2212.04819  [pdf, other]

    cs.RO cs.AI cs.CV

    Phone2Proc: Bringing Robust Robots Into Our Chaotic World

    Authors: Matt Deitke, Rose Hendrix, Luca Weihs, Ali Farhadi, Kiana Ehsani, Aniruddha Kembhavi

    Abstract: Training embodied agents in simulation has become mainstream for the embodied AI community. However, these agents often struggle when deployed in the physical world due to their inability to generalize to real-world environments. In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan and conditional procedural generation to create a distribution of training scenes that are…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: https://allenai.org/project/phone2proc

  26. arXiv:2212.04089  [pdf, other]

    cs.LG cs.CL cs.CV

    Editing Models with Task Arithmetic

    Authors: Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

    Abstract: Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around \textit{task vectors}. A task vector specifies a direction in the weight space of a pr…

    Submitted 31 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: In Proceedings of the 11th International Conference on Learning Representations (ICLR 2023)

  27. arXiv:2210.17515  [pdf, other]

    cs.DS

    Beating $(1-1/e)$-Approximation for Weighted Stochastic Matching

    Authors: Mahsa Derakhshan, Alireza Farhadi

    Abstract: In the stochastic weighted matching problem, the goal is to find a large-weight matching of a graph when we are uncertain about the existence of its edges. In particular, each edge $e$ has a known weight $w_e$ but is realized independently with some probability $p_e$. The algorithm may query an edge to see whether it is realized. We consider the well-studied query commit version of the problem, in…

    Submitted 31 October, 2022; originally announced October 2022.

  28. arXiv:2210.11948  [pdf, other]

    cs.LG

    lo-fi: distributed fine-tuning without communication

    Authors: Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

    Abstract: When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base…

    Submitted 12 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  29. arXiv:2210.06849  [pdf, other]

    cs.CV

    Retrospectives on the Embodied AI Workshop

    Authors: Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, et al. (14 additional authors not shown)

    Abstract: We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of…

    Submitted 4 December, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  30. arXiv:2209.13156  [pdf, other]

    cs.CV cs.AI

    Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents

    Authors: Yao-Hung Hubert Tsai, Hanlin Goh, Ali Farhadi, Jian Zhang

    Abstract: The perception system in personalized mobile agents requires developing indoor scene understanding models, which can understand 3D geometries, capture objectiveness, analyze human behaviors, etc. Nonetheless, this direction has not been well-explored in comparison with models for outdoor environments (e.g., the autonomous driving system that includes pedestrian prediction, car detection, traffic s…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Submitted to ICRA2023

  31. arXiv:2209.11789  [pdf, other]

    cs.RO cs.AI

    SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning

    Authors: Mario Srouji, Hugues Thomas, Hubert Tsai, Ali Farhadi, Jian Zhang

    Abstract: Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that is able to improve safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g.…

    Submitted 28 June, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted in IEEE International Conference on Automation Science and Engineering (CASE), 2023

  32. arXiv:2209.03320  [pdf, other]

    cs.CV cs.LG

    What does a platypus look like? Generating customized prompts for zero-shot image classification

    Authors: Sarah Pratt, Ian Covert, Rosanne Liu, Ali Farhadi

    Abstract: Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the ca…

    Submitted 3 December, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: ICCV 2023

  33. arXiv:2208.05592  [pdf, other]

    cs.CV cs.LG

    Patching open-vocabulary models by interpolating weights

    Authors: Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

    Abstract: Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method tha…

    Submitted 11 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  34. arXiv:2207.13738  [pdf, other]

    cs.CV cs.AI

    Break and Make: Interactive Structural Understanding Using LEGO Bricks

    Authors: Aaron Walsman, Muru Zhang, Klemen Kotar, Karthik Desingh, Ali Farhadi, Dieter Fox

    Abstract: Visual understanding of geometric structures with complex spatial relationships is a fundamental component of human intelligence. As children, we learn how to reason about structure not only from observation, but also by interacting with the world around us -- by taking things apart and putting them back together again. The ability to reason about structure and compositionality allows us to not on…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. LTRON simulator and environment page: https://github.com/aaronwalsman/ltron. Training examples: https://github.com/aaronwalsman/ltron-torch-eccv22

  35. arXiv:2206.06994  [pdf, other]

    cs.AI cs.CV cs.RO

    ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

    Authors: Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, an…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: ProcTHOR website: https://procthor.allenai.org

  36. arXiv:2205.14717  [pdf, ps, other]

    cs.DS

    Generalized Stochastic Matching

    Authors: Alireza Farhadi, Jacob Gilbert, MohammadTaghi Hajiaghayi

    Abstract: In this paper, we generalize the recently studied Stochastic Matching problem to more accurately model a significant medical process, kidney exchange, and several other applications. Up until now the Stochastic Matching problem that has been studied was as follows: given a graph G = (V, E), each edge is included in the realized sub-graph of G mutually independently with probability p_e, and the go…

    Submitted 29 May, 2022; originally announced May 2022.

  37. arXiv:2205.13147  [pdf, other]

    cs.LG cs.CV

    Matryoshka Representation Learning

    Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

    Abstract: Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context rigid, fixed capacity representations can be either over or under-accommodating to the task at hand. This leads us to ask: can we d…

    Submitted 7 February, 2024; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Edited related work to include intrinsic dimensionality works

  38. arXiv:2203.08141  [pdf, other]

    cs.CV cs.LG cs.RO

    Object Manipulation via Visual Target Localization

    Authors: Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them. Training agents to manipulate objects poses many challenges. These include occlusion of the target object by the agent's arm, noisy object detection and localization, and the target frequently going out of view as the agent moves around in the scene. We propose Manipulation via Visual O…

    Submitted 15 March, 2022; originally announced March 2022.

  39. arXiv:2203.05482  [pdf, other]

    cs.LG cs.CL cs.CV

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

    Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low…

    Submitted 1 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ICML 2022. The last three authors contributed equally

  40. arXiv:2201.02639  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

    Authors: Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

    Abstract: As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that represents videos jointly over time -- through a new training objective that learns from audio, subtitles, and video frames. Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet. Ou…

    Submitted 13 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: CVPR 2022. Project page at https://rowanzellers.com/merlotreserve

  41. arXiv:2201.00411  [pdf, other]

    cs.CV cs.AI

    The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents

    Authors: Sarah Pratt, Luca Weihs, Ali Farhadi

    Abstract: The last few years have witnessed substantial progress in the field of embodied AI where artificial agents, mirroring biological counterparts, are now able to learn from interaction to accomplish complex tasks. Despite this success, biological organisms still hold one large advantage over these simulated agents: adaptation. While both living and simulated agents make decisions to achieve goals (st…

    Submitted 2 January, 2022; originally announced January 2022.

  42. arXiv:2112.02805  [pdf, other]

    cs.CV

    Forward Compatible Training for Large-Scale Embedding Retrieval Systems

    Authors: Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

    Abstract: In visual retrieval systems, updating the embedding model requires recomputing features for every piece of data. This expensive process is referred to as backfilling. Recently, the idea of backward compatible training (BCT) was proposed. To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model. However, BCT can sign…

    Submitted 29 March, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: 14 pages with appendix. In proceedings at the conference on Computer Vision and Pattern Recognition 2022

  43. arXiv:2112.00800  [pdf, other]

    cs.CL cs.AI

    Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

    Authors: Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi

    Abstract: Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challeng…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: In EMNLP 2021

  44. arXiv:2110.07084  [pdf, other]

    cs.DS

    Online Bipartite Matching with Reusable Resources

    Authors: Steven Delong, Alireza Farhadi, Rad Niazadeh, Balasubramanian Sivan, Rajan Udwani

    Abstract: We study the classic online bipartite matching problem with a twist: offline vertices, called resources, are $\textit{reusable}$. In particular, when a resource is matched to an online vertex it is unavailable for a deterministic time duration $d$ after which it becomes available again for a re-match. Thus, a resource can be matched to many different online vertices over a period of time. While re…

    Submitted 23 October, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: This paper is a merged version of two ACM EC 2022 papers: (i) "Online Bipartite Matching of Reusable Resources" by Steven Delong, Alireza Farhadi, Rad Niazadeh and Balu Sivan; (ii) "Periodic Reranking for Online Matching of Reusable Resources" by Rajan Udwani. Journal version: Under submission in Mathematics of Operations Research

  45. arXiv:2110.04252  [pdf, other]

    cs.LG cs.CV

    LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

    Authors: Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

    Abstract: When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network…

    Submitted 8 October, 2021; originally announced October 2021.

  46. arXiv:2109.01903  [pdf, other]

    cs.CV cs.LG

    Robust fine-tuning of zero-shot models

    Authors: Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

    Abstract: Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple a…

    Submitted 21 June, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: CVPR 2022

  47. arXiv:2107.03438  [pdf, other]

    cs.RO cs.CL cs.CV

    LanguageRefer: Spatial-Language Model for 3D Visual Grounding

    Authors: Junha Roh, Karthik Desingh, Ali Farhadi, Dieter Fox

    Abstract: For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to identify common objects in real-world 3D scenes. In this paper, we introduce a spatial-language model for a 3D visual grounding problem. Specifically, given a reconstructed 3D scene in the form of point clouds with 3D boundin…

    Submitted 4 November, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: 11 pages, 3 figures

  48. arXiv:2106.02636  [pdf, other]

    cs.CV cs.CL cs.LG

    MERLOT: Multimodal Neural Script Knowledge Models

    Authors: Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

    Abstract: As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech -- in an entirely label-free, self-supervised manner. By pretraining with a mix of both frame-level (s…

    Submitted 21 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: project page at https://rowanzellers.com/merlot; NeurIPS 2021 camera ready

  49. arXiv:2106.01487  [pdf, other]

    cs.LG cs.CV

    LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

    Authors: Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

    Abstract: Learning binary representations of instances and classes is a classical problem with several high-potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a challenging task and often requires large bit-codes to be accurate. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for ins…

    Submitted 6 October, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Camera Ready. 19 pages, 6 figures

  50. arXiv:2106.00508  [pdf, other]

    cs.CR cs.DS cs.GT

    Differentially Private Densest Subgraph

    Authors: Alireza Farhadi, MohammadTaghi Hajiaghayi, Elaine Shi

    Abstract: Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to th…

    Submitted 14 November, 2022; v1 submitted 1 June, 2021; originally announced June 2021.