Showing 101–135 of 135 results for author: Farhadi, A

  1. arXiv:1712.05474  [pdf, other]

    cs.CV cs.AI cs.LG

    AI2-THOR: An Interactive 3D Environment for Visual AI

    Authors: Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi

    Abstract: We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning,…

    Submitted 26 August, 2022; v1 submitted 14 December, 2017; originally announced December 2017.

  2. arXiv:1712.03316  [pdf, other]

    cs.CV

    IQA: Visual Question Answering in Interactive Environments

    Authors: Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi

    Abstract: We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and…

    Submitted 6 September, 2018; v1 submitted 8 December, 2017; originally announced December 2017.

    Comments: Published in CVPR 2018

  3. arXiv:1712.01867  [pdf, other]

    cs.CV

    Structured Set Matching Networks for One-Shot Part Labeling

    Authors: Jonghyun Choi, Jayant Krishnamurthy, Aniruddha Kembhavi, Ali Farhadi

    Abstract: Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large training datasets of diagrams, which are very hard to obtain. Instead, this can be addressed as a matching problem either between labeled diagrams, images or both. This problem is very challenging since th…

    Submitted 3 April, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: one shot part labeling. CVPR 2018 accepted as spotlight presentation

  4. arXiv:1711.02085  [pdf, other]

    cs.CL

    Neural Speed Reading via Skim-RNN

    Authors: Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens. Skim-RNN gives computational advantage over an RNN that always updates the entire hidden state. Skim-RNN uses the same input and output interfaces as a standard RNN and can be easil…

    Submitted 28 March, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  5. arXiv:1710.00271  [pdf, ps, other]

    cs.GT

    On the Complexity of Chore Division

    Authors: Alireza Farhadi, MohammadTaghi Hajiaghayi

    Abstract: We study the proportional chore division problem where a protocol wants to divide an undesirable object, called chore, among $n$ different players. The goal is to find an allocation such that the cost of the chore assigned to each player be at most $1/n$ of the total cost. This problem is the dual variant of the cake cutting problem in which we want to allocate a desirable object. Edmonds and Pruh…

    Submitted 7 May, 2018; v1 submitted 30 September, 2017; originally announced October 2017.

  6. arXiv:1709.05939  [pdf, other]

    cs.CV q-bio.NC

    AJILE Movement Prediction: Multimodal Deep Learning for Natural Human Neural Recordings and Video

    Authors: Nancy Xin Ru Wang, Ali Farhadi, Rajesh Rao, Bingni Brunton

    Abstract: Developing useful interfaces between brains and machines is a grand challenge of neuroengineering. An effective interface has the capacity to not only interpret neural signals, but predict the intentions of the human to perform an action in the near future; prediction is made even more challenging outside well-controlled laboratory experiments. This paper describes our approach to detect and to pr…

    Submitted 1 March, 2018; v1 submitted 12 September, 2017; originally announced September 2017.

    Journal ref: Thirty-Second AAAI Conference On Artificial Intelligence (2018)

  7. arXiv:1705.08080  [pdf, other]

    cs.CV cs.LG cs.RO

    Visual Semantic Planning using Deep Successor Representations

    Authors: Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

    Abstract: A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects an…

    Submitted 15 August, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: ICCV 2017 camera ready

  8. arXiv:1705.06368  [pdf, other]

    cs.CV

    Re3 : Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects

    Authors: Daniel Gordon, Ali Farhadi, Dieter Fox

    Abstract: Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time. A tracker must be able to modify its underlying model and adapt to new observations. We present Re3, a real-time deep object tracker capable of incorporating temporal information into its model. Rather than focusing on a limited set of objects or traini…

    Submitted 26 February, 2018; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: Presented at ICRA 2018

    Journal ref: IEEE Robotics and Automation Letters 2018

  9. arXiv:1703.10239  [pdf, other]

    cs.CV

    SeGAN: Segmenting and Generating the Invisible

    Authors: Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

    Abstract: Objects often occlude each other in scenes; Inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects. Doing so requires knowing which pixels to paint (segmenting the invisible parts of objects) and what…

    Submitted 7 May, 2018; v1 submitted 29 March, 2017; originally announced March 2017.

    Comments: Accepted to CVPR18 as spotlight

  10. arXiv:1703.01649  [pdf, ps, other]

    cs.GT

    Fair Allocation of Indivisible Goods to Asymmetric Agents

    Authors: Alireza Farhadi, Mohammad Ghodsi, MohammadTaghi Hajiaghayi, Sebastien Lahaie, David Pennock, Masoud Seddighin, Saeed Seddighin, Hadi Yami

    Abstract: We study fair allocation of indivisible goods to agents with unequal entitlements. Fair allocation has been the subject of many studies in both divisible and indivisible settings. Our emphasis is on the case where the goods are indivisible and agents have unequal entitlements. This problem is a generalization of the work by Procaccia and Wang wherein the agents are assumed to be symmetric with res…

    Submitted 11 April, 2017; v1 submitted 5 March, 2017; originally announced March 2017.

  11. arXiv:1701.02718  [pdf, other]

    cs.CV

    See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content

    Authors: Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi

    Abstract: Humans have rich understanding of liquid containers and their contents; for example, we can effortlessly pour water from a pitcher to a cup. Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher. Very little attention in computer vision has been made to liquids and their containers. In thi…

    Submitted 6 September, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

  12. arXiv:1612.08242  [pdf, other]

    cs.CV

    YOLO9000: Better, Faster, Stronger

    Authors: Joseph Redmon, Ali Farhadi

    Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78…

    Submitted 25 December, 2016; originally announced December 2016.

  13. arXiv:1612.06371  [pdf, other]

    cs.CV

    Asynchronous Temporal Fields for Action Recognition

    Authors: Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta

    Abstract: Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for r…

    Submitted 24 July, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

  14. arXiv:1612.00901  [pdf, other]

    cs.CV cs.AI

    Commonly Uncommon: Semantic Sparsity in Situation Recognition

    Authors: Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi

    Abstract: Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objec…

    Submitted 2 December, 2016; originally announced December 2016.

  15. arXiv:1611.06473  [pdf, other]

    cs.CV

    LCNN: Lookup-based Convolutional Neural Network

    Authors: Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

    Abstract: Porting state of the art deep learning algorithms to resource constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for convolutional neural networks that enables efficient learning and inference. We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by few lookups to a dictionary that is t…

    Submitted 12 June, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

    Comments: CVPR 17

  16. arXiv:1611.01603  [pdf, other]

    cs.CL

    Bidirectional Attention Flow for Machine Comprehension

    Authors: Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or often form a uni…

    Submitted 21 June, 2018; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: Published as a conference paper at ICLR 2017

  17. arXiv:1609.05143  [pdf, other]

    cs.CV

    Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

    Authors: Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

    Abstract: Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven vis…

    Submitted 16 September, 2016; originally announced September 2016.

  18. arXiv:1607.07429  [pdf, other]

    cs.HC cs.CV

    Much Ado About Time: Exhaustive Annotation of Temporal Data

    Authors: Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

    Abstract: Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to perceive. In contrast, we investigate and determine the most cost-effective way of obtaining high-qu…

    Submitted 2 October, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

    Comments: HCOMP 2016 Camera Ready

  19. arXiv:1606.04582  [pdf, other]

    cs.CL cs.NE

    Query-Reduction Networks for Question Answering

    Authors: Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: In this paper, we study the problem of question answering when reasoning over multiple facts is required. We propose Query-Reduction Network (QRN), a variant of Recurrent Neural Network (RNN) that effectively handles both short-term (local) and long-term (global) sequential dependencies to reason over multiple facts. QRN considers the context sentences as a sequence of state-changing triggers, and…

    Submitted 24 February, 2017; v1 submitted 14 June, 2016; originally announced June 2016.

    Comments: Published as a conference paper at ICLR 2017. Title of the paper has changed from "Query-Regression Networks for Machine Comprehension"

  20. arXiv:1604.03650  [pdf, other]

    cs.CV

    Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

    Authors: Junyuan Xie, Ross Girshick, Ali Farhadi

    Abstract: As 3D movie viewing becomes mainstream and Virtual Reality (VR) market emerges, the demand for 3D contents is growing rapidly. Producing 3D videos, however, remains challenging. In this paper we propose to use deep neural networks for automatically converting 2D videos and images to stereoscopic 3D format. In contrast to previous automatic 2D-to-3D conversion algorithms, which have separate stages…

    Submitted 13 April, 2016; originally announced April 2016.

  21. arXiv:1604.01753  [pdf, other]

    cs.CV

    Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

    Authors: Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

    Abstract: Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So h…

    Submitted 26 July, 2016; v1 submitted 6 April, 2016; originally announced April 2016.

  22. arXiv:1603.07396  [pdf, other]

    cs.CV cs.AI

    A Diagram Is Worth A Dozen Images

    Authors: Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi

    Abstract: Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challengi…

    Submitted 23 March, 2016; originally announced March 2016.

  23. arXiv:1603.05600  [pdf, other]

    cs.CV

    "What happens if..." Learning to Predict the Effect of Forces in Images

    Authors: Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

    Abstract: What happens if one pushes a cup sitting on a table toward the edge of the table? How about pushing a desk against a wall? In this paper, we study the problem of understanding the movements of objects as a result of applying external forces to them. For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing…

    Submitted 17 March, 2016; originally announced March 2016.

  24. arXiv:1603.05279  [pdf, other]

    cs.CV

    XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

    Authors: Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi

    Abstract: We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32x memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This resu…

    Submitted 2 August, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

  25. arXiv:1602.00753  [pdf, other]

    cs.AI cs.CV

    Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects

    Authors: Hessam Bagherinezhad, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi

    Abstract: Human vision greatly benefits from the information about sizes of objects. The role of size in several visual reasoning tasks has been thoroughly explored in human perception and cognition. However, the impact of the information about sizes of objects is yet to be determined in AI. We postulate that this is mainly attributed to the lack of a comprehensive repository of size information. In this pa…

    Submitted 1 February, 2016; originally announced February 2016.

    Comments: To appear in AAAI 2016

  26. arXiv:1512.01325  [pdf, other]

    cs.CV cs.AI cs.HC cs.IT cs.LG

    Toward a Taxonomy and Computational Models of Abnormalities in Images

    Authors: Babak Saleh, Ahmed Elgammal, Jacob Feldman, Ali Farhadi

    Abstract: The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to dis…

    Submitted 4 December, 2015; originally announced December 2015.

    Comments: To appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016)

  27. arXiv:1512.00795  [pdf, other]

    cs.CV

    Actions ~ Transformations

    Authors: Xiaolong Wang, Ali Farhadi, Abhinav Gupta

    Abstract: What defines an action like "kicking ball"? We argue that the true meaning of an action lies in the change or transformation an action brings to the environment. In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect). Motivated b…

    Submitted 26 July, 2016; v1 submitted 2 December, 2015; originally announced December 2015.

  28. arXiv:1511.06335  [pdf, other]

    cs.LG cs.CV

    Unsupervised Deep Embedding for Clustering Analysis

    Authors: Junyuan Xie, Ross Girshick, Ali Farhadi

    Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks.…

    Submitted 24 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: icml2016

  29. arXiv:1511.04048  [pdf, other]

    cs.CV

    Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

    Authors: Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

    Abstract: In this paper, we study the challenging problem of predicting the dynamics of objects in static images. Given a query object in an image, our goal is to provide a physical understanding of the object in terms of the forces acting upon it and its long term motion as response to those forces. Direct and explicit estimation of the forces and the motion of objects from a single image is extremely chal…

    Submitted 12 November, 2015; originally announced November 2015.

  30. arXiv:1510.08973  [pdf, other]

    cs.CV

    VISALOGY: Answering Visual Analogy Questions

    Authors: Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi

    Abstract: In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embe…

    Submitted 30 October, 2015; originally announced October 2015.

    Comments: To appear in NIPS 2015

  31. arXiv:1509.08075  [pdf, other]

    cs.CV

    Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing

    Authors: Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi

    Abstract: We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase table using minimal human supervision. More importantly, we demonstrate the unique value unleashed by…

    Submitted 27 September, 2015; originally announced September 2015.

    Comments: 9 pages

  32. arXiv:1506.02640  [pdf, other]

    cs.CV

    You Only Look Once: Unified, Real-Time Object Detection

    Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

    Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detec…

    Submitted 9 May, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

  33. arXiv:1411.6909  [pdf, other]

    cs.CV

    Image Classification and Retrieval from User-Supplied Tags

    Authors: Hamid Izadinia, Ali Farhadi, Aaron Hertzmann, Matthew D. Hoffman

    Abstract: This paper proposes direct learning of image classification from user-supplied tags, without filtering. Each tag is supplied by the user who shared the image online. Enormous numbers of these tags are freely available online, and they give insight about the image categories important to users and to image classification. Our approach is complementary to the conventional approach of manual annotati…

    Submitted 25 November, 2014; originally announced November 2014.

  34. arXiv:1411.2214  [pdf, other]

    cs.CV

    Abnormal Object Recognition: A Comprehensive Study

    Authors: Babak Saleh, Ali Farhadi, Ahmed Elgammal

    Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviat…

    Submitted 9 November, 2014; originally announced November 2014.

  35. arXiv:1210.4854  [pdf]

    cs.CL cs.AI

    Semantic Understanding of Professional Soccer Commentaries

    Authors: Hannaneh Hajishirzi, Mohammad Rastegari, Ali Farhadi, Jessica K. Hodgins

    Abstract: This paper presents a novel approach to the problem of semantic parsing via learning the correspondences between complex sentences and rich sets of events. Our main intuition is that correct correspondences tend to occur more frequently. Our model benefits from a discriminative notion of similarity to learn the correspondence between sentence and an event and a ranking machinery that scores the po…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-326-335