Showing 1–25 of 25 results for author: Al-Halah, Z

  1. arXiv:2307.04760

    cs.CV cs.SD eess.AS

    Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

    Authors: Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

    Abstract: We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. Our method uses a masked auto-encoding framework to synthesize masked binaural (multi-channel) audio through the synergy of audio and vision, thereby learning useful spatial relationships between the two modalities. We use our pretrained features to tackle two downst…

    Submitted 5 May, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: Accepted to CVPR 2024

  2. arXiv:2306.15850

    cs.CV

    SpotEM: Efficient Video Search for Episodic Memory

    Authors: Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve effici…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Published in ICML 2023

  3. A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Authors: Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, et al. (22 additional authors not shown)

    Abstract: Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through th…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: To appear in Neural Networks

  4. arXiv:2301.00746

    cs.CV

    NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

    Authors: Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window output…

    Submitted 25 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: 13 pages, 7 figures, appearing in CVPR 2023

  5. arXiv:2206.04006

    cs.SD cs.CV cs.LG eess.AS

    Few-Shot Audio-Visual Learning of Environment Acoustics

    Authors: Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman

    Abstract: Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics. Whereas traditional methods to estimate RIRs assume dense geometry and/or sound measurements throughout the environment, we explore how to infer RIRs based on a sparse set of images and echoes observed…

    Submitted 24 November, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

  6. arXiv:2202.02440

    cs.CV cs.AI cs.LG

    Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

    Authors: Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman

    Abstract: In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality.…

    Submitted 28 April, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: CVPR 2022. Project page: https://vision.cs.utexas.edu/projects/zsel/

  7. arXiv:2201.10029

    cs.CV cs.AI

    PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

    Authors: Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

    Abstract: State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'. Our key insight is that…

    Submitted 17 June, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 8 pages + supplementary. Accepted in CVPR 2022

  8. arXiv:2105.07142

    cs.CV cs.LG cs.RO cs.SD eess.AS

    Move2Hear: Active Audio-Visual Source Separation

    Authors: Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

    Abstract: We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and it must use its eyes and ears to automatically separate out the sounds originating fro…

    Submitted 25 August, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

    Comments: Accepted to ICCV 2021

  9. arXiv:2102.02337

    cs.CV

    Environment Predictive Coding for Embodied Agents

    Authors: Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

    Abstract: We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of images gathered by an agent as it moves about in 3D environments. We learn these representations via a zone prediction task, where we intelligently mask out porti…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: 9 pages, 6 figures, appendix

  10. arXiv:2012.11583

    cs.CV cs.LG cs.RO cs.SD eess.AS

    Semantic Audio-Visual Navigation

    Authors: Changan Chen, Ziad Al-Halah, Kristen Grauman

    Abstract: Recent work on audio-visual navigation assumes a constantly-sounding target and restricts the role of audio to signaling the target's position. We introduce semantic audio-visual navigation, where objects in the environment make sounds consistent with their semantic meaning (e.g., toilet flushing, door creaking) and acoustic events are sporadic or short in duration. We propose a transformer-based…

    Submitted 6 April, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: Project page: http://vision.cs.utexas.edu/projects/semantic-audio-visual-navigation

  11. arXiv:2011.09663

    cs.CV cs.MM

    Modeling Fashion Influence from Photos

    Authors: Ziad Al-Halah, Kristen Grauman

    Abstract: The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively. We propose to discover and quantify fashion influences from catalog and social media photos. We explore fashion influence along two channels: geolocation and fashion brands. We introduce an approach that detects which of these entities influence which other entities in ter…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: To appear in the IEEE Transactions on Multimedia, 2020. Project page: https://www.cs.utexas.edu/~ziad/influence_from_photos.html. arXiv admin note: substantial text overlap with arXiv:2004.01316

  12. arXiv:2008.09622

    cs.CV cs.AI cs.LG cs.RO cs.SD

    Learning to Set Waypoints for Audio-Visual Navigation

    Authors: Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

    Abstract: In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navig…

    Submitted 11 February, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: Accepted to ICLR 2021

  13. arXiv:2008.09285

    cs.CV

    Occupancy Anticipation for Efficient Exploration and Navigation

    Authors: Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. In doing so, the agent builds its spatial awarene…

    Submitted 25 August, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV 2020. 19 pages, 6 figures, appendix at end

  14. arXiv:2005.01616

    cs.CV cs.SD eess.AS

    VisualEchoes: Spatial Image Representation Learning through Echolocation

    Authors: Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman

    Abstract: Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world. We explore the spatial cues contained in echoes and how they can benefit vision tasks that require spatial reasoning. First we capture echo responses in photo-realistic 3D…

    Submitted 17 July, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Appears in ECCV 2020

  15. arXiv:2004.01316

    cs.CV cs.SI

    From Paris to Berlin: Discovering Fashion Style Influences Around the World

    Authors: Ziad Al-Halah, Kristen Grauman

    Abstract: The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively. We propose to discover and quantify fashion influences from everyday images of people wearing clothes. We introduce an approach that detects which cities influence which other cities in terms of propagating their styles. We then leverage the discovered influence patterns t…

    Submitted 8 August, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: CVPR 2020. Project page: https://www.cs.utexas.edu/~ziad/fashion_influence.html

  16. arXiv:1912.11474

    cs.CV cs.HC cs.SD eess.AS

    SoundSpaces: Audio-Visual Navigation in 3D Environments

    Authors: Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

    Abstract: Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object. We propose a multi-modal deep reinforcement…

    Submitted 21 August, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

    Comments: Accepted to ECCV 2020 (Spotlight). Project page: http://vision.cs.utexas.edu/projects/audio_visual_navigation/

  17. arXiv:1907.06160

    cs.CV

    Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis

    Authors: Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero

    Abstract: Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification in large datasets like ImageNet. However, objects are sentiment neutral which hinders the expected gain of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that…

    Submitted 8 August, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: International Conference on Computer Vision (ICCV 2019) Workshops. Project page and the Visual Smiley Dataset: https://www.cs.utexas.edu/~ziad/emoji_visual_sentiment.html

  18. arXiv:1905.12794

    cs.CV

    Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

    Authors: Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris

    Abstract: Conversational interfaces for the detail-oriented retail fashion domain are more natural, expressive, and user friendly than classical keyword-based search interfaces. In this paper, we introduce the Fashion IQ dataset to support and advance research on interactive fashion image retrieval. Fashion IQ is the first fashion dataset to provide human-generated captions that distinguish similar pairs of…

    Submitted 25 November, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

  19. arXiv:1812.00202

    cs.CV

    Traversing the Continuous Spectrum of Image Retrieval with Deep Dynamic Models

    Authors: Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal

    Abstract: We introduce the first work to tackle the image retrieval problem as a continuous operation. While the proposed approaches in the literature can be roughly categorized into two main groups: category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex. Image similarity goes beyond this discrete vantage point and spans a continuous spectrum amo…

    Submitted 31 March, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

  20. arXiv:1810.12819

    cs.CV

    Informed Democracy: Voting-based Novelty Detection for Action Recognition

    Authors: Alina Roitberg, Ziad Al-Halah, Rainer Stiefelhagen

    Abstract: Novelty detection is crucial for real-life applications. While it is common in activity recognition to assume a closed-set setting, i.e. test samples are always of training categories, this assumption is impractical in a real-world scenario. Test samples can be of various categories including those never seen before during training. Thus, being able to know what we know and what we do not know is…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Published in BMVC 2018. First and second authors contributed equally to this work

  21. arXiv:1705.06394

    cs.CV

    Fashion Forward: Forecasting Visual Style in Fashion

    Authors: Ziad Al-Halah, Rainer Stiefelhagen, Kristen Grauman

    Abstract: What is the future of fashion? Tackling this question from a data-driven vision perspective, we propose to forecast visual style trends before they occur. We introduce the first approach to predict the future popularity of styles discovered from fashion images in an unsupervised manner. Using these styles as a basis, we train a forecasting model to represent their trends over time. The resulting m…

    Submitted 8 August, 2020; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: ICCV 2017. Project page: https://cvhci.anthropomatik.kit.edu/~zalhalah/prj_fashion_forecast.html

  22. arXiv:1704.03607

    cs.CV

    Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

    Authors: Ziad Al-Halah, Rainer Stiefelhagen

    Abstract: Attribute-based recognition models, due to their impressive performance and their ability to generalize well on novel categories, have been widely adopted for many computer vision applications. However, usually both the attribute vocabulary and the class-attribute associations have to be provided manually by domain experts or large number of annotators. This is very costly and not necessarily opti…

    Submitted 11 April, 2017; originally announced April 2017.

    Comments: Accepted as a conference paper at CVPR 2017

  23. arXiv:1611.07573

    cs.CV

    Relaxed Earth Mover's Distances for Chain- and Tree-connected Spaces and their use as a Loss Function in Deep Learning

    Authors: Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen

    Abstract: The Earth Mover's Distance (EMD) computes the optimal cost of transforming one distribution into another, given a known transport metric between them. In deep learning, the EMD loss allows us to embed information during training about the output space structure like hierarchical or semantic relations. This helps in achieving better output smoothness and generalization. However EMD is computational…

    Submitted 22 November, 2016; originally announced November 2016.

  24. arXiv:1610.04787

    cs.CV

    Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning

    Authors: Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen

    Abstract: Collecting training images for all visual categories is not only expensive but also impractical. Zero-shot learning (ZSL), especially using attributes, offers a pragmatic solution to this problem. However, at test time most attribute-based methods require a full description of attribute associations for each unseen class. Providing these associations is time consuming and often requires domain spe…

    Submitted 15 October, 2016; originally announced October 2016.

    Comments: Published as a conference paper at CVPR 2016

  25. How to Transfer? Zero-Shot Object Recognition via Hierarchical Transfer of Semantic Attributes

    Authors: Ziad Al-Halah, Rainer Stiefelhagen

    Abstract: Attribute based knowledge transfer has proven very successful in visual object analysis and learning previously unseen classes. However, the common approach learns and transfers attributes without taking into consideration the embedded structure between the categories in the source set. Such information provides important cues on the intra-attribute variations. We propose to capture these variatio…

    Submitted 1 April, 2016; originally announced April 2016.

    Comments: Published as a conference paper at WACV 2015, modifications include new results with GoogLeNet features