Showing 1–7 of 7 results for author: Sharir, G

  1. arXiv:2204.11479  [pdf, other]

    cs.SD cs.CV eess.AS

    End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

    Authors: Avi Gazneli, Gadi Zimerman, Tal Ridnik, Gilad Sharir, Asaf Noy

    Abstract: While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classifications still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets. By utilizing the inherited lightweight nature of audio and novel audio…

    Submitted 5 July, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

  2. arXiv:2111.12933  [pdf, other]

    cs.CV cs.LG

    ML-Decoder: Scalable and Versatile Classification Head

    Authors: Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy

    Abstract: In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture, and using a novel group-decoding scheme, ML-Decoder is highly efficient, and can scale well to thousands of classes. Compared to u…

    Submitted 31 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

  3. arXiv:2109.12499  [pdf, other]

    cs.CV cs.LG

    PETA: Photo Albums Event Recognition using Transformers Attention

    Authors: Tamar Glaser, Emanuel Ben-Baruch, Gilad Sharir, Nadav Zamir, Asaf Noy, Lihi Zelnik-Manor

    Abstract: In recent years the amounts of personal photos captured increased significantly, giving rise to new challenges in multi-image understanding and high-level image understanding. Event recognition in personal photo albums presents one challenging scenario where life events are recognized from a disordered collection of images, including both relevant and irrelevant images. Event recognition in images…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: 8 pages, 10 including references, 3 figures, was submitted to WACV 2022

  4. arXiv:2103.13915  [pdf, other]

    cs.CV

    An Image is Worth 16x16 Words, What is a Video Worth?

    Authors: Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor

    Abstract: Leading methods in the domain of action recognition try to distill information from both the spatial and temporal dimensions of an input video. Methods that reach State of the Art (SotA) accuracy, usually make use of 3D convolution layers as a way to abstract the temporal information from video frames. The use of such convolutions requires sampling short clips from the input video, where each clip…

    Submitted 27 May, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  5. arXiv:2003.13630  [pdf, other]

    cs.CV cs.LG eess.IV

    TResNet: High Performance GPU-Dedicated Architecture

    Authors: Tal Ridnik, Hussam Lawen, Asaf Noy, Emanuel Ben Baruch, Gilad Sharir, Itamar Friedman

    Abstract: Many deep learning models, developed in recent years, reach higher ImageNet accuracy than ResNet50, with fewer or comparable FLOPS count. While FLOPs are often seen as a proxy for network efficiency, when measuring actual GPU training and inference throughput, vanilla ResNet50 is usually significantly faster than its recent competitors, offering better throughput-accuracy trade-off. In this work…

    Submitted 27 August, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

    Comments: 11 pages, 5 figures

  6. arXiv:1912.11850  [pdf, other]

    cs.CV

    Graph Embedded Pose Clustering for Anomaly Detection

    Authors: Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, Shai Avidan

    Abstract: We propose a new method for anomaly detection of human actions. Our method works directly on human pose graphs that can be computed from an input video sequence. This makes the analysis independent of nuisance parameters such as viewpoint or illumination. We map these graphs to a latent space and cluster them. Each action is then represented by its soft-assignment to each of the clusters. This giv…

    Submitted 10 April, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

    Comments: Code is available at https://github.com/amirmk89/gepc. CVPR 2020

  7. arXiv:1707.06545  [pdf, other]

    cs.CV

    Video Object Segmentation using Tracked Object Proposals

    Authors: Gilad Sharir, Eddie Smolyansky, Itamar Friedman

    Abstract: We present an approach to semi-supervised video object segmentation, in the context of the DAVIS 2017 challenge. Our approach combines category-based object detection, category-independent object appearance segmentation and temporal object tracking. We are motivated by the fact that the object's semantic category tends not to change throughout the video while its appearance and location can vary co…

    Submitted 20 July, 2017; originally announced July 2017.

    Comments: All authors contributed equally, CVPR-2017 workshop, DAVIS-2017 Challenge