Showing 1–9 of 9 results for author: Alper, M

Searching in archive cs.
  1. arXiv:2404.16845

    cs.CV cs.GR

    HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections

    Authors: Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor

    Abstract: Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In constrained 3D domains, recent meth…

    Submitted 14 February, 2024; originally announced April 2024.

    Comments: Eurographics 2024. Project page: https://tau-vailab.github.io/HaLo-NeRF/

  2. arXiv:2403.01306

    cs.LG cs.CV

    ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

    Authors: Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes

    Abstract: Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concr…

    Submitted 11 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024 (Findings). For the project webpage, see https://moranyanuka.github.io/icc/

  3. arXiv:2312.03631

    cs.CV cs.AI

    Mitigating Open-Vocabulary Caption Hallucinations

    Authors: Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor

    Abstract: While recent years have seen rapid progress in image-conditioned text generation, image captioning still suffers from the fundamental issue of hallucinations, namely, the generation of spurious details that cannot be inferred from the given image. Existing methods largely use closed-vocabulary object lists to mitigate or evaluate hallucinations in image captioning, ignoring the long-tailed nature…

    Submitted 19 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Website Link: https://assafbk.github.io/mocha/

  4. arXiv:2310.16781

    cs.CV cs.CL cs.LG

    Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

    Authors: Morris Alper, Hadar Averbuch-Elor

    Abstract: Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated…

    Submitted 2 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023 (spotlight). Project webpage: https://kiki-bouba.github.io/

  5. arXiv:2309.14821

    cs.DC cs.OS

    Expedited Data Transfers for Serverless Clouds

    Authors: Dmitrii Ustiugov, Shyam Jesalpura, Mert Bora Alper, Michal Baczun, Rustem Feyzkhanov, Edouard Bugnion, Boris Grot, Marios Kogias

    Abstract: Serverless computing has emerged as a popular cloud deployment paradigm. In serverless, the developers implement their application as a set of chained functions that form a workflow in which functions invoke each other. The cloud providers are responsible for automatically scaling the number of instances for each function on demand and forwarding the requests in a workflow to the appropriate funct…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: latest version

    MSC Class: 68 ACM Class: D.4.4

  6. arXiv:2304.14104

    cs.CV cs.CL cs.LG

    Learning Human-Human Interactions in Images from Weak Textual Supervision

    Authors: Morris Alper, Hadar Averbuch-Elor

    Abstract: Interactions between humans are diverse and context-dependent, but previous works have treated them as categorical, disregarding the heavy tail of possible interactions. We propose a new paradigm of learning human-human interactions as free text from a single still image, allowing for flexibility in modeling the unlimited space of situations and relationships between people. To overcome the absenc…

    Submitted 18 September, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: To be presented at ICCV 2023. Project webpage: https://learning-interactions.github.io

  7. arXiv:2303.12513

    cs.CV cs.CL cs.LG

    Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

    Authors: Morris Alper, Michael Fiman, Hadar Averbuch-Elor

    Abstract: Most humans use visual imagination to understand and reason about language, but models such as BERT reason about language using knowledge acquired during text-only pretraining. In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. We propose a suite of…

    Submitted 2 November, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023. Project webpage: https://isbertblind.github.io/

  8. arXiv:2206.08874

    cs.RO

    SwarmHawk: Self-Sustaining Multi-Agent System for Landing on a Moving Platform through an Agent Supervision

    Authors: Ayush Gupta, Ekaterina Dorzhieva, Ahmed Baza, Mert Alper, Aleksey Fedoseev, Dzmitry Tsetserukou

    Abstract: Heterogeneous teams of mobile robots and UAVs are offering a substantial benefit in an autonomous exploration of the environment. Nevertheless, although joint exploration scenarios for such systems are widely discussed, they are still suffering from low adaptability to changes in external conditions and faults of swarm agents during the UAV docking. We propose a novel vision-based drone swarm dock…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted paper at IEEE International Conference on Unmanned Aircraft Systems (ICUAS 2022), IEEE copyright

  9. arXiv:2206.08856

    cs.RO

    SwarmHive: Heterogeneous Swarm of Drones for Robust Autonomous Landing on Moving Robot

    Authors: Ayush Gupta, Ahmed Baza, Ekaterina Dorzhieva, Mert Alper, Mariia Makarova, Stepan Perminov, Aleksey Fedoseev, Dzmitry Tsetserukou

    Abstract: The paper focuses on a heterogeneous swarm of drones to achieve a dynamic landing of formation on a moving robot. This challenging task was not yet achieved by scientists. The key technology is that instead of facilitating each agent of the swarm of drones with computer vision that considerably increases the payload and shortens the flight time, we propose to install only one camera on the leader…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted paper at IEEE Vehicular Technology Conference 2022 (IEEE VTC 2022), IEEE copyright