Showing 1–1 of 1 results for author: Bania, K

  1. arXiv:2406.03586  [pdf, other]

    cs.CV cs.AI cs.LG

    CountCLIP -- [Re] Teaching CLIP to Count to Ten

    Authors: Harshvardhan Mestha, Tejas Agrawal, Karan Bania, Shreyas V, Yash Bhisikar

    Abstract: Large vision-language models (VLMs) are shown to learn rich joint image-text representations enabling high performances in relevant downstream tasks. However, they fail to showcase their quantitative understanding of objects, and they lack good counting-aware representation. This paper conducts a reproducibility study of 'Teaching CLIP to Count to Ten' (Paiss et al., 2023), which presents a method…

    Submitted 10 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.