Showing 1–1 of 1 results for author: Yassin, Y

  1. arXiv:2405.13985  [pdf, other]

    cs.CV

    LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate

    Authors: Anthony Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green

    Abstract: High-resolution images offer more information about scenes that can improve model accuracy. However, the dominant model architecture in computer vision, the vision transformer (ViT), cannot effectively leverage larger images without finetuning -- ViTs poorly extrapolate to more patches at test time, although transformers offer sequence length flexibility. We attribute this shortcoming to the curre…

    Submitted 22 May, 2024; originally announced May 2024.