Showing 1–4 of 4 results for author: Steitz, J O

  1. arXiv:2406.06820  [pdf, other]

    cs.CV cs.LG

    Adapters Strike Back

    Authors: Jan-Martin O. Steitz, Stefan Roth

    Abstract: Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to be outperformed by other adaptation mechanisms, including low-rank adaptation. In this paper, we provide an in-depth study of adapters, their internal structure, as well as various implementation choices. We uncover pitfalls for usi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: To appear at CVPR 2024. Code: https://github.com/visinf/adapter_plus

  2. arXiv:2109.06082  [pdf, other]

    cs.CL

    xGQA: Cross-Lingual Visual Question Answering

    Authors: Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych

    Abstract: Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset to 7 typologically divers…

    Submitted 17 March, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Findings of ACL 2022

  3. arXiv:2109.04422  [pdf, other]

    cs.CV cs.CL

    TxT: Crossmodal End-to-End Learning with Transformers

    Authors: Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, Stefan Roth

    Abstract: Reasoning over multiple modalities, e.g., in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual r…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: To appear at the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021

  4. arXiv:1810.02344  [pdf, other]

    cs.CV

    Multi-view X-ray R-CNN

    Authors: Jan-Martin O. Steitz, Faraz Saeedan, Stefan Roth

    Abstract: Motivated by the detection of prohibited objects in carry-on luggage as a part of avionic security screening, we develop a CNN-based object detection approach for multi-view X-ray image data. Our contributions are two-fold. First, we introduce a novel multi-view pooling layer to perform a 3D aggregation of 2D CNN-features extracted from each view. To that end, our pooling layer exploits the known…

    Submitted 4 October, 2018; originally announced October 2018.

    Comments: To appear at the 40th German Conference on Pattern Recognition (GCPR) 2018