Showing 1–23 of 23 results for author: Geirhos, R

  1. arXiv:2407.07530  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG

    How Aligned are Different Alignment Metrics?

    Authors: Jannis Ahlert, Thomas Klein, Felix Wichmann, Robert Geirhos

    Abstract: In recent years, various methods and benchmarks have been proposed to empirically evaluate the alignment of artificial neural networks to human neural and behavioral data. But how aligned are different alignment metrics? To answer this question, we analyze visual data from Brain-Score (Schrimpf et al., 2018), including metrics from the model-vs-human toolbox (Geirhos et al., 2021), together with h…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to the ICLR 2024 Workshop on Representational Alignment (Re-Align)

  2. arXiv:2403.09193  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Are Vision Language Models Texture or Shape Biased and Can We Steer Them?

    Authors: Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper

    Abstract: Vision language models (VLMs) have drastically changed the computer vision model landscape in only a few years, opening an exciting array of new applications from zero-shot image classification, over to image captioning, and visual question answering. Unlike pure vision models, they offer an intuitive way to access visual content through language prompting. The wide applicability of such models en…

    Submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2312.05355  [pdf, ps, other]

    cs.LG cs.CV q-bio.NC

    Neither hype nor gloom do DNNs justice

    Authors: Felix A. Wichmann, Simon Kornblith, Robert Geirhos

    Abstract: Neither the hype exemplified in some exaggerated claims about deep neural networks (DNNs), nor the gloom expressed by Bowers et al. do DNNs as models in vision science justice: DNNs rapidly evolve, and today's limitations are often tomorrow's successes. In addition, providing explanations as well as prediction and image-computability are model desiderata; one should not be favoured at the expense…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Preprint version of a commentary published by Behavioral and Brain Sciences (https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/neither-hype-nor-gloom-do-dnns-justice/639AA5BC7F6E3B91E9B9EC8463D39F77)

  4. arXiv:2310.13018  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE

    Getting aligned on representational alignment

    Authors: Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, et al. (5 additional authors not shown)

    Abstract: Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of an…

    Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Working paper, changes to be made in upcoming revisions

  5. arXiv:2309.16779  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC stat.ML

    Intriguing properties of generative classifiers

    Authors: Priyank Jaini, Kevin Clark, Robert Geirhos

    Abstract: What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysic…

    Submitted 14 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Spotlight

  6. arXiv:2307.06304  [pdf, other]

    cs.CV cs.AI cs.LG

    Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

    Authors: Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby

    Abstract: The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence…

    Submitted 12 July, 2023; originally announced July 2023.

  7. arXiv:2306.04719  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG q-bio.NC

    Don't trust your eyes: on the (un)reliability of feature visualizations

    Authors: Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim

    Abstract: How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We star…

    Submitted 6 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICML 2024 camera ready version

  8. arXiv:2305.17023  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Are Deep Neural Networks Adequate Behavioural Models of Human Visual Perception?

    Authors: Felix A. Wichmann, Robert Geirhos

    Abstract: Deep neural networks (DNNs) are machine learning algorithms that have revolutionised computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. We here review evidence regarding current DNNs as adequate behavioural mo…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Preprint version of article accepted by Annual Review of Vision Science (https://www.annualreviews.org/doi/abs/10.1146/annurev-vision-120522-031739). Posted with permission from the Annual Review of Vision Science, Volume 9 by Annual Reviews, http://www.annualreviews.org

  9. arXiv:2302.05442  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  10. arXiv:2206.14486  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Beyond neural scaling laws: beating power law scaling via data pruning

    Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

    Abstract: Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scal…

    Submitted 21 April, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Outstanding Paper Award @ NeurIPS 2022. Added github link to metric scores

  11. arXiv:2205.10144  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks

    Authors: Lukas S. Huber, Robert Geirhos, Felix A. Wichmann

    Abstract: In laboratory object recognition tasks based on undistorted photographs, both adult humans and Deep Neural Networks (DNNs) perform close to ceiling. Unlike adults', whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last two years have seen impressive gains in DNN…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Manuscript under review at Journal of Vision

  12. arXiv:2110.05922  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Trivial or impossible -- dichotomous data difficulty masks model differences (on ImageNet and beyond)

    Authors: Kristof Meding, Luca M. Schulze Buschoff, Robert Geirhos, Felix A. Wichmann

    Abstract: "The power of a generalization system follows directly from its biases" (Mitchell 1980). Today, CNNs are incredibly powerful generalisation systems -- but to what degree have we understood how their inductive bias influences model decisions? We here attempt to disentangle the various aspects that determine how a model decides. In particular, we ask: what makes one model decide differently from ano…

    Submitted 27 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  13. arXiv:2106.12447  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

    Authors: Roland S. Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas S. A. Wallis, Wieland Brendel

    Abstract: A precise understanding of why units in an artificial network respond to certain stimuli would constitute a big step towards explainable artificial intelligence. One widely used approach towards this goal is to visualize unit responses via activation maximization. These synthetic feature visualizations are purported to provide humans with precise information about the image features that cause a u…

    Submitted 12 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Presented at NeurIPS 2021. Shared first and last authorship. Project website at https://brendel-group.github.io/causal-understanding-via-visualizations/

  14. arXiv:2106.07411  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Partial success in closing the gap between human and machine vision

    Authors: Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Tizian Thieringer, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

    Abstract: A few years ago, the first CNN surpassed human performance on ImageNet. However, it soon became clear that machines lack robustness on more challenging test cases, a major obstacle towards deploying machines "in the wild" and towards obtaining better computational models of human visual perception. Here we ask: Are we making progress in closing the gap between human and machine vision? To answer t…

    Submitted 25 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Oral, camera ready version. A preliminary version of this work was presented as Oral at the 2020 NeurIPS workshop on "Shared Visual Representations in Human & Machine Intelligence" (arXiv:2010.08377)

  15. arXiv:2010.12606  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization

    Authors: Judy Borowski, Roland S. Zimmermann, Judith Schepers, Robert Geirhos, Thomas S. A. Wallis, Matthias Bethge, Wieland Brendel

    Abstract: Feature visualizations such as synthetic maximally activating images are a widely used explanation method to better understand the information processing of convolutional neural networks (CNNs). At the same time, there are concerns that these visualizations might not accurately represent CNNs' inner workings. Here, we measure how much extremely activating images help humans to predict CNN activati…

    Submitted 2 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Published at ICLR 2021. Joint first and last authors. Code is available at https://bethgelab.github.io/testing_visualizations/

  16. arXiv:2010.08377  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    On the surprising similarities between supervised and self-supervised models

    Authors: Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

    Abstract: How do humans learn to acquire a powerful, flexible and robust representation of objects? While much of this process remains unknown, it is clear that humans do not require millions of object labels. Excitingly, recent algorithmic advancements in self-supervised learning now enable convolutional neural networks (CNNs) to learn useful visual object representations without supervised labels, too. In…

    Submitted 16 October, 2020; originally announced October 2020.

  17. arXiv:2006.16736  [pdf, other]

    cs.CV cs.LG q-bio.NC q-bio.QM

    Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency

    Authors: Robert Geirhos, Kristof Meding, Felix A. Wichmann

    Abstract: A central problem in cognitive science and behavioural neuroscience as well as in machine learning and artificial intelligence research is to ascertain whether two or more decision makers (be they brains or algorithms) use the same strategy. Accuracy alone cannot distinguish between strategies: two systems may achieve similar accuracy with very different strategies. The need to differentiate beyon…

    Submitted 18 December, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 camera ready

  18. arXiv:2004.07780  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Shortcut Learning in Deep Neural Networks

    Authors: Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix A. Wichmann

    Abstract: Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distill how many of deep learning's problems can be seen as different symptoms of the same underlying…

    Submitted 21 November, 2023; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: perspective article published at Nature Machine Intelligence (https://doi.org/10.1038/s42256-020-00257-z)

  19. arXiv:1907.07484  [pdf, other]

    cs.CV cs.LG stat.ML

    Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming

    Authors: Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge, Wieland Brendel

    Abstract: The ability to detect objects regardless of image distortions or weather conditions is crucial for real-world applications of deep learning like autonomous driving. We here provide an easy-to-use benchmark to assess how object detection models perform when image quality degrades. The three resulting benchmark datasets, termed Pascal-C, Coco-C and Cityscapes-C, contain a large variety of image corr…

    Submitted 31 March, 2020; v1 submitted 17 July, 2019; originally announced July 2019.

    Comments: 21 pages, 10 figures, 1 dragon

  20. arXiv:1905.07234  [pdf, other]

    cs.LG stat.ML

    Comparison-Based Framework for Psychophysics: Lab versus Crowdsourcing

    Authors: Siavash Haghiri, Patricia Rubisch, Robert Geirhos, Felix Wichmann, Ulrike von Luxburg

    Abstract: Traditionally, psychophysical experiments are conducted by repeated measurements on a few well-trained participants under well-controlled conditions, often resulting in, if done properly, high quality data. In recent years, however, crowdsourcing platforms are becoming increasingly popular means of data collection, measuring many participants at the potential cost of obtaining data of worse qualit…

    Submitted 26 July, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

  21. arXiv:1811.12231  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC stat.ML

    ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

    Authors: Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

    Abstract: Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs ar…

    Submitted 9 November, 2022; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Accepted at ICLR 2019 (oral)

  22. arXiv:1808.08750  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC stat.ML

    Generalisation in humans and deep neural networks

    Authors: Robert Geirhos, Carlos R. Medina Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, Felix A. Wichmann

    Abstract: We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error-patterns b…

    Submitted 23 October, 2020; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: Added optimal probability aggregation method to appendix

  23. arXiv:1706.06969  [pdf, other]

    cs.CV q-bio.NC stat.ML

    Comparing deep neural networks against humans: object recognition when the signal gets weaker

    Authors: Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, Felix A. Wichmann

    Abstract: Human visual object recognition is typically rapid and seemingly effortless, as well as largely independent of viewpoint and object orientation. Until very recently, animate visual systems were the only ones capable of this remarkable computational feat. This has changed with the rise of a class of computer vision algorithms called deep neural networks (DNNs) that achieve human-level classificatio…

    Submitted 11 December, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: updated article with reference to resulting publication (Geirhos et al, NeurIPS 2018)