Showing 1–11 of 11 results for author: Hanna, M

Searching in archive cs.
  1. arXiv:2406.18403  [pdf, other]

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human anno…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2405.10254  [pdf, other]

    eess.IV cs.CV cs.LG

    PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology

    Authors: George Shaikovski, Adam Casson, Kristen Severson, Eric Zimmermann, Yi Kan Wang, Jeremy D. Kunz, Juan A. Retamero, Gerard Oakley, David Klimstra, Christopher Kanan, Matthew Hanna, Michal Zelechowski, Julian Viret, Neil Tenenholtz, James Hall, Nicolo Fusi, Razik Yousfi, Peter Hamilton, William A. Moye, Eugene Vorontsov, Siqi Liu, Thomas J. Fuchs

    Abstract: Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately.…

    Submitted 22 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  3. arXiv:2403.17806  [pdf, other]

    cs.LG cs.CL

    Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

    Authors: Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

    Abstract: Many recent language model (LM) interpretability studies have adopted the circuits framework, which aims to find the minimal computational subgraph, or circuit, that explains LM behavior on a given task. Most studies determine which edges belong in a LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size. Edge attribution patching (EAP),…

    Submitted 26 March, 2024; originally announced March 2024.

  4. arXiv:2402.12486  [pdf, other]

    cs.CL

    Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!

    Authors: Frank Wildenburg, Michael Hanna, Sandro Pezzelle

    Abstract: In everyday language use, speakers frequently utter and interpret sentences that are semantically underspecified, namely, whose content is insufficient to fully convey their message or interpret them univocally. For example, to interpret the underspecified sentence "Don't spend too much", which leaves implicit what (not) to spend, additional linguistic context or outside knowledge is needed. In th…

    Submitted 12 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  5. arXiv:2310.15004  [pdf, other]

    cs.CL

    When Language Models Fall in Love: Animacy Processing in Transformer Language Models

    Authors: Michael Hanna, Yonatan Belinkov, Sandro Pezzelle

    Abstract: Animacy - whether an entity is alive and sentient - is fundamental to cognitive processing, impacting areas such as memory, vision, and language. However, animacy is not always expressed directly in language: in English it often manifests indirectly, in the form of selectional constraints on verbs and adjectives. This poses a potential issue for transformer language models (LMs): they often train…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: To appear at EMNLP 2023

  6. arXiv:2310.12611  [pdf, other]

    cs.CL cs.AI

    Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model

    Authors: Abhijith Chintam, Rahel Beloch, Willem Zuidema, Michael Hanna, Oskar van der Wal

    Abstract: Language models (LMs) exhibit and amplify many types of undesirable biases learned from the training data, including gender bias. However, we lack tools for effectively and efficiently changing this behavior without hurting general language modeling performance. In this paper, we study three methods for identifying causal relations between LM components and particular output: causal mediation anal…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted at BlackboxNLP 2023

  7. arXiv:2310.11282  [pdf, other]

    cs.CL

    ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation

    Authors: Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, Oskar van der Wal

    Abstract: We present the submission of the ILLC at the University of Amsterdam to the BabyLM challenge (Warstadt et al., 2023), in the strict-small track. Our final model, ChapGTP, is a masked language model that was trained for 200 epochs, aided by a novel data augmentation technique called Automatic Task Formation. We discuss in detail the performance of this model on the three evaluation suites: BLiMP, (…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Part of the BabyLM challenge at CoNLL

  8. arXiv:2309.07778  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.TO

    Virchow: A Million-Slide Digital Pathology Foundation Model

    Authors: Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Kristen Severson, Eric Zimmermann, James Hall, Neil Tenenholtz, Nicolo Fusi, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, Eric Robert, Yi Kan Wang, Jeremy D. Kunz, Matthew C. H. Lee, Jan Bernhard, Ran A. Godrich, Gerard Oakley, Ewan Millar, Matthew Hanna, Juan Retamero, et al. (6 additional authors not shown)

    Abstract: The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models' abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computati…

    Submitted 17 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  9. arXiv:2308.09180  [pdf, other]

    cs.CV cs.AI

    How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?

    Authors: Gregory Holste, Ziyu Jiang, Ajay Jaiswal, Maria Hanna, Shlomo Minkowitz, Alan C. Legasto, Joanna G. Escalon, Sharon Steinberger, Mark Bittman, Thomas C. Shen, Ying Ding, Ronald M. Summers, George Shih, Yifan Peng, Zhangyang Wang

    Abstract: Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous impli…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Early accepted to MICCAI 2023

  10. arXiv:2305.00586  [pdf, other]

    cs.CL cs.AI cs.LG

    How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

    Authors: Michael Hanna, Ollie Liu, Alexandre Variengien

    Abstract: Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As…

    Submitted 2 November, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  11. Deep Multi-Magnification Networks for Multi-Class Breast Cancer Image Segmentation

    Authors: David Joon Ho, Dig V. K. Yarlagadda, Timothy M. D'Alfonso, Matthew G. Hanna, Anne Grabenstetter, Peter Ntiamoah, Edi Brogi, Lee K. Tan, Thomas J. Fuchs

    Abstract: Pathologic analysis of surgical excision specimens for breast carcinoma is important to evaluate the completeness of surgical excision and has implications for future treatment. This analysis is performed manually by pathologists reviewing histologic slides prepared from formalin-fixed tissue. In this paper, we present Deep Multi-Magnification Network trained by partial annotation for automated mu…

    Submitted 4 January, 2021; v1 submitted 28 October, 2019; originally announced October 2019.

    Comments: Accepted at Computerized Medical Imaging and Graphics