Increasing Trust in Language Models through the Reuse of Verified Circuits
Authors:
Philip Quirke,
Clement Neo,
Fazl Barez
Abstract:
Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this sta…
▽ More
Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify an auto-regressive transformer model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into a larger untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
△ Less
Submitted 16 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
Understanding Addition in Transformers
Authors:
Philip Quirke,
Fazl Barez
Abstract:
Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition. Our findings suggest that the model dissects the task into parallel streams dedicated to individual digits, employing varied algorithms tailored to different…
▽ More
Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition. Our findings suggest that the model dissects the task into parallel streams dedicated to individual digits, employing varied algorithms tailored to different positions within the digits. Furthermore, we identify a rare scenario characterized by high loss, which we explain. By thoroughly elucidating the model's algorithm, we provide new insights into its functioning. These findings are validated through rigorous testing and mathematical modeling, thereby contributing to the broader fields of model understanding and interpretability. Our approach opens the door for analyzing more complex tasks and multi-layer Transformer models.
△ Less
Submitted 23 April, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
Fully transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study
Authors:
Sophia J. Wagner,
Daniel Reisenbüchler,
Nicholas P. West,
Jan Moritz Niehues,
Gregory Patrick Veldhuizen,
Philip Quirke,
Heike I. Grabsch,
Piet A. van den Brandt,
Gordon G. A. Hutchins,
Susan D. Richman,
Tanwei Yuan,
Rupert Langer,
Josien Christina Anna Jenniskens,
Kelly Offermans,
Wolfram Mueller,
Richard Gray,
Stephen B. Gruber,
Joel K. Greenson,
Gad Rennert,
Joseph D. Bonner,
Daniel Schmolze,
Jacqueline A. James,
Maurice B. Loughrey,
Manuel Salto-Tellez,
Hermann Brenner
, et al. (6 additional authors not shown)
Abstract:
Background: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer. For example, a DL test for the diagnosis of microsatellite instability (MSI) in CRC has been approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and are replacing them in many applications, but…
▽ More
Background: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer. For example, a DL test for the diagnosis of microsatellite instability (MSI) in CRC has been approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and are replacing them in many applications, but have not been used for biomarker prediction in cancer at a large scale. In addition, most DL approaches have been trained on small patient cohorts, which limits their clinical utility. Methods: In this study, we developed a new fully transformer-based pipeline for end-to-end biomarker prediction from pathology slides. We combine a pre-trained transformer encoder and a transformer network for patch aggregation, capable of yielding single and multi-target prediction at patient level. We train our pipeline on over 9,000 patients from 10 colorectal cancer cohorts. Results: A fully transformer-based approach massively improves the performance, generalizability, data efficiency, and interpretability as compared with current state-of-the-art algorithms. After training on a large multicenter cohort, we achieve a sensitivity of 0.97 with a negative predictive value of 0.99 for MSI prediction on surgical resection specimens. We demonstrate for the first time that resection specimen-only training reaches clinical-grade performance on endoscopic biopsy tissue, solving a long-standing diagnostic problem. Interpretation: A fully transformer-based end-to-end pipeline trained on thousands of pathology slides yields clinical-grade performance for biomarker prediction on surgical resections and biopsies. Our new methods are freely available under an open source license.
△ Less
Submitted 1 March, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.