-
MM-Lego: Modular Biomedical Multimodal Models with Minimal Fine-Tuning
Authors:
Konstantin Hemker,
Nikola Simidjievski,
Mateja Jamnik
Abstract:
Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a modular and general-purpose fusion and model merging framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for unimodal encoders that enforces lightweight dimensionality assumptions between modalities and harmonises their representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a model merging method which achieves competitive performance with end-to-end fusion models without any fine-tuning, 2) can operate on any unimodal encoder, and 3) is a model fusion method that, with minimal fine-tuning, achieves state-of-the-art results on six benchmarked multimodal biomedical tasks.
Submitted 30 May, 2024;
originally announced May 2024.
-
HEALNet -- Hybrid Multi-Modal Fusion for Heterogeneous Biomedical Data
Authors:
Konstantin Hemker,
Nikola Simidjievski,
Mateja Jamnik
Abstract:
Technological advances in medical data collection such as high-resolution histopathology and high-throughput genomic sequencing have contributed to the rising requirement for multi-modal biomedical modelling, specifically for image, tabular, and graph data. Most multi-modal deep learning approaches use modality-specific architectures that are trained separately and cannot capture the crucial cross-modal information that motivates the integration of different data sources. This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet): a flexible multi-modal fusion architecture, which a) preserves modality-specific structural information, b) captures the cross-modal interactions and structural information in a shared latent space, c) can effectively handle missing modalities during training and inference, and d) enables intuitive model inspection by learning on the raw data input instead of opaque embeddings. We conduct multi-modal survival analysis on Whole Slide Images and Multi-omic data on four cancer cohorts of The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance, substantially improving over both uni-modal and recent multi-modal baselines, whilst being robust in scenarios with missing modalities.
Submitted 20 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
CGXplain: Rule-Based Deep Neural Network Explanations Using Dual Linear Programs
Authors:
Konstantin Hemker,
Zohreh Shams,
Mateja Jamnik
Abstract:
Rule-based surrogate models are an effective and interpretable way to approximate a Deep Neural Network's (DNN) decision boundaries, allowing humans to easily understand deep learning models. Current state-of-the-art decompositional methods, which are those that consider the DNN's latent space to extract more exact rule sets, manage to derive rule sets at high accuracy. However, they a) do not guarantee that the surrogate model has learned from the same variables as the DNN (alignment), b) only allow optimising for a single objective, such as accuracy, which can result in excessively large rule sets (complexity), and c) use decision tree algorithms as intermediate models, which can result in different explanations for the same DNN (stability). To address these limitations, this paper introduces CGX (Column Generation eXplainer), a decompositional method that uses dual linear programming to extract rules from the hidden representations of the DNN. This approach allows optimising for any number of objectives and empowers users to tweak the explanation model to their needs. We evaluate CGX on a wide variety of tasks and show that it meets all three criteria: the explanation model is exactly reproducible, which guarantees stability, and the rule set size is reduced by >80% (complexity) at equivalent or improved accuracy and fidelity across tasks (alignment).
Submitted 11 April, 2023;
originally announced April 2023.
-
Feature Synergy, Redundancy, and Independence in Global Model Explanations using SHAP Vector Decomposition
Authors:
Jan Ittner,
Lukasz Bolikowski,
Konstantin Hemker,
Ricardo Kennedy
Abstract:
We offer a new formalism for global explanations of pairwise feature dependencies and interactions in supervised models. Building upon SHAP values and SHAP interaction values, our approach decomposes feature contributions into synergistic, redundant and independent components (S-R-I decomposition of SHAP vectors). We propose a geometric interpretation of the components and formally prove its basic properties. Finally, we demonstrate the utility of synergy, redundancy and independence by applying them to a constructed data set and model.
Submitted 26 July, 2021;
originally announced July 2021.
-
Bending, Nanoindentation and Plasticity Noise in FCC single and poly crystals
Authors:
Ryder Bolin,
Hakan Yavas,
Hengxu Song,
Kevin J. Hemker,
Stefanos Papanikolaou
Abstract:
We present a high-throughput nanoindentation study of in-situ bending effects on incipient plastic deformation behavior of polycrystalline and single-crystalline pure aluminum and pure copper at ultra-nano depths (<200nm). We find that hardness displays a statistically inverse dependence on in-plane stress for indentation depths smaller than 10nm, and the dependence disappears for larger indentation depths. In addition, plastic noise in the nanoindentation force and displacement displays statistically robust noise features, independently of applied stresses. Our experimental results suggest the existence of a regime in FCC crystals where ultra-nano hardness is sensitive to residual applied stresses, but plasticity pop-in noise is insensitive to them.
Submitted 21 October, 2019; v1 submitted 27 June, 2017;
originally announced June 2017.