-
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials
Authors:
Valentine Perrin,
Nathan Noiry,
Nicolas Loiseau,
Alex Nowak
Abstract:
Non-significant randomized controlled trials can hide subgroups of good responders to experimental drugs, thus hindering subsequent development. Identifying such heterogeneous treatment effects is key for precision medicine, and many post-hoc analysis methods have been developed for that purpose. While several benchmarks have been carried out to identify the strengths and weaknesses of these methods, notably for binary and continuous endpoints, similar systematic empirical evaluations of subgroup analysis for time-to-event endpoints are lacking. This work aims to fill this gap by evaluating several subgroup analysis algorithms in the context of time-to-event outcomes, by means of three different research questions: Is there heterogeneity? What are the biomarkers responsible for such heterogeneity? Who are the good responders to treatment? In this context, we propose a new synthetic and semi-synthetic data generation process that allows one to explore a wide range of heterogeneity scenarios with precise control over the level of heterogeneity. We provide an open-source Python package, available on GitHub, containing our generation process and our comprehensive benchmark framework. We hope this package will be useful to the research community for future investigations of heterogeneity of treatment effects and subgroup analysis methods benchmarking.
Submitted 23 January, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training
Authors:
Aleksandra I. Nowak,
Bram Grooten,
Decebal Constantin Mocanu,
Jacek Tabor
Abstract:
Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their impact on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
Submitted 29 November, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
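
The magnitude-based pruning criterion named in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the URL given in the abstract); the function name `magnitude_prune`, the boolean `mask` representation, and the `fraction` parameter are assumptions made for illustration only.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Return a new mask with the smallest-magnitude active weights removed.

    weights  : array of weight values (hypothetical layer weights)
    mask     : boolean array of the same shape, True where a weight is active
    fraction : share of the currently active weights to deactivate
    """
    active = np.flatnonzero(mask)              # flat indices of active weights
    n_prune = int(fraction * active.size)      # number of weights to drop
    if n_prune == 0:
        return mask.copy()
    magnitudes = np.abs(weights.ravel()[active])
    # Flat indices of the n_prune active weights with the smallest |w|
    drop = active[np.argsort(magnitudes, kind="stable")[:n_prune]]
    new_mask = mask.copy().ravel()
    new_mask[drop] = False
    return new_mask.reshape(mask.shape)
```

In a dynamic sparse training loop, a step like this would typically be followed by a growing step that reactivates the same number of connections elsewhere, keeping the overall density fixed; the growing step is omitted here.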
-
Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery
Authors:
Mateusz Olko,
Michał Zając,
Aleksandra Nowak,
Nino Scherrer,
Yashas Annadani,
Stefan Bauer,
Łukasz Kuciński,
Piotr Miłoś
Abstract:
Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.
Submitted 3 April, 2024; v1 submitted 24 November, 2022;
originally announced November 2022.
-
On the relationship between disentanglement and multi-task learning
Authors:
Łukasz Maziarka,
Aleksandra Nowak,
Maciej Wołczyk,
Andrzej Bedychaj
Abstract:
One of the main arguments behind studying disentangled representations is the assumption that they can be easily reused in different tasks. At the same time, finding a joint, adaptable representation of data is one of the key challenges in the multi-task learning setting. In this paper, we take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing. We perform a thorough empirical study of the representations obtained by neural networks trained on automatically generated supervised tasks. Using a set of standard metrics, we show that disentanglement appears naturally during the process of multi-task neural network training.
Submitted 7 October, 2021;
originally announced October 2021.
-
Neural networks adapting to datasets: learning network size and topology
Authors:
Romuald A. Janik,
Aleksandra Nowak
Abstract:
We introduce a flexible setup allowing a neural network to learn both its size and topology during the course of standard gradient-based training. The resulting network has the structure of a graph tailored to the particular learning task and dataset. The obtained networks can also be trained from scratch and achieve virtually identical performance. We explore the properties of the network architectures for a number of datasets of varying difficulty, observing systematic regularities. The obtained graphs can therefore be understood as encoding nontrivial characteristics of the particular classification tasks.
Submitted 15 July, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Analyzing Neural Networks Based on Random Graphs
Authors:
Romuald A. Janik,
Aleksandra Nowak
Abstract:
We perform a massive evaluation of neural networks with architectures corresponding to random graphs of various types. We investigate various structural and numerical properties of the graphs in relation to neural network test accuracy. We find that none of the classical numerical graph invariants by itself allows one to single out the best networks. Consequently, we introduce a new numerical graph characteristic that selects a set of quasi-1-dimensional graphs, which are a majority among the best-performing networks. We also find that networks with primarily short-range connections perform better than networks which allow for many long-range connections. Moreover, many resolution-reducing pathways are beneficial. We provide a dataset of 1020 graphs and the test accuracies of their corresponding neural networks at https://github.com/rmldj/random-graph-nn-paper
Submitted 2 December, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
WICA: nonlinear weighted ICA
Authors:
Andrzej Bedychaj,
Przemysław Spurek,
Aleksandra Nowak,
Jacek Tabor
Abstract:
Independent Component Analysis (ICA) aims to find a coordinate system in which the components of the data are independent. In this paper, we construct a new nonlinear ICA model, called WICA, which obtains better and more stable results than other algorithms. A crucial tool is a new efficient method of verifying nonlinear dependence by computing correlation coefficients for normally weighted data. In addition, we propose a new baseline nonlinear mixing to perform comparable experiments, and a reliable measure which allows fair comparison of nonlinear models. Our code for WICA is available on GitHub: https://github.com/gmum/wica.
Submitted 9 December, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Non-linear ICA based on Cramer-Wold metric
Authors:
Przemysław Spurek,
Aleksandra Nowak,
Jacek Tabor,
Łukasz Maziarka,
Stanisław Jastrzębski
Abstract:
Non-linear source separation is a challenging open problem with many applications. We extend a recently proposed Adversarial Non-linear ICA (ANICA) model, and introduce Cramer-Wold ICA (CW-ICA). In contrast to ANICA, we use a simple, closed-form optimization target instead of a discriminator-based independence measure. Our results show that CW-ICA achieves comparable results to ANICA, while foregoing the need for adversarial training.
Submitted 1 March, 2019;
originally announced March 2019.
-
Set Aggregation Network as a Trainable Pooling Layer
Authors:
Łukasz Maziarka,
Marek Śmieja,
Aleksandra Nowak,
Jacek Tabor,
Łukasz Struski,
Przemysław Spurek
Abstract:
Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows embedding a given set of features into a vector representation of arbitrary size. We show that by adjusting the size of the embedding, SAN is capable of preserving the whole information from the input. In experiments, we demonstrate that replacing the global pooling layer with SAN improves classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer.
Submitted 25 November, 2019; v1 submitted 3 October, 2018;
originally announced October 2018.
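
The idea of mapping a set of feature vectors to a fixed-size vector, independently of element order, can be sketched in the DeepSets style that the abstract above builds on. This is a generic illustration and not the SAN layer itself; the function name `set_embedding`, the single ReLU transform, and the parameters `W` and `b` are assumptions introduced here.

```python
import numpy as np

def set_embedding(features, W, b):
    """Embed a set of feature vectors into one vector of size W.shape[1].

    features : (n_elements, d) array, one row per set element
    W, b     : hypothetical parameters of a shared per-element transform
    """
    # Apply the same ReLU transform to every element of the set
    hidden = np.maximum(features @ W + b, 0.0)
    # Summing over elements makes the result independent of their order
    return hidden.sum(axis=0)
```

The output dimension is set by the number of columns of `W`, so the embedding size can be chosen freely, and permuting the rows of `features` leaves the result unchanged.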
-
Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
Authors:
Wojciech Tarnowski,
Piotr Warchoł,
Stanisław Jastrzębski,
Jacek Tabor,
Maciej A. Nowak
Abstract:
We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing the signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequence of this universal behavior for the initial and late phases of the learning processes. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved based on our results, by ensuring the same level of dynamical isometry at initialization.
Submitted 4 March, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
-
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Authors:
Alex Nowak,
Soledad Villar,
Afonso S. Bandeira,
Joan Bruna
Abstract:
Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution.
In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These 'planted solutions' are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense.
We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation-based techniques appear to suffer.
Submitted 30 August, 2018; v1 submitted 22 June, 2017;
originally announced June 2017.
-
Signal from noise retrieval from one and two-point Green's function - comparison
Authors:
Zbigniew Drogosz,
Jerzy Jurkiewicz,
Grzegorz Łukaszewski,
Maciej A. Nowak
Abstract:
We compare two methods of eigen-inference from large sets of data, based on the analysis of one-point and two-point Green's functions, respectively. Our analysis points to the superiority of eigen-inference based on the one-point Green's function. First, the method we apply, based on Padé approximants, is orders of magnitude faster than eigen-inference based on fluctuations (two-point Green's functions). Second, we have identified the source of potential instability of the two-point Green's function method as arising from the spurious zero and negative modes of the estimator for a variance operator of a certain multidimensional Gaussian distribution, inherent to the two-point Green's function eigen-inference method. Third, we have presented the cases of eigen-inference based on negative spectral moments, for strictly positive spectra. Finally, we have compared the cases of eigen-inference of real-valued and complex-valued correlated Wishart distributions, reinforcing our conclusions on the advantage of the one-point Green's function method.
Submitted 9 January, 2015;
originally announced January 2015.