Search | arXiv e-print repository

Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials

Authors: Valentine Perrin, Nathan Noiry, Nicolas Loiseau, Alex Nowak

Abstract: Non-significant randomized control trials can hide subgroups of good responders to experimental drugs, thus hindering subsequent development. Identifying such heterogeneous treatment effects is key for precision medicine and many post-hoc analysis methods have been developed for that purpose. While several benchmarks have been carried out to identify the strengths and weaknesses of these methods, notably for binary and continuous endpoints, similar systematic empirical evaluation of subgroup analysis for time-to-event endpoints are lacking. This work aims to fill this gap by evaluating several subgroup analysis algorithms in the context of time-to-event outcomes, by means of three different research questions: Is there heterogeneity? What are the biomarkers responsible for such heterogeneity? Who are the good responders to treatment? In this context, we propose a new synthetic and semi-synthetic data generation process that allows one to explore a wide range of heterogeneity scenarios with precise control on the level of heterogeneity. We provide an open source Python package, available on Github, containing our generation process and our comprehensive benchmark framework. We hope this package will be useful to the research community for future investigations of heterogeneity of treatment effects and subgroup analysis methods benchmarking.

Submitted 23 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 9 pages, 8 figures, 2 tables. Code available at https://github.com/owkin/hte . Comments are welcome!

arXiv:2306.12230 [pdf, other]

Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training

Authors: Aleksandra I. Nowak, Bram Grooten, Decebal Constantin Mocanu, Jacek Tabor

Abstract: Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their impact on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper

Submitted 29 November, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023

arXiv:2211.13715 [pdf, other]

Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery

Authors: Mateusz Olko, Michał Zając, Aleksandra Nowak, Nino Scherrer, Yashas Annadani, Stefan Bauer, Łukasz Kuciński, Piotr Miłoś

Abstract: Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

Submitted 3 April, 2024; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2110.03498 [pdf, other]

On the relationship between disentanglement and multi-task learning

Authors: Łukasz Maziarka, Aleksandra Nowak, Maciej Wołczyk, Andrzej Bedychaj

Abstract: One of the main arguments behind studying disentangled representations is the assumption that they can be easily reused in different tasks. At the same time finding a joint, adaptable representation of data is one of the key challenges in the multi-task learning setting. In this paper, we take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing. We perform a thorough empirical study of the representations obtained by neural networks trained on automatically generated supervised tasks. Using a set of standard metrics we show that disentanglement appears naturally during the process of multi-task neural network training.

Submitted 7 October, 2021; originally announced October 2021.

arXiv:2006.12195 [pdf, other]

Neural networks adapting to datasets: learning network size and topology

Authors: Romuald A. Janik, Aleksandra Nowak

Abstract: We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a standard gradient-based training. The resulting network has the structure of a graph tailored to the particular learning task and dataset. The obtained networks can also be trained from scratch and achieve virtually identical performance. We explore the properties of the network architectures for a number of datasets of varying difficulty observing systematic regularities. The obtained graphs can be therefore understood as encoding nontrivial characteristics of the particular classification tasks.

Submitted 15 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Fixed blank page

arXiv:2002.08104 [pdf, other]

Analyzing Neural Networks Based on Random Graphs

Authors: Romuald A. Janik, Aleksandra Nowak

Abstract: We perform a massive evaluation of neural networks with architectures corresponding to random graphs of various types. We investigate various structural and numerical properties of the graphs in relation to neural network test accuracy. We find that none of the classical numerical graph invariants by itself allows to single out the best networks. Consequently, we introduce a new numerical graph characteristic that selects a set of quasi-1-dimensional graphs, which are a majority among the best performing networks. We also find that networks with primarily short-range connections perform better than networks which allow for many long-range connections. Moreover, many resolution reducing pathways are beneficial. We provide a dataset of 1020 graphs and the test accuracies of their corresponding neural networks at https://github.com/rmldj/random-graph-nn-paper

Submitted 2 December, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Added new results and discussion

arXiv:2001.04147 [pdf, other]

WICA: nonlinear weighted ICA

Authors: Andrzej Bedychaj, Przemysław Spurek, Aleksandra Nowak, Jacek Tabor

Abstract: Independent Component Analysis (ICA) aims to find a coordinate system in which the components of the data are independent. In this paper we construct a new nonlinear ICA model, called WICA, which obtains better and more stable results than other algorithms. A crucial tool is given by a new efficient method of verifying nonlinear dependence with the use of computation of correlation coefficients for normally weighted data. In addition, authors propose a new baseline nonlinear mixing to perform comparable experiments, and a reliable measure which allows fair comparison of nonlinear models. Our code for WICA is available on Github https://github.com/gmum/wica.

Submitted 9 December, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

arXiv:1903.00201 [pdf, other]

doi 10.1007/978-3-030-63836-8_25

Non-linear ICA based on Cramer-Wold metric

Authors: Przemysław Spurek, Aleksandra Nowak, Jacek Tabor, Łukasz Maziarka, Stanisław Jastrzębski

Abstract: Non-linear source separation is a challenging open problem with many applications. We extend a recently proposed Adversarial Non-linear ICA (ANICA) model, and introduce Cramer-Wold ICA (CW-ICA). In contrast to ANICA we use a simple, closed-form optimization target instead of a discriminator-based independence measure. Our results show that CW-ICA achieves comparable results to ANICA, while foregoing the need for adversarial training.

Submitted 1 March, 2019; originally announced March 2019.

Journal ref: Neural Information Processing. ICONIP 2020

arXiv:1810.01868 [pdf, other]

doi 10.1007/978-3-030-36711-4_35

Set Aggregation Network as a Trainable Pooling Layer

Authors: Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, Przemysław Spurek

Abstract: Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows to embed a given set of features to a vector representation of arbitrary size. We show that by adjusting the size of embedding, SAN is capable of preserving the whole information from the input. In experiments, we demonstrate that replacing global pooling layer by SAN leads to the improvement of classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer.

Submitted 25 November, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: ICONIP 2019

Journal ref: Neural Information Processing. ICONIP 2019

arXiv:1809.08848 [pdf, other]

Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

Authors: Wojciech Tarnowski, Piotr Warchoł, Stanisław Jastrzębski, Jacek Tabor, Maciej A. Nowak

Abstract: We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespectively of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions, by analyzing the signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequence of this universal behavior for the initial and late phases of the learning processes. We conclude by drawing attention to the simple fact, that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved based on our results, by ensuring the same level of dynamical isometry at initialization.

Submitted 4 March, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

Journal ref: AISTATS 2019

arXiv:1706.07450 [pdf, ps, other]

Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks

Authors: Alex Nowak, Soledad Villar, Afonso S. Bandeira, Joan Bruna

Abstract: Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These 'planted solutions' are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer.

Submitted 30 August, 2018; v1 submitted 22 June, 2017; originally announced June 2017.

Comments: Revised note to arXiv:1706.07450v1 that appeared in IEEE Data Science Workshop 2018

arXiv:1501.02108 [pdf, other]

doi 10.1103/PhysRevE.92.022111

Signal from noise retrieval from one and two-point Green's function - comparison

Authors: Zbigniew Drogosz, Jerzy Jurkiewicz, Grzegorz Łukaszewski, Maciej A. Nowak

Abstract: We compare two methods of eigen-inference from large sets of data, based on the analysis of one-point and two-point Green's functions, respectively. Our analysis points at the superiority of eigen-inference based on one-point Green's function. First, the applied by us method based on Padé approximants is orders of magnitude faster comparing to the eigen-inference based on fluctuations (two-point Green's functions). Second, we have identified the source of potential instability of the two-point Green's function method, as arising from the spurious zero and negative modes of the estimator for a variance operator of the certain multidimensional Gaussian distribution, inherent for the two-point Green's function eigen-inference method. Third, we have presented the cases of eigen-inference based on negative spectral moments, for strictly positive spectra. Finally, we have compared the cases of eigen-inference of real-valued and complex-valued correlated Wishart distributions, reinforcing our conclusions on an advantage of the one-point Green's function method.

Submitted 9 January, 2015; originally announced January 2015.

Comments: 14 pages, 8 figures, 3 tables

Journal ref: Phys. Rev. E 92, 022111 (2015)

Showing 1–12 of 12 results for author: Nowak, A