Search | arXiv e-print repository

Borrowing from historical control data in a Bayesian time-to-event model with flexible baseline hazard function

Abstract: There is currently a focus on statistical methods which can use historical trial information to help accelerate the discovery, development and delivery of medicine. Bayesian methods can be constructed so that the borrowing is "dynamic" in the sense that the similarity of the data helps to determine how much information is used. In the time-to-event setting with one historical data set, a popular model for a range of baseline hazards is the piecewise exponential model, where the time points are fixed and a borrowing structure is imposed on the model. Although convenient for implementation, this approach affects the borrowing capability of the model. We propose a Bayesian model which allows the time points to vary and a dependency to be placed between the baseline hazards. This serves to smooth the posterior baseline hazard, improving both model estimation and borrowing characteristics. We explore a variety of prior structures for the borrowing within our proposed model and assess their performance against established approaches. We demonstrate that this leads to improved type I error in the presence of prior data conflict and increased power. We have developed accompanying software which is freely available and enables easy implementation of the approach.

Submitted 23 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Sampler for regression coefficients (beta and beta_0) updated on page 5 of the Supplementary Material
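For readers unfamiliar with the piecewise exponential model referred to in this abstract, a minimal sketch of its survival function follows. It assumes the standard fixed-cut-point form, in which the hazard is a constant $\lambda_j$ on each interval and the cumulative hazard $H(t)$ is the sum of rate times time spent in each interval, with $S(t) = \exp(-H(t))$. This is not the authors' model (which lets the time points vary and links the baseline hazards) nor their software; the function names `pwe_cumulative_hazard` and `pwe_survival` are hypothetical.

```python
import math


def pwe_cumulative_hazard(t, cut_points, hazards):
    """Cumulative hazard H(t) of a piecewise exponential model.

    cut_points: increasing interior cut points s_1 < ... < s_{J-1}
    hazards:    constant hazard rate lambda_j on each of the J intervals
    """
    if len(hazards) != len(cut_points) + 1:
        raise ValueError("need exactly one hazard per interval")
    # Interval edges: [0, s_1), [s_1, s_2), ..., [s_{J-1}, inf)
    edges = [0.0] + list(cut_points) + [math.inf]
    H = 0.0
    for j, lam in enumerate(hazards):
        start, end = edges[j], edges[j + 1]
        if t <= start:
            break
        # Time spent in interval j, multiplied by that interval's rate
        H += lam * (min(t, end) - start)
    return H


def pwe_survival(t, cut_points, hazards):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-pwe_cumulative_hazard(t, cut_points, hazards))


cuts = [1.0, 2.0]
rates = [0.5, 1.0, 2.0]
print(pwe_cumulative_hazard(2.5, cuts, rates))  # 0.5*1 + 1.0*1 + 2.0*0.5 = 2.5
print(pwe_survival(2.5, cuts, rates))
```

With fixed cut points, each $\lambda_j$ is estimated only from events in its own interval, which is one reason smoothing or linking adjacent hazards, as the abstract describes, can matter.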

arXiv:2210.08645 [pdf, other]

An efficient deep neural network to find small objects in large 3D images

Authors: Jungkyu Park, Jakub Chłędowski, Stanisław Jastrzębski, Jan Witowski, Yanqi Xu, Linda Du, Sushma Gaddam, Eric Kim, Alana Lewin, Ujas Parikh, Anastasia Plaunova, Sardius Chen, Alexandra Millet, James Park, Kristine Pysarenko, Shalin Patel, Julia Goldberg, Melanie Wegener, Linda Moy, Laura Heacock, Beatriu Reig, Krzysztof J. Geras

Abstract: 3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. While it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and synthetic 2D (0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classified large 3D images despite focusing computation on a smaller percentage of its input compared to GMIC. Therefore, 3D-GMIC identifies and utilizes extremely small regions of interest from 3D images consisting of hundreds of millions of pixels, dramatically reducing associated computational challenges. 3D-GMIC generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).

Submitted 26 February, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

arXiv:2104.14008 [pdf, other]

doi 10.18637/jss.v100.i11

BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression

Authors: Zhi Zhao, Marco Banterle, Leonardo Bottolo, Sylvia Richardson, Alex Lewin, Manuela Zucknick

Abstract: In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be studied with high-dimensional multi-response regression, where the response variables are potentially highly correlated. To this end, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. We also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package, which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples for typical applications.

Submitted 28 April, 2021; originally announced April 2021.

Journal ref: Journal of Statistical Software. 100 (2021) 1-32

arXiv:2101.05899 [pdf, other]

doi 10.1093/jrsssc/qlad102

Multivariate Bayesian structured variable selection for pharmacogenomic studies

Authors: Zhi Zhao, Marco Banterle, Alex Lewin, Manuela Zucknick

Abstract: Precision cancer medicine aims to determine the optimal treatment for each patient. In vitro cancer drug sensitivity screens combined with multi-omics characterization of the cancer cells have become an important tool to achieve this aim. Analyzing such pharmacogenomic studies requires flexible and efficient joint statistical models for associating drug sensitivity with high-dimensional multi-omics data. We propose a multivariate Bayesian structured variable selection model for sparse identification of omics features associated with multiple correlated drug responses. Since many anti-cancer drugs are designed for specific molecular targets, our approach makes use of known structure between responses and predictors, e.g. molecular pathways and related omics features targeted by specific drugs, via a Markov random field (MRF) prior for the latent indicator variables of the coefficients in sparse seemingly unrelated regression. The structure information included in the MRF prior can improve the model performance, i.e. variable selection and response prediction, compared to other common priors. In addition, we employ random effects to capture heterogeneity between cancer types in a pan-cancer setting. The proposed approach is validated by simulation studies and applied to the Genomics of Drug Sensitivity in Cancer data, which includes pharmacological profiling and multi-omics characterization of a large set of heterogeneous cell lines.

Submitted 13 February, 2023; v1 submitted 14 January, 2021; originally announced January 2021.

Journal ref: Journal of the Royal Statistical Society, Series C. 2024, 73, 420-443

arXiv:1903.08297 [pdf, other]

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Authors: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, et al. (7 additional authors not shown)

Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and found our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.

Submitted 19 March, 2019; originally announced March 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/SkxYez76FE

arXiv:1512.00809 [pdf, ps, other]

doi 10.1080/00031305.2016.1277159

Optimal whitening and decorrelation

Authors: Agnan Kessy, Alex Lewin, Korbinian Strimmer

Abstract: Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows one to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.

Submitted 17 December, 2016; v1 submitted 2 December, 2015; originally announced December 2015.

Comments: 14 pages, 2 tables

Journal ref: The American Statistician 2018, Vol. 72, No. 4, pp. 309-314
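To make the two recommended procedures concrete, here is a minimal NumPy sketch of the ZCA-cor and PCA-cor whitening matrices. It assumes the decomposition $\Sigma = V^{1/2} P V^{1/2}$ ($V$ the diagonal variance matrix, $P$ the correlation matrix, $P = G \Theta G^T$ its eigendecomposition), giving $W_{\text{ZCA-cor}} = P^{-1/2} V^{-1/2}$ and $W_{\text{PCA-cor}} = \Theta^{-1/2} G^T V^{-1/2}$; both satisfy $W \Sigma W^T = I$. The function name `whitening_matrix` is hypothetical, and fixing the PCA sign ambiguity by making the diagonal of $G$ non-negative is one convention, assumed here.

```python
import numpy as np


def whitening_matrix(sigma, method="ZCA-cor"):
    """Return W such that W @ sigma @ W.T is the identity (hypothetical helper).

    Uses sigma = V^{1/2} P V^{1/2}, with V the diagonal matrix of variances
    and P the correlation matrix.
    """
    sigma = np.asarray(sigma, dtype=float)
    v_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma)))  # V^{-1/2}
    P = v_inv_sqrt @ sigma @ v_inv_sqrt                   # correlation matrix
    theta, G = np.linalg.eigh(P)                          # P = G diag(theta) G^T

    if method == "ZCA-cor":
        # W = P^{-1/2} V^{-1/2}: sphered variables stay close to the originals
        p_inv_sqrt = G @ np.diag(theta ** -0.5) @ G.T
        return p_inv_sqrt @ v_inv_sqrt
    if method == "PCA-cor":
        # W = Theta^{-1/2} G^T V^{-1/2}, eigenvalues in decreasing order
        order = np.argsort(theta)[::-1]
        theta, G = theta[order], G[:, order]
        # Assumed convention: flip eigenvector signs so diag(G) is non-negative
        G = G * np.where(np.diag(G) < 0, -1.0, 1.0)
        return np.diag(theta ** -0.5) @ G.T @ v_inv_sqrt
    raise ValueError(f"unknown method: {method}")


# Usage: whiten simulated correlated data with the sample covariance
rng = np.random.default_rng(0)
sigma_true = np.array([[4.0, 1.2], [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], sigma_true, size=5000)
W = whitening_matrix(np.cov(X, rowvar=False), method="ZCA-cor")
Z = (X - X.mean(axis=0)) @ W.T
print(np.round(np.cov(Z, rowvar=False), 6))
```

Both matrices whiten equally well; they differ only by a rotation, which is exactly the freedom the cross-covariance and cross-correlation criteria in the abstract are used to resolve.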

arXiv:1401.3603 [pdf]

doi 10.1002/anie.201201606

Tuning the critical temperature of cuprate superconductor films using self-assembled organic layers

Authors: I. Carmeli, A. Lewin, E. Flekser, I. Diamant, Q. Zhang, J. Shen, M. Gozin, S. Richter, Y. Dagan

Abstract: Many of the electronic properties of high-temperature cuprate superconductors (HTSC) are strongly dependent on the number of charge carriers put into the CuO$_2$ planes (doping). Superconductivity appears over a dome-shaped region of the doping-temperature phase diagram. The highest critical temperature (Tc) is obtained for the so-called "optimum doping". The doping mechanism is usually chemical; it can be done by cationic substitution. This is the case, for example, in La$_{2-x}$Sr$_x$CuO$_4$, where La$^{3+}$ is replaced by Sr$^{2+}$, thus adding a hole to the CuO$_2$ planes. A similar effect is achieved by adding oxygen, as in the case of YBa$_2$Cu$_3$O$_{6+δ}$, where $δ$ represents the excess oxygen in the sample. In this paper we report on a different mechanism, one that enables the addition or removal of carriers from the surface of the HTSC. This method utilizes a self-assembled monolayer (SAM) of polar molecules adsorbed on the cuprate surface. In the case of optically active molecules, the polarity of the SAM can be modulated by shining light on the coated surface. This results in a light-induced modulation of the superconducting phase transition of the sample. The ability to control the superconducting transition temperature with the use of SAMs makes these surfaces practical for various devices such as switches and detectors based on high-Tc superconductors.

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Angewandte Chemie International Edition 51 (29), 7162-7165 (2012)

arXiv:astro-ph/9908061 [pdf, ps, other]

doi 10.1103/PhysRevD.64.023514

Can inflationary models of cosmic perturbations evade the secondary oscillation test?

Authors: Alex Lewin, Andreas Albrecht

Abstract: We consider the consequences of an observed Cosmic Microwave Background (CMB) temperature anisotropy spectrum containing no secondary oscillations. While such a spectrum is generally considered to be a robust signature of active structure formation, we show that such a spectrum {\em can} be produced by (very unusual) inflationary models or other passive evolution models. However, we show that for all these passive models the characteristic oscillations would show up in other observable spectra. Our work shows that when CMB polarization and matter power spectra are taken into account, secondary oscillations are indeed a signature of even these very exotic passive models. We construct a measure of the observability of secondary oscillations in a given experiment, and show that even with foregrounds both the MAP and Planck satellites should be able to distinguish between models with and without oscillations. Thus we conclude that inflationary and other passive models can {\em not} evade the secondary oscillation test.

Submitted 25 April, 2001; v1 submitted 6 August, 1999; originally announced August 1999.

Comments: Final version accepted for publication in PRD. Minor improvements have been made to the discussion and new data has been included. The conclusions are unchanged

Journal ref: Phys.Rev.D64:023514,2001

arXiv:astro-ph/9804283 [pdf, ps, other]

doi 10.1046/j.1365-8711.1999.02104.x

A new statistic for picking out Non-Gaussianity in the CMB

Authors: Alex Lewin, Andreas Albrecht, Joao Magueijo

Abstract: In this paper we propose a new statistic capable of detecting non-Gaussianity in the CMB. The statistic is defined in Fourier space, and therefore naturally separates angular scales. It consists of taking another Fourier transform, in angle, over the Fourier modes within a given ring of scales. Like other Fourier space statistics, our statistic outdoes more conventional methods when faced with combinations of Gaussian processes (be they noise or signal) and a non-Gaussian signal which dominates only on some scales. However, unlike previous efforts along these lines, our statistic is successful in recognizing multiple non-Gaussian patterns in a single field. We discuss various applications, in which the Gaussian component may be noise or primordial signal, and the non-Gaussian component may be a cosmic string map, or some geometrical construction mimicking, say, small scale dust maps.

Submitted 9 April, 1999; v1 submitted 27 April, 1998; originally announced April 1998.

Comments: 8 pages, 14 figures. Corrected typos

Report number: Imperial/TP/97-98/42

Journal ref: Mon.Not.Roy.Astron.Soc.302:131-138,1999

arXiv:astro-ph/9702131 [pdf, ps, other]

Non-Gaussian spectra and the search for cosmic strings

Authors: Joao Magueijo, Alex Lewin

Abstract: We present a new tool for relating theory and experiment suited for non-Gaussian theories: non-Gaussian spectra. It does for non-Gaussian theories what the angular power spectrum $C_\ell$ does for Gaussian theories. We then show how previous studies of cosmic strings have overrated their non-Gaussian signature. More realistic maps are not visually stringy. However, non-Gaussian spectra will reveal their stringiness. We finally summarise the steps of an ongoing experimental project aimed at searching for cosmic strings by means of this technique.

Submitted 14 February, 1997; originally announced February 1997.

Comments: Contribution to the proceedings of ``Topological defects and CMB'', Rome, October 96

Showing 1–10 of 10 results for author: Lewin, A