Simple and Scalable Algorithms for Cluster-Aware Precision Medicine
Authors:
Amanda M. Buch,
Conor Liston,
Logan Grosenick
Abstract:
AI-enabled precision medicine promises a transformational improvement in healthcare outcomes by enabling data-driven personalized diagnosis, prognosis, and treatment. However, the well-known "curse of dimensionality" and the clustered structure of biomedical data interact to present a joint challenge in the high-dimensional, limited-observation regime of precision medicine. To overcome both issues simultaneously, we propose a simple and scalable approach to joint clustering and embedding that combines standard embedding methods with a convex clustering penalty in a modular way. This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods, which we show with straightforward implementations of hierarchically clustered principal component analysis (PCA), locally linear embedding (LLE), and canonical correlation analysis (CCA). Through both numerical experiments and real-world examples, we demonstrate that our approach outperforms traditional and contemporary clustering methods on highly underdetermined problems (e.g., with just tens of observations) as well as on large-sample datasets. Importantly, our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings. Thus, our approach improves significantly on existing methods for identifying patient subgroups in multiomics and neuroimaging data, enabling scalable and interpretable biomarkers for precision medicine.
Submitted 17 May, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience
Authors:
Randal Burns,
William Gray Roncal,
Dean Kleissas,
Kunal Lillaney,
Priya Manavalan,
Eric Perlman,
Daniel R. Berger,
Davi D. Bock,
Kwanghun Chung,
Logan Grosenick,
Narayanan Kasthuri,
Nicholas C. Weiler,
Karl Deisseroth,
Michael Kazhdan,
Jeff Lichtman,
R. Clay Reid,
Stephen J. Smith,
Alexander S. Szalay,
Joshua T. Vogelstein,
R. Jacob Vogelstein
Abstract:
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes---neural connectivity maps of the brain---using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at http://openconnecto.me.
The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems---reads to parallel disk arrays and writes to solid-state storage---to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
Submitted 18 June, 2013; v1 submitted 14 June, 2013;
originally announced June 2013.