Search | arXiv e-print repository

Informational Rescaling of PCA Maps with Application to Genetic Distance

Authors: Nassim Nicholas Taleb, Pierre Zalloua, Khaled Elbassioni, Andreas Henschel, Daniel E. Platt

Abstract: We discuss the inadequacy of covariances/correlations and other measures in L2 as relative distance metrics under some conditions. We propose a computationally simple heuristic to transform a map based on standard principal component analysis (PCA) (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Rescaling Principal Component based distances using MI allows a representation of relative statistical associations when, as in genetics, it is applied on bit measurements between individuals' genomic mutual information. This entropy rescaled PCA, while preserving order relationships (along a dimension), changes the relative distances to make them linear to information. We show the effect on the entire world population and some subsamples, which leads to significant differences with the results of current research.
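As a rough illustration of why a correlation-based distance and an information-based distance can disagree, the sketch below uses the standard closed form for the mutual information of a bivariate Gaussian with correlation rho, I = -1/2 log2(1 - rho^2) bits. This is a textbook identity and not necessarily the exact rescaling used in the paper; the function name `gaussian_mi_bits` is hypothetical.

```python
import math


def gaussian_mi_bits(rho: float) -> float:
    """Mutual information, in bits, of a bivariate Gaussian with correlation rho.

    Uses the standard identity I = -0.5 * log2(1 - rho**2).
    Assumption: this illustrates the general idea only; it is not taken
    from the paper's own procedure.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho * rho)


if __name__ == "__main__":
    # Correlation is not linear in information: doubling rho
    # more than doubles the mutual information.
    for rho in (0.1, 0.45, 0.9):
        print(f"rho={rho:.2f}  MI={gaussian_mi_bits(rho):.4f} bits")
```

Under this identity, equal steps in correlation correspond to unequal steps in shared information, which is the kind of distortion the abstract describes.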

Submitted 4 March, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2202.02164 [pdf, ps, other]

Group invariant machine learning by fundamental domain projections

Authors: Benjamin Aslan, Daniel Platt, David Sheard

Abstract: We approach the well-studied problem of supervised group invariant and equivariant machine learning from the point of view of geometric topology. We propose a novel approach using a pre-processing step, which involves projecting the input data into a geometric space which parametrises the orbits of the symmetry group. This new data can then be the input for an arbitrary machine learning model (neural network, random forest, support-vector machine, etc.). We give an algorithm to compute the geometric projection, which is efficient to implement, and we illustrate our approach on some example machine learning problems (including the well-studied problem of predicting Hodge numbers of CICY matrices), in each case finding an improvement in accuracy versus others in the literature. The geometric topology viewpoint also allows us to give a unified description of so-called intrinsic approaches to group equivariant machine learning, which encompasses many other approaches in the literature.
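The pre-processing idea in the abstract above can be sketched in its simplest textbook case: when the symmetric group acts on a vector by permuting its coordinates, sorting the coordinates picks one canonical representative per orbit, so any model fed the sorted vector is automatically permutation invariant. This is only an assumed toy instance, not the authors' algorithm, and the function name `project_to_fundamental_domain` is hypothetical.

```python
from typing import Sequence, Tuple


def project_to_fundamental_domain(x: Sequence[float]) -> Tuple[float, ...]:
    """Map x to a canonical representative of its orbit under coordinate permutations.

    Assumption: the symmetry group is the full symmetric group acting by
    permuting coordinates, for which the set of non-decreasing vectors is
    a fundamental domain. Other groups need a different projection.
    """
    return tuple(sorted(x))


if __name__ == "__main__":
    a = [3.0, 1.0, 2.0]
    b = [2.0, 3.0, 1.0]  # a permutation of a
    # Both inputs land on the same representative, so a downstream
    # model cannot distinguish them.
    print(project_to_fundamental_domain(a))
    print(project_to_fundamental_domain(b))
```

The projected vectors can then be passed to any downstream model without changing its architecture.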

Submitted 4 February, 2022; originally announced February 2022.

Comments: 21 pages, 4 figures

MSC Class: 57R18; ACM Class: I.2.m

arXiv:2109.03667 [pdf, ps, other]

doi 10.1109/QRS-C55045.2021.00168

The Energy Footprint of Blockchain Consensus Mechanisms Beyond Proof-of-Work

Authors: Moritz Platt, Johannes Sedlmeir, Daniel Platt, Paolo Tasca, Jiahua Xu, Nikhil Vadgama, Juan Ignacio Ibañez

Abstract: Popular distributed ledger technology (DLT) systems using proof-of-work (PoW) for Sybil attack resistance have extreme energy requirements, drawing stern criticism from academia, businesses, and the media. DLT systems building on alternative consensus mechanisms, foremost proof-of-stake (PoS), aim to address this downside. In this paper, we take a first step towards comparing the energy requirements of such systems to understand whether they achieve this goal equally well. While multiple studies have been undertaken that analyze the energy demands of individual blockchains, little comparative work has been done. We approach this research question by formalizing a basic consumption model for PoS blockchains. Applying this model to six archetypal blockchains generates three main findings: First, we confirm the concerns around the energy footprint of PoW by showing that Bitcoin's energy consumption exceeds the energy consumption of all PoS-based systems analyzed by at least three orders of magnitude. Second, we illustrate that there are significant differences in energy consumption among the PoS-based systems analyzed, with permissionless systems having an overall larger energy footprint. Third, we point out that the type of hardware that validators use has a considerable impact on whether PoS blockchains' energy consumption is comparable with or considerably larger than that of centralized, non-DLT systems.

Submitted 4 April, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Journal ref: 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C), 2021, pp. 1135-1144

arXiv:2101.07417 [pdf, other]

Inferring COVID-19 Biological Pathways from Clinical Phenotypes via Topological Analysis

Authors: Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida

Abstract: COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.

Submitted 1 May, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: Proceedings of the AAAI Workshop on Health Intelligence 2021

arXiv:1609.09430 [pdf, other]

CNN Architectures for Large-Scale Audio Classification

Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new additions.

Showing 1–5 of 5 results for author: Platt, D