-
Identifying Entangled Physics Relationships through Sparse Matrix Decomposition to Inform Plasma Fusion Design
Authors:
M. Giselle Fernández-Godino,
Michael J. Grosskopf,
Julia B. Nakhleh,
Brandon M. Wilson,
John Kline,
Gowri Srinivasan
Abstract:
A sustainable burn platform through inertial confinement fusion (ICF) has been an ongoing challenge for over 50 years. Mitigating engineering limitations and improving the current design involves an understanding of the complex coupling of physical processes. While sophisticated simulations codes are used to model ICF implosions, these tools contain necessary numerical approximation but miss physi…
▽ More
A sustainable burn platform through inertial confinement fusion (ICF) has been an ongoing challenge for over 50 years. Mitigating engineering limitations and improving the current design involves an understanding of the complex coupling of physical processes. While sophisticated simulations codes are used to model ICF implosions, these tools contain necessary numerical approximation but miss physical processes that limit predictive capability. Identification of relationships between controllable design inputs to ICF experiments and measurable outcomes (e.g. yield, shape) from performed experiments can help guide the future design of experiments and development of simulation codes, to potentially improve the accuracy of the computational models used to simulate ICF experiments. We use sparse matrix decomposition methods to identify clusters of a few related design variables. Sparse principal component analysis (SPCA) identifies grou**s that are related to the physical origin of the variables (laser, hohlraum, and capsule). A variable importance analysis finds that in addition to variables highly correlated with neutron yield such as picket power and laser energy, variables that represent a dramatic change of the ICF design such as number of pulse steps are also very important. The obtained sparse components are then used to train a random forest (RF) surrogate for predicting total yield. The RF performance on the training and testing data compares with the performance of the RF surrogate trained using all design variables considered. This work is intended to inform design changes in future ICF experiments by augmenting the expert intuition and simulations results.
△ Less
Submitted 28 October, 2020;
originally announced October 2020.
-
EigenRank by Committee: A Data Subset Selection and Failure Prediction paradigm for Robust Deep Learning based Medical Image Segmentation
Authors:
Bilwaj Gaonkar,
Joel Beckett,
Mark Attiah,
Christine Ahn,
Matthew Edwards,
Bayard Wilson,
Azim Laiwalla,
Banafsheh Salehi,
Bryan Yoo,
Alex Bui,
Luke Macyszyn
Abstract:
Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows face two main algorithmic challenges. The first, is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert phys…
▽ More
Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows face two main algorithmic challenges. The first, is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert physicians to a likely segmentation failure. Here we propose a novel algorithm, named `Eigenrank' which addresses both of these challenges. Eigenrank can select for manual labeling, a subset of medical images from a large database, such that a U-Net trained on this subset is superior to one trained on a randomly selected subset of the same size. Eigenrank can also be used to pick out, cases in a large database, where deep learning segmentation will fail. We present our algorithm, followed by results and a discussion of how Eigenrank exploits the Von Neumann information to perform both data subset selection and failure prediction for medical image segmentation using deep learning.
△ Less
Submitted 18 January, 2021; v1 submitted 17 August, 2019;
originally announced August 2019.
-
Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information
Authors:
Charles B. Delahunt,
Mayoore S. Jaiswal,
Matthew P. Horning,
Samantha Janko,
Clay M. Thompson,
Sourabh Kulhare,
Liming Hu,
Travis Ostbye,
Grace Yun,
Roman Gebrehiwot,
Benjamin K. Wilson,
Earl Long,
Stephane Proux,
Dionicia Gamboa,
Peter Chiodini,
Jane Carter,
Mehul Dhorda,
David Isaboke,
Bernhards Ogutu,
Wellington Oyibo,
Elizabeth Villasis,
Kyaw Myo Tun,
Christine Bachman,
David Bell,
Courosh Mehanian
Abstract:
Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumb…
▽ More
Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.
△ Less
Submitted 11 September, 2022; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Predictive Inequity in Object Detection
Authors:
Benjamin Wilson,
Judy Hoffman,
Jamie Morgenstern
Abstract:
In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatri…
▽ More
In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatrick skin tones in ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone grou**s, finding that neither time of day nor occlusion explain this behavior, suggesting this disparity is not merely the result of pedestrians in the 4-6 range appearing in more difficult scenes for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias.
△ Less
Submitted 21 February, 2019;
originally announced February 2019.
-
Skip-gram word embeddings in hyperbolic space
Authors:
Matthias Leimeister,
Benjamin J. Wilson
Abstract:
Recent work has demonstrated that embeddings of tree-like graphs in hyperbolic space surpass their Euclidean counterparts in performance by a large margin. Inspired by these results and scale-free structure in the word co-occurrence graph, we present an algorithm for learning word embeddings in hyperbolic space from free text. An objective function based on the hyperbolic distance is derived and i…
▽ More
Recent work has demonstrated that embeddings of tree-like graphs in hyperbolic space surpass their Euclidean counterparts in performance by a large margin. Inspired by these results and scale-free structure in the word co-occurrence graph, we present an algorithm for learning word embeddings in hyperbolic space from free text. An objective function based on the hyperbolic distance is derived and included in the skip-gram negative-sampling architecture of word2vec. The hyperbolic word embeddings are then evaluated on word similarity and analogy benchmarks. The results demonstrate the potential of hyperbolic word embeddings, particularly in low dimensions, though without clear superiority over their Euclidean counterparts. We further discuss subtleties in the formulation of the analogy task in curved spaces.
△ Less
Submitted 27 May, 2019; v1 submitted 30 August, 2018;
originally announced September 2018.
-
MARVIN: An Open Machine Learning Corpus and Environment for Automated Machine Learning Primitive Annotation and Execution
Authors:
Chris A. Mattmann,
Sujen Shah,
Brian Wilson
Abstract:
In this demo paper, we introduce the DARPA D3M program for automatic machine learning (ML) and JPL's MARVIN tool that provides an environment to locate, annotate, and execute machine learning primitives for use in ML pipelines. MARVIN is a web-based application and associated back-end interface written in Python that enables composition of ML pipelines from hundreds of primitives from the world of…
▽ More
In this demo paper, we introduce the DARPA D3M program for automatic machine learning (ML) and JPL's MARVIN tool that provides an environment to locate, annotate, and execute machine learning primitives for use in ML pipelines. MARVIN is a web-based application and associated back-end interface written in Python that enables composition of ML pipelines from hundreds of primitives from the world of Scikit-Learn, Keras, DL4J and other widely used libraries. MARVIN allows for the creation of Docker containers that run on Kubernetes clusters within DARPA to provide an execution environment for automated machine learning. MARVIN currently contains over 400 datasets and challenge problems from a wide array of ML domains including routine classification and regression to advanced video/image classification and remote sensing.
△ Less
Submitted 11 August, 2018;
originally announced August 2018.