-
Automated Discovery of Anomalous Features in Ultra-Large Planetary Remote Sensing Datasets using Variational Autoencoders
Authors:
Adam Lesnikowski,
Valentin T. Bickel,
Daniel Angerhausen
Abstract:
The NASA Lunar Reconnaissance Orbiter (LRO) has returned petabytes of lunar high spatial resolution surface imagery over the past decade, impractical for humans to fully review manually. Here we develop an automated method using a deep generative visual model that rapidly retrieves scientifically interesting examples of LRO surface imagery representing the first planetary image anomaly detector. We give quantitative experimental evidence that our method preferentially retrieves anomalous samples such as notable geological features and known human landing and spacecraft crash sites. Our method addresses a major capability gap in planetary science and presents a novel way to unlock insights hidden in ever-increasing remote sensing data archives, with numerous applications to other science domains. We publish our code and data along with this paper.
Submitted 12 March, 2024;
originally announced March 2024.
-
Synthetic Data and Simulators for Recommendation Systems: Current State and Future Directions
Authors:
Adam Lesnikowski,
Gabriel de Souza Pereira Moreira,
Sara Rabhi,
Karl Byleen-Higley
Abstract:
Synthetic data and simulators have the potential to markedly improve the performance and robustness of recommendation systems. These approaches have already had a beneficial impact in other machine-learning driven fields. We identify and discuss a key trade-off between data fidelity and privacy in the past work on synthetic data and simulators for recommendation systems. For the important use case of predicting algorithm rankings on real data from synthetic data, we provide motivation and current successes versus limitations. Finally we outline a number of exciting future directions for recommendation systems that we believe deserve further attention and work, including mixing real and synthetic data, feedback in dataset generation, robust simulations, and privacy-preserving methods.
Submitted 21 December, 2021;
originally announced December 2021.
-
Unsupervised Distribution Learning for Lunar Surface Anomaly Detection
Authors:
Adam Lesnikowski,
Valentin T. Bickel,
Daniel Angerhausen
Abstract:
In this work we show that modern data-driven machine learning techniques can be successfully applied on lunar surface remote sensing data to learn, in an unsupervised way, sufficiently good representations of the data distribution to enable lunar technosignature and anomaly detection. In particular we train an unsupervised distribution learning neural network model to find the Apollo 15 landing module in a testing dataset, with no dataset specific model or hyperparameter tuning. Sufficiently good unsupervised data density estimation has the promise of enabling myriad useful downstream tasks, including locating lunar resources for future space flight and colonization, finding new impact craters or lunar surface reshaping, and algorithmically deciding the importance of unlabeled samples to send back from power- and bandwidth-constrained missions. We show in this work that such unsupervised learning can be successfully done in the lunar remote sensing and space science contexts.
Submitted 14 January, 2020;
originally announced January 2020.
-
The Relevance of Bayesian Layer Positioning to Model Uncertainty in Deep Bayesian Active Learning
Authors:
Jiaming Zeng,
Adam Lesnikowski,
Jose M. Alvarez
Abstract:
One of the main challenges of deep learning tools is their inability to capture model uncertainty. While Bayesian deep learning can be used to tackle the problem, Bayesian neural networks often require more time and computational power to train than deterministic networks. Our work explores whether fully Bayesian networks are needed to successfully capture model uncertainty. We vary the number and position of Bayesian layers in a network and compare their performance on active learning with the MNIST dataset. We found that we can fully capture the model uncertainty by using only a few Bayesian layers near the output of the network, combining the advantages of deterministic and Bayesian networks.
Submitted 29 November, 2018;
originally announced November 2018.
-
Large-Scale Visual Active Learning with Deep Probabilistic Ensembles
Authors:
Kashyap Chitta,
Jose M. Alvarez,
Adam Lesnikowski
Abstract:
Annotating the right data for training deep neural networks is an important challenge. Active learning using uncertainty estimates from Bayesian Neural Networks (BNNs) could provide an effective solution to this. Despite being theoretically principled, BNNs require approximations to be applied to large-scale problems, where both performance and uncertainty estimation are crucial. In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a scalable technique that uses a regularized ensemble to approximate a deep BNN. We conduct a series of large-scale visual active learning experiments to evaluate DPEs on classification with the CIFAR-10, CIFAR-100 and ImageNet datasets, and semantic segmentation with the BDD100k dataset. Our models require significantly less training data to achieve competitive performances, and steadily improve upon strong active learning baselines as the annotation budget is increased.
Submitted 20 February, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Deep Probabilistic Ensembles: Approximate Variational Inference through KL Regularization
Authors:
Kashyap Chitta,
Jose M. Alvarez,
Adam Lesnikowski
Abstract:
In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a scalable technique that uses a regularized ensemble to approximate a deep Bayesian Neural Network (BNN). We do so by incorporating a KL divergence penalty term into the training objective of an ensemble, derived from the evidence lower bound used in variational inference. We evaluate the uncertainty estimates obtained from our models for active learning on visual classification. Our approach steadily improves upon active learning baselines as the annotation budget is increased.
Submitted 30 November, 2018; v1 submitted 6 November, 2018;
originally announced November 2018.
-
How Much Did it Rain? Predicting Real Rainfall Totals Based on Radar Data
Authors:
Adam Lesnikowski
Abstract:
We applied a variety of parametric and non-parametric machine learning models to predict the probability distribution of rainfall based on 1M training examples over a single year across several U.S. states. Our top performing model based on a squared loss objective was a cross-validated parametric k-nearest-neighbor predictor that took about six days to compute, and was competitive in a world-wide competition.
Submitted 6 August, 2016;
originally announced August 2016.