-
nbi: the Astronomer's Package for Neural Posterior Estimation
Authors:
Keming Zhang,
Joshua S. Bloom,
Stéfan van der Walt,
Nina Hernitschek
Abstract:
Despite the promise of Neural Posterior Estimation (NPE) methods in astronomy, the adoption of NPE into the routine inference workflow has been slow. We identify three critical issues: the need for custom featurizer networks tailored to the observed data, the inference inexactness, and the under-specification of physical forward models. To address the first two issues, we introduce a new framework and open-source software nbi (Neural Bayesian Inference), which supports both amortized and sequential NPE. First, nbi provides built-in "featurizer" networks with demonstrated efficacy on sequential data, such as light curves and spectra, thus obviating the need for this customization on the user end. Second, we introduce a modified algorithm SNPE-IS, which facilitates asymptotically exact inference by using the surrogate posterior under NPE only as a proposal distribution for importance sampling. These features allow nbi to be applied off-the-shelf to astronomical inference problems involving light curves and spectra. We discuss how nbi may serve as an effective alternative to existing methods such as Nested Sampling. Our package is at https://github.com/kmzzhang/nbi.
Submitted 21 December, 2023; v1 submitted 6 December, 2023;
originally announced December 2023.
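The nbi abstract above describes using a surrogate posterior only as a proposal distribution for importance sampling. The following is a minimal sketch of that reweighting step on a toy one-dimensional problem. It does not use the nbi package or reproduce SNPE-IS itself: the Gaussian "surrogate", the Gaussian "true posterior", and the helper names `log_normal_pdf` and `importance_reweight` are all assumptions made for illustration.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def importance_reweight(samples, log_target, log_proposal):
    """Return self-normalized importance weights and the effective sample size."""
    log_w = log_target(samples) - log_proposal(samples)
    log_w -= log_w.max()          # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                  # self-normalize so the weights sum to 1
    ess = 1.0 / np.sum(w ** 2)    # Kish effective sample size
    return w, ess


rng = np.random.default_rng(0)

# Hypothetical setup: the exact posterior is N(1.0, 0.5), while the surrogate
# (standing in for a trained NPE model) is slightly biased and too wide.
true_mean, true_std = 1.0, 0.5
prop_mean, prop_std = 0.7, 0.8

samples = rng.normal(prop_mean, prop_std, size=20_000)
weights, ess = importance_reweight(
    samples,
    lambda x: log_normal_pdf(x, true_mean, true_std),
    lambda x: log_normal_pdf(x, prop_mean, prop_std),
)

# The weighted mean corrects the bias of the surrogate samples.
posterior_mean = np.sum(weights * samples)
print(f"surrogate mean: {samples.mean():.3f}")
print(f"reweighted mean: {posterior_mean:.3f}")
print(f"effective sample size: {ess:.0f} of {samples.size}")
```

In a real problem the target term would be the unnormalized posterior (prior times likelihood from the forward model), and a low effective sample size would indicate that the surrogate is a poor proposal.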
-
Tails: Chasing Comets with the Zwicky Transient Facility and Deep Learning
Authors:
Dmitry A. Duev,
Bryce T. Bolin,
Matthew J. Graham,
Michael S. P. Kelley,
Ashish Mahabal,
Eric C. Bellm,
Michael W. Coughlin,
Richard Dekany,
George Helou,
Shrinivas R. Kulkarni,
Frank J. Masci,
Thomas A. Prince,
Reed Riddle,
Maayane T. Soumagnac,
Stéfan J. van der Walt
Abstract:
We present Tails, an open-source deep-learning framework for the identification and localization of comets in the image data of the Zwicky Transient Facility (ZTF), a robotic optical time-domain survey currently in operation at the Palomar Observatory in California, USA. Tails employs a custom EfficientDet-based architecture and is capable of finding comets in single images in near real time, rather than requiring multiple epochs as with traditional methods. The system achieves state-of-the-art performance with 99% recall, 0.01% false positive rate, and 1-2 pixel root mean square error in the predicted position. We report the initial results of the Tails efficiency evaluation in a production setting on the data of the ZTF Twilight survey, including the first AI-assisted discovery of a comet (C/2020 T2) and the recovery of a comet (P/2016 J3 = P/2021 A3).
Submitted 26 February, 2021;
originally announced February 2021.
-
A reusable pipeline for large-scale fiber segmentation on unidirectional fiber beds using fully convolutional neural networks
Authors:
Alexandre Fioravante de Siqueira,
Daniela Mayumi Ushizima,
Stéfan van der Walt
Abstract:
Fiber-reinforced ceramic-matrix composites are advanced materials resistant to high temperatures, with application to aerospace engineering. Their analysis depends on the detection of embedded fibers, with semi-supervised techniques usually employed to separate fibers within the fiber beds. Here we present an open computational pipeline to detect fibers in ex-situ X-ray computed tomography fiber beds. To separate the fibers in these samples, we tested four different architectures of fully convolutional neural networks. When comparing our neural network approach to a semi-supervised one, we obtained Dice and Matthews coefficients greater than $92.28 \pm 9.65\%$, reaching up to $98.42 \pm 0.03 \%$, showing that the network results are close to the human-supervised ones in these fiber beds, in some cases separating fibers that human-curated algorithms could not find. The software we generated in this project is open source, released under a permissive license, and can be freely adapted and re-used in other domains. All data and instructions on how to download and use it are also available.
Submitted 14 January, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
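The fiber-segmentation abstract above reports Dice and Matthews coefficients when comparing segmentations. As a small worked example of how these two scores are computed from binary masks, here is a sketch using the standard confusion-matrix definitions. The tiny masks and the function names `dice_coefficient` and `matthews_coefficient` are made up for illustration and are not taken from the authors' pipeline.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return int(tp), int(fp), int(fn), int(tn)


def dice_coefficient(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


def matthews_coefficient(pred, truth):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


# Hypothetical 2x3 masks: 1 marks a pixel labeled as fiber.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])

dice = dice_coefficient(pred, truth)
mcc = matthews_coefficient(pred, truth)
print(f"Dice: {dice:.4f}, Matthews: {mcc:.4f}")
```

Here TP = 2, FP = 1, FN = 1, and TN = 2, so Dice is 4/6 and the Matthews coefficient is 3/9. This sketch does not guard against a zero denominator, which occurs when either mask is entirely one class.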
-
Array Programming with NumPy
Authors:
Charles R. Harris,
K. Jarrod Millman,
Stéfan J. van der Walt,
Ralf Gommers,
Pauli Virtanen,
David Cournapeau,
Eric Wieser,
Julian Taylor,
Sebastian Berg,
Nathaniel J. Smith,
Robert Kern,
Matti Picus,
Stephan Hoyer,
Marten H. van Kerkwijk,
Matthew Brett,
Allan Haldane,
Jaime Fernández del Río,
Mark Wiebe,
Pearu Peterson,
Pierre Gérard-Marchant,
Kevin Sheppard,
Tyler Reddy,
Warren Weckesser,
Hameer Abbasi,
Christoph Gohlke
, et al. (1 additional author not shown)
Abstract:
Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science, engineering, finance, and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and the first imaging of a black hole. Here we show how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring, and analyzing scientific data. NumPy is the foundation upon which the entire scientific Python universe is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Because of its central position in the ecosystem, NumPy increasingly plays the role of an interoperability layer between these new array computation libraries.
Submitted 17 June, 2020;
originally announced June 2020.
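The "Array Programming with NumPy" abstract above refers to a few fundamental array concepts behind the array programming paradigm. A brief sketch of three of them (reductions along an axis, broadcasting, and boolean-mask indexing) might look like the following; the data values are invented for illustration.

```python
import numpy as np

# Three observations (rows) of two variables (columns); values are made up.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Reduction: collapse the row axis to get one mean per column, shape (2,).
col_mean = data.mean(axis=0)

# Broadcasting: the (2,) vector is stretched across all 3 rows of the
# (3, 2) array, so no explicit loop over rows is needed.
centered = data - col_mean

# Boolean-mask indexing: keep only rows whose second column exceeds 15.
mask = data[:, 1] > 15
selected = data[mask]

print(col_mean)
print(centered)
print(selected)
```

Each step operates on whole arrays at once, which is the style of expression the abstract calls array programming.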
-
SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python
Authors:
Pauli Virtanen,
Ralf Gommers,
Travis E. Oliphant,
Matt Haberland,
Tyler Reddy,
David Cournapeau,
Evgeni Burovski,
Pearu Peterson,
Warren Weckesser,
Jonathan Bright,
Stéfan J. van der Walt,
Matthew Brett,
Joshua Wilson,
K. Jarrod Millman,
Nikolay Mayorov,
Andrew R. J. Nelson,
Eric Jones,
Robert Kern,
Eric Larson,
CJ Carey,
İlhan Polat,
Yu Feng,
Eric W. Moore,
Jake VanderPlas,
Denis Laxalde
, et al. (10 additional authors not shown)
Abstract:
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
Submitted 23 July, 2019;
originally announced July 2019.
-
cesium: Open-Source Platform for Time-Series Inference
Authors:
Brett Naul,
Stéfan van der Walt,
Arien Crellin-Quick,
Joshua S. Bloom,
Fernando Pérez
Abstract:
Inference on time series data is a common requirement in many scientific disciplines and internet of things (IoT) applications, yet there are few resources available to domain scientists to easily, robustly, and repeatably build such complex inference workflows: traditional statistical models of time series are often too rigid to explain complex time domain behavior, while popular machine learning packages require already-featurized dataset inputs. Moreover, the software engineering tasks required to instantiate the computational platform are daunting. cesium is an end-to-end time series analysis framework, consisting of a Python library as well as a web front-end interface, that allows researchers to featurize raw data and apply modern machine learning techniques in a simple, reproducible, and extensible way. Users can apply out-of-the-box feature engineering workflows as well as save and replay their own analyses. Any steps taken in the front end can also be exported to a Jupyter notebook, so users can iterate between possible models within the front end and then fine-tune their analysis using the additional capabilities of the back-end library. The open-source packages make use of many modern Python toolkits, including xarray, dask, Celery, Flask, and scikit-learn.
Submitted 15 September, 2016;
originally announced September 2016.
-
scikit-image: Image processing in Python
Authors:
Stefan van der Walt,
Johannes L. Schönberger,
Juan Nunez-Iglesias,
François Boulogne,
Joshua D. Warner,
Neil Yager,
Emmanuelle Gouillart,
Tony Yu,
the scikit-image contributors
Abstract:
scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal "Modified BSD" open source license, provides a well-documented API in the Python programming language, and is developed by an active, international team of collaborators. In this paper we highlight the advantages of open source to achieve the goals of the scikit-image library, and we showcase several real-world image processing applications that use scikit-image.
Submitted 23 July, 2014;
originally announced July 2014.
-
A polygon-based interpolation operator for super-resolution imaging
Authors:
Stéfan J. van der Walt,
B. M. Herbst
Abstract:
We outline the super-resolution reconstruction problem posed as a maximization of probability. We then introduce an interpolation method based on polygonal pixel overlap, express it as a linear operator, and use it to improve reconstruction. Polygon interpolation outperforms the simpler bilinear interpolation operator and, unlike Gaussian modeling of pixels, requires no parameter estimation. A free software implementation that reproduces the results shown is provided.
Submitted 15 October, 2012; v1 submitted 11 October, 2012;
originally announced October 2012.
-
The NumPy array: a structure for efficient numerical computation
Authors:
Stefan Van Der Walt,
S. Chris Colbert,
Gaël Varoquaux
Abstract:
In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts. We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
Submitted 8 February, 2011;
originally announced February 2011.
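The abstract above names three performance techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts. The sketch below illustrates the first two with strided views and in-place operations. The array contents are arbitrary, and the example is an illustration of the general ideas rather than code from the paper.

```python
import numpy as np

# A 3x4 array of float64: each element is 8 bytes, each row is 32 bytes.
x = np.arange(12, dtype=np.float64).reshape(3, 4)

# Slicing with a step returns a view: same memory buffer, different strides.
cols = x[:, ::2]
print(x.strides, cols.strides)      # byte steps per axis for each array

# Because no data was copied, writing through the view changes x as well.
cols[0, 0] = -1.0

# In-place arithmetic via `out=` avoids allocating a temporary result array.
np.multiply(x, 2.0, out=x)

# Vectorizing: a single dot product replaces a Python-level loop.
flat = x.ravel()                    # a view here, since x is contiguous
loop_total = 0.0
for v in flat:
    loop_total += v * v
vec_total = np.dot(flat, flat)
print(loop_total, vec_total)
```

Checking `np.shares_memory(cols, x)` is one way to confirm that a slice is a view rather than a copy.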