Search | arXiv e-print repository

Spherinator and HiPSter: Representation Learning for Unbiased Knowledge Discovery from Simulations

Authors: Kai L. Polsterer, Bernd Doser, Andreas Fehlner, Sebastian Trujillo-Gomez

Abstract: Simulations are the best approximation to experimental laboratories in astrophysics and cosmology. However, the complexity, richness, and large size of their outputs severely limit the interpretability of their predictions. We describe a new, unbiased, machine-learning-based approach to obtaining useful scientific insights from a broad range of simulations. The method can be used on today's largest simulations and will be essential to solve the extreme data exploration and analysis challenges posed by the Exascale era. Furthermore, this concept is so flexible that it will also enable explorative access to observed data. Our concept is based on applying nonlinear dimensionality reduction to learn compact representations of the data in a low-dimensional space. The simulation data is projected onto this space for interactive inspection, visual interpretation, sample selection, and local analysis. We present a prototype using a rotationally invariant hyperspherical variational convolutional autoencoder, utilizing a power distribution in the latent space, and trained on galaxies from the IllustrisTNG simulation. Thereby, we obtain a natural Hubble-tuning-fork-like similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in Aladin Lite.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 4 pages, 1 figure
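The Spherinator prototype places each object on the surface of a sphere so that the similarity space can be browsed like a sky map. As a minimal sketch, assuming a 3-D latent vector per object (the helper `latent_to_lonlat` is hypothetical and not taken from the paper), normalising the vector onto the unit sphere and converting it to longitude/latitude yields the kind of coordinates a HiPS viewer works with:

```python
import math

def latent_to_lonlat(z):
    """Map a 3-D latent vector onto the unit sphere and return (lon, lat) in degrees.

    Hypothetical helper: assumes a 3-D hyperspherical latent space.
    """
    x, y, zc = z
    norm = math.sqrt(x * x + y * y + zc * zc)
    if norm == 0.0:
        raise ValueError("latent vector must be non-zero")
    x, y, zc = x / norm, y / norm, zc / norm              # project onto the unit sphere
    lon = math.degrees(math.atan2(y, x)) % 360.0           # longitude in [0, 360)
    lat = math.degrees(math.asin(max(-1.0, min(1.0, zc)))) # latitude in [-90, 90]
    return lon, lat

for code in ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]):
    print(code, latent_to_lonlat(code))
```

Only the direction of the latent vector matters here, which matches the idea of a hyperspherical latent space; the length is discarded by the normalisation.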

arXiv:2304.05536

doi 10.1051/0004-6361/202345932

A Gaussian process cross-correlation approach to time delay estimation in active galactic nuclei

Authors: F. Pozo Nuñez, N. Gianniotis, K. L. Polsterer

Abstract: We present a probabilistic cross-correlation approach to estimate time delays in the context of reverberation mapping (RM) of Active Galactic Nuclei (AGN). We reformulate the traditional interpolated cross-correlation method as a statistically principled model that delivers a posterior distribution for the delay. The method employs Gaussian processes as a model for observed AGN light curves. We describe the mathematical formalism and demonstrate the new approach using both simulated light curves and available RM observations. The proposed method delivers a posterior distribution for the delay that accounts for observational noise and the non-uniform sampling of the light curves. This feature allows us to fully quantify its uncertainty and propagate it to subsequent calculations of dependent physical quantities, e.g., black hole masses. The method also delivers out-of-sample predictions, which makes it amenable to model selection, and it can calculate the joint posterior delay for more than two light curves. Because of the numerous advantages of our reformulation and the simplicity of its application, we anticipate that our method will find favour not only in the specialised community of RM, but in all fields where cross-correlation analysis is performed. We provide the algorithms and examples of their application as part of our Julia GPCC package.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 13 pages, 16 figures, Accepted for publication in Astronomy and Astrophysics

Journal ref: A&A 674, A83 (2023)
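The GPCC paper reformulates the traditional interpolated cross-correlation method. For orientation, here is a minimal sketch of that traditional baseline only (not the authors' Gaussian-process model): the second light curve is linearly interpolated at each shifted time, the Pearson correlation is computed per trial lag, and the peak lag is reported. This one-sided variant and the helper names `interp`, `iccf`, and `peak_lag` are assumptions for illustration; published ICCF implementations often also interpolate in the other direction and average.

```python
import bisect
import math

def interp(ts, vs, t):
    """Linearly interpolate the light curve (ts, vs) at time t; None outside the sampled range."""
    if t < ts[0] or t > ts[-1]:
        return None
    k = bisect.bisect_left(ts, t)
    if ts[k] == t:
        return vs[k]
    t0, t1, v0, v1 = ts[k - 1], ts[k], vs[k - 1], vs[k]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def iccf(t1, f1, t2, f2, lags):
    """For each lag, correlate f1(t) with f2 interpolated at t + lag (overlap only)."""
    ccf = []
    for lag in lags:
        pairs = [(v, interp(t2, f2, t + lag)) for t, v in zip(t1, f1)]
        pairs = [(a, b) for a, b in pairs if b is not None]
        if len(pairs) > 2:
            ccf.append(pearson([a for a, _ in pairs], [b for _, b in pairs]))
        else:
            ccf.append(float("nan"))
    return ccf

def peak_lag(lags, ccf):
    """Lag of maximum correlation, ignoring lags without enough overlap."""
    return max((r, lag) for lag, r in zip(lags, ccf) if not math.isnan(r))[1]

ts = [float(t) for t in range(100)]
driver = [math.sin(2 * math.pi * t / 40) for t in ts]
echo = [math.sin(2 * math.pi * (t - 3) / 40) for t in ts]  # driver delayed by 3 time units
lags = [float(l) for l in range(-10, 11)]
print(peak_lag(lags, iccf(ts, driver, ts, echo, lags)))   # 3.0
```

This baseline returns a single point estimate per lag grid; the paper's motivation is precisely that it gives no posterior over the delay and handles noise and irregular sampling only implicitly.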

arXiv:2212.09161

doi 10.1093/mnras/stad286

Modeling photometric reverberation mapping data for the next generation of big data surveys. Quasar accretion disks sizes with the LSST

Authors: F. Pozo Nuñez, C. Bruckmann, S. Desamutara, B. Czerny, S. Panda, A. P. Lobban, G. Pietrzyński, K. L. Polsterer

Abstract: Photometric reverberation mapping can detect the radial extent of the accretion disc (AD) in Active Galactic Nuclei by measuring the time delays between light curves observed in different continuum bands. Quantifying the constraints on the efficiency and accuracy of the delay measurements is important for recovering the AD size-luminosity relation, and potentially using quasars as standard candles. We have explored the possibility of determining the AD size of quasars using next-generation Big Data surveys. We focus on the Legacy Survey of Space and Time (LSST) at the Vera C. Rubin Observatory, which will observe several thousand quasars with the Deep Drilling Fields and up to 10 million quasars for the main survey in six broadband filters during its 10-year operational lifetime. We have developed extensive simulations that take into account the characteristics of the LSST survey and the intrinsic properties of the quasars. The simulations are used to characterise the light curves from which AD sizes are determined using various algorithms. We find that the time delays can be recovered with an accuracy of 5 and 15% for light curves with a time sampling of 2 and 5 days, respectively. The results depend strongly on the redshift of the source and the relative contribution of the emission lines to the bandpasses. Assuming an optically thick and geometrically thin AD, the recovered time-delay spectrum is consistent with black hole masses derived with 30% uncertainty.

Submitted 24 January, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: 19 pages, 17 figures, Accepted for publication in Monthly Notices of the Royal Astronomical Society

arXiv:2109.03619

doi 10.1051/0004-6361/202141710

Disentangling the optical AGN and Host-galaxy luminosity with a probabilistic Flux Variation Gradient

Authors: N. Gianniotis, F. Pozo Nuñez, K. L. Polsterer

Abstract: We present a novel Probabilistic Flux Variation Gradient (PFVG) approach to separate the contributions of active galactic nuclei (AGN) and host galaxies in the context of photometric reverberation mapping (PRM) of AGN. We explored the ability to recover the fractional contribution in a model-independent way using the entire set of light curves obtained through different filters and photometric apertures simultaneously. The method is based on the observed bluer-when-brighter phenomenon, which is attributed to the superposition of a two-component structure: the red host galaxy, which is constant in time, and the varying blue AGN. We describe the PFVG mathematical formalism and demonstrate its performance using simulated light curves and available PRM observations. The new probabilistic approach is able to recover host-galaxy fluxes to within 1% precision as long as the light curves do not show a significant contribution from time delays. This represents a significant improvement with respect to previous applications of the traditional FVG method to PRM data. The proposed PFVG provides an efficient and accurate way to separate the AGN and host-galaxy luminosities in PRM monitoring data. The method will be especially helpful in the case of large upcoming photometric survey telescopes such as the public optical/near-infrared Legacy Survey of Space and Time (LSST) at the Vera C. Rubin Observatory. Finally, we have made the algorithms freely available as part of our Julia PFVG package.

Submitted 28 October, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: Accepted for publication in Astronomy and Astrophysics

Journal ref: A&A 657, A126 (2022)
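The PFVG paper builds on the traditional flux variation gradient (FVG) method. As a hedged sketch of that traditional, non-probabilistic baseline (not the authors' PFVG), simultaneous fluxes in a red and a blue band are fit with a straight line whose slope is the colour of the varying AGN; intersecting that line with an assumed host-galaxy colour gives the constant host flux. The host colour is an external input assumed here, and the helper names `ols_line` and `host_flux_from_fvg` are hypothetical.

```python
def ols_line(xs, ys):
    """Ordinary least-squares fit ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def host_flux_from_fvg(f_red, f_blue, host_colour):
    """Classical FVG sketch.

    f_red, f_blue: simultaneous fluxes of the source in a red and a blue band.
    host_colour:   ASSUMED host-galaxy flux ratio f_blue / f_red (external input).
    Returns the (red, blue) host fluxes where the AGN variability line crosses
    the host-colour line f_blue = host_colour * f_red.
    """
    a, slope = ols_line(f_red, f_blue)
    if slope == host_colour:
        raise ValueError("AGN and host colours coincide; no unique intersection")
    host_red = a / (host_colour - slope)
    return host_red, host_colour * host_red

agn_red = [1.0, 2.0, 3.0, 4.0, 5.0]          # varying AGN flux in the red band
f_red = [10.0 + x for x in agn_red]            # constant host (10) + AGN
f_blue = [4.0 + 1.5 * x for x in agn_red]      # constant red host (4) + bluer AGN (gradient 1.5)
print(host_flux_from_fvg(f_red, f_blue, host_colour=0.4))  # approximately (10.0, 4.0)
```

Ordinary least squares treats the red-band flux as noise-free, which is one of the simplifications a probabilistic treatment of all light curves and apertures is meant to remove.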

arXiv:2103.03780

From Photometric Redshifts to Improved Weather Forecasts: machine learning and proper scoring rules as a basis for interdisciplinary work

Authors: Kai Lars Polsterer, Antonio D'Isanto, Sebastian Lerch

Abstract: The amount, size, and complexity of astronomical data-sets and databases have grown rapidly over the last decades, due to new technologies and dedicated survey telescopes. Besides dealing with poly-structured and complex data, sparse data has become a field of growing scientific interest. A specific field of Astroinformatics research is the estimation of redshifts of extra-galactic sources by using sparse photometric observations. Many techniques have been developed to produce those estimates with increasing precision. In recent years, models have been favored which, instead of providing only a point estimate, are able to generate probability density functions (PDFs) in order to characterize and quantify the uncertainties of their estimates. Crucial to the development of those models is a proper, mathematically principled way to evaluate and characterize their performances, based on scoring functions as well as on tools for assessing calibration. Still, the literature often uses inappropriate methods to express the quality of the estimates; these are frequently insufficient and can lead to misleading interpretations. In this work we summarize how to correctly evaluate errors and forecast quality when dealing with PDFs. We describe the use of the log-likelihood, the continuous ranked probability score (CRPS) and the probability integral transform (PIT) to characterize the calibration as well as the sharpness of predicted PDFs.
We present what we achieved when using proper scoring rules to train deep neural networks as well as to evaluate the model estimates, and how this work led from well-calibrated redshift estimates to improvements in probabilistic weather forecasting. The presented work is an example of interdisciplinarity in data science and illustrates how methods can help to bridge gaps between different fields of application.

Submitted 5 March, 2021; originally announced March 2021.

Comments: Presented at ADASS XXX 2020
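The paper above names three tools for judging predicted PDFs: the log-likelihood, the CRPS, and the PIT. For the simplest case of a single Gaussian predictive distribution $\mathcal{N}(\mu, \sigma^2)$ all three have closed forms; the sketch below uses the standard Gaussian CRPS formula $\mathrm{CRPS} = \sigma\,[z(2\Phi(z) - 1) + 2\varphi(z) - 1/\sqrt{\pi}]$ with $z = (y - \mu)/\sigma$. The Gaussian predictive and the function names are assumptions for illustration, not the paper's models.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0) + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))

def pit_gaussian(mu, sigma, y):
    """Probability integral transform: predictive CDF evaluated at the observation."""
    return norm_cdf((y - mu) / sigma)

def log_likelihood_gaussian(mu, sigma, y):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

z_true = 0.52
for sigma in (0.02, 0.2):   # a sharp and a broad forecast, both centred on 0.5
    print(sigma,
          crps_gaussian(0.5, sigma, z_true),
          pit_gaussian(0.5, sigma, z_true),
          log_likelihood_gaussian(0.5, sigma, z_true))
```

Here the sharper forecast obtains the lower CRPS. For a well-calibrated model, the PIT values collected over many objects should be uniformly distributed on [0, 1], so a histogram of PIT values is the usual calibration check.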

arXiv:2011.06001

doi 10.1051/0004-6361/202038500

Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps

Authors: Rafaël I. J. Mostert, Kenneth J. Duncan, Huub J. A. Röttgering, Kai L. Polsterer, Philip N. Best, Marisa Brienza, Marcus Brüggen, Martin J. Hardcastle, Nika Jurlin, Beatriz Mingo, Raffaella Morganti, Tim Shimwell, Dan Smith, Wendy L. Williams

Abstract: The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio continuum survey of the Northern sky at an unparalleled resolution and sensitivity. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data-mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the $\sim$25k extended radio continuum sources in the LoTSS first data release, which is only $\sim$2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. After training, the SOMs can be used for a wide range of science exploitation, and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky ($\sim$5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies.
Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.

Submitted 11 November, 2020; originally announced November 2020.

Comments: 26 pages; accepted for publication in A&A

Journal ref: A&A 645, A89 (2021)
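The LoTSS paper relies on self-organising maps. As a minimal sketch of one online SOM training step (plain Kohonen update, without the rotation and flipping invariance that PINK adds), the prototype closest to an input is found and every prototype on the grid is pulled towards the input with a Gaussian neighbourhood weight. The learning rate, neighbourhood width, and function names are assumptions for illustration.

```python
import math

def find_bmu(som, x):
    """Return (row, col) of the best-matching unit: the prototype closest to x."""
    best, best_d = None, math.inf
    for i, row in enumerate(som):
        for j, w in enumerate(row):
            d = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best

def som_update(som, x, lr=0.5, sigma=1.0):
    """One online SOM step, in place: pull each prototype towards x, weighted by a
    Gaussian neighbourhood on the grid centred at the best-matching unit."""
    bi, bj = find_bmu(som, x)
    for i, row in enumerate(som):
        for j, w in enumerate(row):
            h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
            row[j] = [wk + lr * h * (xk - wk) for wk, xk in zip(w, x)]
    return bi, bj

som = [[[0.0, 0.0], [1.0, 0.0]],
       [[0.0, 1.0], [1.0, 1.0]]]     # 2x2 grid of 2-D prototypes
print(som_update(som, [0.9, 0.1]))   # best-matching unit (0, 1)
print(som[0][1])                     # moved halfway towards the input (lr * h = 0.5)
```

In a full training run, `lr` and `sigma` are usually decreased over many passes through the data; a rotation- and flip-invariant variant would replace the plain squared distance in `find_bmu` with a minimum over transformed copies of each input image.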

arXiv:1912.10319

doi 10.1093/mnras/stz2830

Optical continuum photometric reverberation mapping of the Seyfert-1 galaxy Mrk509

Authors: F. Pozo Nuñez, N. Gianniotis, J. Blex, T. Lisow, R. Chini, K. L. Polsterer, J. -U. Pott, J. Esser, G. Pietrzyński

Abstract: We present the results of a two-year optical continuum photometric reverberation mapping campaign carried out on the nucleus of the Seyfert-1 galaxy Mrk509. Specially designed narrow-band filters were used in order to mitigate the line and pseudo-continuum contamination of the signal from the broad line region, while allowing for high-accuracy flux calibration over a large field of view. We obtained light curves with a sub-day time sampling and typical flux uncertainties of $1\%$. The high photometric precision allowed us to measure inter-band continuum time delays of up to $\sim 2$ days across the optical range. The time delays are consistent with the relation $\tau \propto \lambda^{4/3}$ predicted for an optically thick and geometrically thin accretion disk model. The size of the disk is, however, a factor of 1.8 larger than predictions based on the standard thin-disk theory. We argue that, for the particular case of Mrk509, a larger black hole mass due to the unknown geometry scaling factor can reconcile the difference between the observations and theory.

Submitted 21 December, 2019; originally announced December 2019.

Comments: 20 pages, 11 figures, published in Monthly Notices of the Royal Astronomical Society

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 490, Issue 3, December 2019, Pages 3936-3951
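The Mrk509 paper tests delays against the thin-disk relation $\tau \propto \lambda^{4/3}$; for example, doubling the wavelength should lengthen the delay by a factor $2^{4/3} \approx 2.52$. A simple consistency check, sketched here under the assumption that positive delays per band are available, is the least-squares slope of $\log\tau$ against $\log\lambda$, which should come out near $4/3$. Actual analyses typically measure delays relative to a reference band and fit the amplitude with the exponent fixed; this sketch and the name `powerlaw_index` are illustrative only.

```python
import math

def powerlaw_index(wavelengths, delays):
    """Least-squares slope of log(delay) versus log(wavelength)."""
    xs = [math.log(w) for w in wavelengths]
    ys = [math.log(t) for t in delays]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

wl = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0]        # Angstrom
tau = [0.5 * (w / 4000.0) ** (4.0 / 3.0) for w in wl]  # synthetic thin-disk delays in days
print(powerlaw_index(wl, tau))                        # close to 4/3
print(tau[-1] / tau[0])                               # about 2.52 for a factor-2 wavelength step
```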

arXiv:1904.02876

doi 10.1088/1538-3873/ab150b

Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps

Authors: T. J. Galvin, M. Huynh, R. P. Norris, X. R. Wang, E. Hopkins, O. I. Wong, S. Shabala, L. Rudnick, M. J. Alger, K. L. Polsterer

Abstract: With the advent of large-scale surveys, the manual analysis and classification of individual radio source morphologies is rendered impossible, as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project and introduce a proper transfer mechanism via quantile random forest regression. By using parallelized rotation and flipping invariant Kohonen-maps, image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio continuum and WISE infrared all-sky surveys are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes, with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. We find that these prototypes have reconstructed physically meaningful processes across two-channel images at radio and infrared wavelengths in an unsupervised manner. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user-generated labels. These heat-maps have reduced the feature space by a factor of 248 and can be used as the basis for subsequent ML methods.
Using an ensemble of decision trees, we achieve upwards of 85.7% and 80.7% accuracy when predicting the number of components and peaks in an image, respectively, using these heat-maps. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in transition between two classes, caused by sensitivity and resolution limits.

Submitted 5 April, 2019; originally announced April 2019.
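The Radio Galaxy Zoo paper describes comparing each image with the trained prototypes to form a heat-map fingerprint. A minimal sketch of that comparison, assuming square images and restricting the invariance to 90-degree rotations plus a flip (PINK-style codes use much finer rotation steps), computes the smallest squared distance over all transformed prototypes. All function names are hypothetical.

```python
def rotate90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def variants(img):
    """The 8 images reachable by 90-degree rotations of the image and its mirror."""
    out = []
    for base in (img, [row[::-1] for row in img]):
        for _ in range(4):
            out.append(base)
            base = rotate90(base)
    return out

def invariant_distance(img, proto):
    """Squared Euclidean distance, minimised over rotations and flips of the prototype."""
    return min(sum((a - b) ** 2 for ra, rb in zip(img, v) for a, b in zip(ra, rb))
               for v in variants(proto))

def heat_map(img, prototypes):
    """Distances of one image to every prototype: a fixed-length morphological fingerprint."""
    return [invariant_distance(img, p) for p in prototypes]

image = [[1, 0], [0, 0]]
prototypes = [[[0, 0], [0, 1]],   # same shape, rotated by 180 degrees
              [[1, 1], [1, 1]]]   # a different shape
print(heat_map(image, prototypes))  # [0, 3]
```

Because the fingerprint has one entry per prototype, its length is fixed by the map size rather than the image size, which is what makes it usable as input to a downstream model such as a random forest.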

arXiv:1803.10032

doi 10.1051/0004-6361/201833103

Return of the features. Efficient feature selection and interpretation for photometric redshifts

Authors: Antonio D'Isanto, Stefano Cavuoti, Fabian Gieseke, Kai Lars Polsterer

Abstract: The explosion of data in recent years has generated an increasing need for new analysis techniques in order to extract knowledge from massive datasets. Machine learning has proved particularly useful to perform this task. Fully automated methods have recently gained great popularity, even though those methods often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes. Efficient feature selection is an essential tool to boost the performance of machine learning models. In this work, we propose a forward selection method in order to compute, evaluate, and characterize better-performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study. We synthetically created 4,520 features by combining magnitudes, errors, radii, and ellipticities of quasars, taken from the SDSS. We apply a forward selection process, a recursive method in which a huge number of feature sets is tested through a kNN algorithm, leading to a tree of feature sets. The branches of the tree are then used to perform experiments with the random forest, in order to validate the best set with an alternative model. We demonstrate that the sets of features determined with our approach significantly improve the performance of the regression models when compared to the performance of the classic features from the literature.
The found features are unexpected and surprising, being very different from the classic features. Therefore, a method to interpret some of the found features in a physical context is presented. The methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task.

Submitted 9 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: 21 pages, 11 figures, accepted for publication in A&A, final version after language revision

Journal ref: A&A 616, A97 (2018)
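The feature-selection paper tests candidate feature sets with a kNN model. The sketch below shows only the greedy core of such a forward selection, under stated assumptions: leave-one-out mean squared error of kNN regression is the scoring criterion, and a single best path is followed instead of the tree of feature sets described in the paper. The names `loo_knn_mse` and `forward_select` are hypothetical.

```python
def loo_knn_mse(X, y, features, k=3):
    """Leave-one-out mean squared error of k-NN regression on the chosen feature columns."""
    n = len(X)
    total = 0.0
    for i in range(n):
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(n) if j != i
        )
        pred = sum(y[j] for _, j in dists[:k]) / k
        total += (pred - y[i]) ** 2
    return total / n

def forward_select(X, y, max_features, k=3):
    """Greedy forward selection: repeatedly add the feature that most lowers the LOO k-NN error."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        err, best = min((loo_knn_mse(X, y, selected + [c], k), c) for c in remaining)
        if err >= best_err:
            break  # no remaining candidate improves the error: stop early
        selected.append(best)
        remaining.remove(best)
        best_err = err
    return selected, best_err

# Feature 0 determines the target; feature 1 is a permuted, much less informative copy.
X = [[i, (7 * i) % 20] for i in range(20)]
y = [2.0 * i for i in range(20)]
print(forward_select(X, y, max_features=1, k=2))  # feature 0 is selected first
```

Each step costs one full leave-one-out evaluation per remaining candidate, which is why scaling this idea to thousands of synthetic features calls for a fast base model such as kNN.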

arXiv:1706.02467

doi 10.1051/0004-6361/201731326

Photometric redshift estimation via deep learning

Authors: Antonio D'Isanto, Kai Lars Polsterer

Abstract: The need to analyze the available large synoptic multi-band surveys drives the development of new data-analysis methods. Photometric redshift estimation is one field of application where such new methods have substantially improved the results. Up to now, the vast majority of applied redshift estimation methods have utilized photometric features. We aim to develop a method to derive probabilistic photometric redshifts directly from multi-band imaging data, rendering pre-classification of objects and feature extraction obsolete. A modified version of a deep convolutional network was combined with a mixture density network. The estimates are expressed as Gaussian mixture models representing the probability density functions (PDFs) in the redshift space. In addition to the traditional scores, the continuous ranked probability score (CRPS) and the probability integral transform (PIT) were applied as performance criteria. We have adopted a feature-based random forest and a plain mixture density network to compare performances on experiments with data from SDSS (DR9). We show that the proposed method is able to predict redshift PDFs independently of the type of source, for example galaxies, quasars or stars. Thereby the prediction performance is better than both presented reference methods and is comparable to results from the literature. The presented method is extremely general and allows us to solve any kind of probabilistic regression problem based on imaging data, for example estimating metallicity or star formation rate of galaxies.
This kind of methodology is tremendously important for the next generation of surveys.

Submitted 8 September, 2017; v1 submitted 8 June, 2017; originally announced June 2017.

Comments: 16 pages, 12 figures, 6 tables. Accepted for publication in A&A

Journal ref: A&A 609, A111 (2018)
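In the deep-learning paper above, a mixture density network expresses each estimate as a Gaussian mixture. A common way to turn unconstrained network outputs into valid mixture parameters, sketched here as an assumption about the general technique rather than the authors' exact architecture, is a softmax for the weights and an exponential for the widths; the mixture CDF evaluated at the true redshift then gives the PIT value. Function names are hypothetical.

```python
import math

def mixture_from_raw(raw_weights, raw_means, raw_log_sigmas):
    """Turn unconstrained outputs into Gaussian-mixture parameters.

    Softmax makes the weights positive and sum to one; exp makes the widths positive.
    """
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    sigmas = [math.exp(s) for s in raw_log_sigmas]
    return weights, list(raw_means), sigmas

def mixture_pdf(z, weights, means, sigmas):
    return sum(w * math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in zip(weights, means, sigmas))

def mixture_cdf(z, weights, means, sigmas):
    """Mixture CDF; evaluated at the true redshift it gives the PIT value."""
    return sum(w * 0.5 * (1.0 + math.erf((z - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, sigmas))

# A bimodal redshift PDF, e.g. for a source with two plausible solutions.
params = mixture_from_raw([0.3, -0.2], [0.45, 1.60], [math.log(0.05), math.log(0.08)])
print(params[0])                       # mixture weights, summing to 1
print(mixture_pdf(0.45, *params))      # density near the first mode
print(mixture_cdf(1.0, *params))       # PIT if the true redshift were 1.0
```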

arXiv:1608.08016

Uncertain Photometric Redshifts

Authors: Kai Lars Polsterer, Antonio D'Isanto, Fabian Gieseke

Abstract: Photometric redshifts play an important role as a measure of distance for various cosmological topics. Spectroscopic redshifts are only available for a very limited number of objects but can be used for creating statistical models. A broad variety of photometric catalogues provide uncertain low-resolution spectral information for galaxies and quasars that can be used to infer a redshift. Many different techniques have been developed to produce those redshift estimates with increasing precision. Instead of providing only a point estimate, astronomers have started to generate probability density functions (PDFs), which should provide a characterisation of the uncertainties of the estimation. In this work we present two simple approaches to generating those PDFs. We use the example of generating the photometric redshift PDFs of quasars from SDSS (DR7) to validate our approaches and to compare them with point estimates. We do not aim to present a new best-performing method, but we choose an intuitive approach that is based on well-known machine learning algorithms. Furthermore, we introduce proper tools for evaluating the performance of PDFs in the context of astronomy. The continuous ranked probability score (CRPS) and the probability integral transform (PIT) are well accepted in the weather forecasting community. Both tools reflect how well the PDFs reproduce the real values of the analysed objects. As we show, nearly all currently used measures in astronomy show severe weaknesses when used to evaluate PDFs.

Submitted 29 August, 2016; originally announced August 2016.

Comments: 11 pages, 8 figures, submitted to MNRAS on July 12, 2016
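The paper above promotes CRPS and PIT for evaluating redshift PDFs. When a PDF is only available as a set of samples (for example, draws from an ensemble of models), both can be estimated directly from the samples using the identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, with $X, X'$ independent draws from $F$. This sample-based estimator and the function names are an illustrative choice, not necessarily how the paper computes the scores.

```python
def crps_ensemble(samples, y):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| over all sample pairs (O(n^2))."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

def pit_ensemble(samples, y):
    """Empirical PIT: fraction of forecast samples at or below the observation."""
    return sum(1 for x in samples if x <= y) / len(samples)

forecast = [1.10, 1.15, 1.20, 1.22, 1.30]   # hypothetical redshift samples for one quasar
print(crps_ensemble(forecast, 1.18), pit_ensemble(forecast, 1.18))
```

With a single sample the CRPS reduces to the absolute error, which is one way to see that it generalises a point-estimate metric to full PDFs.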

arXiv:1606.06094

A Spectral Model for Multimodal Redshift Estimation

Authors: Sven D. Kugler, Nikolaos Gianniotis, Kai L. Polsterer

Abstract: We present a physically inspired model for the problem of redshift estimation. Typically, redshift estimation has been treated as a regression problem that takes magnitudes as input and maps them to a single target redshift. In this work we acknowledge the fact that observed magnitudes may actually admit multiple plausible redshifts, i.e. the distribution of redshifts explaining the observed magnitudes (or colours) is multimodal. Hence, employing one of the standard regression models, as is typically done, is insufficient for this kind of problem, as most models implement either one-to-one or many-to-one mappings. The observed multimodality of solutions is a direct consequence of (a) the variety of physical mechanisms that give rise to the observations, (b) the limited number of measurements available and (c) the presence of noise in photometric measurements. Our proposed solution consists of formulating a model from first principles capable of generating spectra. The generated spectra are integrated over filter curves to produce magnitudes, which are then matched to the observed magnitudes. The resulting model naturally expresses a multimodal posterior over possible redshifts, includes measurement uncertainty (e.g. missing values) and is shown to perform favourably on a real dataset.

Submitted 20 June, 2016; originally announced June 2016.
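The spectral-model paper generates spectra, integrates them over filter curves, and matches the result to observations, yielding a posterior over redshift rather than a single value. The toy sketch below illustrates only that forward-modelling idea under strong simplifying assumptions that are not from the paper: one hypothetical template (rising continuum plus one emission line), hypothetical top-hat filters, band-averaged fluxes instead of magnitudes, no $(1+z)$ flux normalisation, a Gaussian likelihood, and a flat prior on a redshift grid.

```python
import math

def rest_spectrum(lam):
    """Hypothetical rest-frame template: rising continuum plus one Gaussian line at 4000 A."""
    return lam / 4000.0 + 3.0 * math.exp(-0.5 * ((lam - 4000.0) / 50.0) ** 2)

FILTERS = [(4000.0, 5500.0), (5500.0, 7000.0), (7000.0, 8500.0)]  # hypothetical top-hats (A)

def band_fluxes(z, n_steps=300):
    """Average the redshifted template over each filter (midpoint rule).

    The observed flux at lam_obs is taken as the template at lam_obs / (1 + z).
    """
    fluxes = []
    for lo, hi in FILTERS:
        step = (hi - lo) / n_steps
        total = sum(rest_spectrum((lo + (k + 0.5) * step) / (1.0 + z)) for k in range(n_steps))
        fluxes.append(total / n_steps)
    return fluxes

def redshift_posterior(observed, sigma, z_grid):
    """Normalised posterior on z_grid from a Gaussian likelihood and a flat prior."""
    logl = []
    for z in z_grid:
        chi2 = sum(((o - m) / sigma) ** 2 for o, m in zip(observed, band_fluxes(z)))
        logl.append(-0.5 * chi2)
    top = max(logl)
    weights = [math.exp(l - top) for l in logl]
    total = sum(weights)
    return [w / total for w in weights]

z_grid = [0.05 * i for i in range(41)]   # 0.0 ... 2.0
z_true = z_grid[10]                       # 0.5
post = redshift_posterior(band_fluxes(z_true), sigma=0.05, z_grid=z_grid)
print(z_grid[post.index(max(post))])      # the noise-free input peaks at z_true
```

Returning the whole grid posterior, rather than its maximum, is the point: with noisy inputs or a richer template set, several separated peaks can carry comparable probability.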

arXiv:1601.05654

Model-Coupled Autoencoder for Time Series Visualisation

Authors: Nikolaos Gianniotis, Sven D. Kügler, Peter Tiňo, Kai L. Polsterer

Abstract: We present an approach for the visualisation of a set of time series that combines an echo state network with an autoencoder. For each time series in the dataset we train an echo state network, using a common and fixed reservoir of hidden neurons, and use the optimised readout weights as the new representation. Dimensionality reduction is then performed via an autoencoder on the readout weight representations. The crux of the work is to equip the autoencoder with a loss function that correctly interprets the reconstructed readout weights by associating them with a reconstruction error measured in the data space of sequences. This essentially amounts to measuring the predictive performance that the reconstructed readout weights exhibit on their corresponding sequences when plugged back into the echo state network with the same fixed reservoir. We demonstrate that the proposed visualisation framework can deal with both real-valued and binary sequences. We derive magnification factors in order to analyse distance preservation and distortion in the visualisation space. The versatility and advantages of the proposed method are demonstrated on datasets of time series that originate from diverse domains.

Submitted 21 January, 2016; originally announced January 2016.
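The model-coupled autoencoder paper represents each sequence by the readout weights of an echo state network with a shared, fixed reservoir, and scores reconstructed weights by their prediction error in data space. A minimal sketch of those two pieces follows, assuming a small random tanh reservoir, one-step-ahead prediction, and a ridge-regression readout; the reservoir scaling, ridge value, and all function names are illustrative assumptions, and the autoencoder itself is omitted.

```python
import math
import random

def make_reservoir(size, scale=0.9, seed=0):
    """Fixed random reservoir shared by all sequences. Each row's absolute sum is at most
    `scale` < 1, a simple sufficient condition for the state to forget old inputs."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(size)]
    w_res = [[rng.uniform(-1.0, 1.0) * scale / size for _ in range(size)] for _ in range(size)]
    return w_in, w_res

def reservoir_states(seq, w_in, w_res):
    """Drive the reservoir with seq[:-1]; state t is used to predict seq[t + 1]."""
    n = len(w_in)
    x, states = [0.0] * n, []
    for u in seq[:-1]:
        x = [math.tanh(w_in[i] * u + sum(w_res[i][j] * x[j] for j in range(n))) for i in range(n)]
        states.append(x)
    return states

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A w = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def readout_weights(seq, w_in, w_res, ridge=1e-3):
    """Ridge readout for one-step-ahead prediction: (S^T S + ridge I) w = S^T y."""
    S, y, n = reservoir_states(seq, w_in, w_res), seq[1:], len(w_in)
    A = [[sum(s[i] * s[j] for s in S) + (ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [sum(s[i] * t for s, t in zip(S, y)) for i in range(n)]
    return solve(A, b)

def data_space_error(weights, seq, w_in, w_res):
    """Sum of squared one-step prediction errors when weights are plugged back into the reservoir."""
    S = reservoir_states(seq, w_in, w_res)
    return sum((sum(wi * si for wi, si in zip(weights, s)) - t) ** 2 for s, t in zip(S, seq[1:]))

w_in, w_res = make_reservoir(6)
seq = [math.sin(0.3 * t) for t in range(80)]
w = readout_weights(seq, w_in, w_res)     # fixed-length representation of the sequence
print(len(w), data_space_error(w, seq, w_in, w_res))
```

In the coupled setting, an autoencoder's reconstruction of `w` would be scored with `data_space_error` on the original sequence instead of a plain squared difference between weight vectors.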

arXiv:1508.03482

doi 10.1093/mnras/stv2604

An Explorative Approach for Inspecting Kepler Data

Authors: S. D. Kügler, N. Gianniotis, K. L. Polsterer

Abstract: The Kepler survey has provided a wealth of astrophysical knowledge by continuously monitoring over 150,000 stars. The resulting database contains thousands of examples of known variability types and at least as many that cannot be classified yet. In order to reveal the knowledge hidden in the database, we introduce a new visualisation method that allows us to inspect time series exploratively. To that end, we propose dimensionality reduction on the parameters of a model capable of representing time series as fixed-length vectors. We show that a more refined objective function can be chosen by minimising the prediction error of the data reconstruction instead of the reconstruction of the model parameters. The proposed visualisation exhibits a strong correlation between the variability behaviour of the light curves and their physical properties. As a consequence, temperature and surface gravity can, for some stars, be directly inferred from non- (or quasi-) periodic light curves.

Submitted 4 November, 2015; v1 submitted 14 August, 2015; originally announced August 2015.

Comments: 7 pages, 8 figures, accepted for publication in MNRAS

arXiv:1504.04455

Featureless Classification of Light Curves

Authors: Sven Dennis Kügler, Nikos Gianniotis, Kai Lars Polsterer

Abstract: In the era of rapidly increasing amounts of time series data, classification of variable objects has become the main objective of time-domain astronomy. Classification of irregularly sampled time series is particularly difficult because the data cannot be represented naturally as a vector which can be directly fed into a classifier. In the literature, various statistical features serve as vector representations. In this work, we represent time series by a density model. The density model captures all the information available, including measurement errors. Hence, we view this model as a generalisation of the static features, which can be derived directly from it, e.g., as moments of the density. Similarity between each pair of time series is quantified by the distance between their respective models. Classification is performed on the obtained distance matrix. In the numerical experiments, we use data from the OGLE and ASAS surveys and demonstrate that the proposed representation performs on par with the best currently used feature-based approaches. The density representation preserves all static information present in the observational data, in contrast to a less complete description by features. The density representation is an upper bound in terms of information made available to the classifier. Consequently, the predictive power of the proposed classification depends only on the choice of similarity measure and classifier.
Due to its principled nature, we argue that this new way of representing time series has potential in tasks beyond classification, e.g., unsupervised learning.

Submitted 20 May, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

Comments: Accepted for publication in MNRAS
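The density representation in this abstract can be illustrated under simple assumptions that are ours, not necessarily the authors': each measurement m_i with uncertainty sigma_i contributes a Gaussian N(m_i, sigma_i^2), a light curve is the equal-weight mixture of these Gaussians, and two light curves are compared with the squared L2 distance between their densities. That distance has a closed form because the integral of a product of two Gaussians, N(x; a, s^2) N(x; b, t^2), equals N(a; b, s^2 + t^2). The helper names are hypothetical. Time ordering is discarded, in line with the abstract's focus on static information.

```python
import numpy as np

def gauss(x, mean, var):
    """Gaussian probability density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def cross_term(m1, s1, m2, s2):
    """Integral of the product of two equal-weight Gaussian mixtures."""
    # Pairwise closed form: int N(x; a, sa^2) N(x; b, sb^2) dx = N(a; b, sa^2 + sb^2);
    # the mean over all pairs applies the weights 1/len(m1) and 1/len(m2).
    var = s1[:, None] ** 2 + s2[None, :] ** 2
    return float(np.mean(gauss(m1[:, None], m2[None, :], var)))

def l2_distance(curve_a, curve_b):
    """Squared L2 distance between the densities of two (magnitudes, errors) pairs."""
    (ma, sa), (mb, sb) = curve_a, curve_b
    return (cross_term(ma, sa, ma, sa) + cross_term(mb, sb, mb, sb)
            - 2.0 * cross_term(ma, sa, mb, sb))

def distance_matrix(curves):
    """Symmetric matrix of pairwise distances, e.g. as input to a classifier."""
    n = len(curves)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = l2_distance(curves[i], curves[j])
    return D

# Usage with made-up magnitudes and uncertainties (hypothetical data).
quiet = (np.array([0.0, 0.1, -0.1]), np.full(3, 0.1))
similar = (np.array([0.05, -0.05, 0.0]), np.full(3, 0.1))
bright = (np.array([5.0, 5.1, 4.9]), np.full(3, 0.1))
D = distance_matrix([quiet, similar, bright])
```

The resulting matrix D could be handed to any distance-based classifier, such as nearest neighbours; as the abstract notes, the outcome then depends on the chosen similarity measure and classifier.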

arXiv:1210.7071

doi 10.1093/mnras/sts017

Finding New High-Redshift Quasars by Asking the Neighbours

Authors: Kai Lars Polsterer, Peter-Christian Zinn, Fabian Gieseke

Abstract: Quasars with a high redshift (z) are important for understanding the evolution of galaxies in the early universe. However, only a few of these distant objects are known to date. The costs of building and operating a 10-metre class telescope limit the number of facilities and, thus, the available observation time. Therefore, an efficient selection of candidates is mandatory. This paper presents a new approach to select quasar candidates with high redshift (z>4.8) based on photometric catalogues. We have chosen the z>4.8 limit because, at these redshifts, the dominant Lyman alpha emission line of a quasar can only be found in the Sloan i and z-band filters. As part of the candidate selection approach, we also present a photometric redshift estimator. Three of the 120,000 generated candidates have been spectroscopically analysed in follow-up observations and a new z=5.0 quasar was found. This result is consistent with the estimated detection ratio of about 50 per cent, and we expect 60,000 high-redshift quasars to be part of our candidate sample. The created candidates are available for download at MNRAS or at http://www.astro.rub.de/polsterer/quasar-candidates.csv.

Submitted 26 October, 2012; originally announced October 2012.

Comments: 10 pages, 9 figures, accepted (MNRAS)
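The redshift cut in this abstract follows from the relation lambda_obs = lambda_rest * (1 + z). The short calculation below assumes the standard rest-frame Lyman alpha wavelength of 1215.67 Angstrom, a value not stated in the abstract; the function names are hypothetical. It shows that at z = 4.8 the line is observed at roughly 7050 Angstrom, at the red end of the optical range where the authors place it in the i and z filters. Filter boundaries are deliberately not encoded here, since the abstract does not give them.

```python
LYA_REST_ANGSTROM = 1215.67  # rest-frame Lyman alpha wavelength in Angstrom (assumed standard value)

def observed_wavelength(z, rest=LYA_REST_ANGSTROM):
    """Redshifted wavelength: lambda_obs = lambda_rest * (1 + z)."""
    return rest * (1.0 + z)

def redshift_for(observed, rest=LYA_REST_ANGSTROM):
    """Inverse relation: z = lambda_obs / lambda_rest - 1."""
    return observed / rest - 1.0

for z in (4.8, 5.0):
    print(f"z = {z}: Lyman alpha observed at {observed_wavelength(z):.0f} Angstrom")
# z = 4.8: Lyman alpha observed at 7051 Angstrom
# z = 5.0: Lyman alpha observed at 7294 Angstrom
```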

arXiv:1108.4696

doi 10.1109/ICMLA.2010.59

Detecting Quasars in Large-Scale Astronomical Surveys

Authors: Fabian Gieseke, Kai Lars Polsterer, Andreas Thom, Peter-Christian Zinn, Dominik Bomanns, Ralf-Jürgen Dettmar, Oliver Kramer, Jan Vahrenhold

Abstract: We present a classification-based approach to identify quasi-stellar radio sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance on a manually labeled training set. While reasonable results can already be obtained via approaches working only on photometric data, our experiments indicate that simple but problem-specific features extracted from spectroscopic data can significantly improve the classification performance. Since our approach works orthogonally to the existing classification schemes used for building the spectroscopic catalogs, our classification results are well suited for a mutual assessment of the approaches' accuracies.

Submitted 23 August, 2011; originally announced August 2011.

Comments: 6 pages, 8 figures, published in the proceedings of the 2010 Ninth International Conference on Machine Learning and Applications (ICMLA) of the IEEE

Journal ref: 2010 Ninth International Conference on Machine Learning and Applications, 2010, pp.352-357
