-
Augmenting machine learning photometric redshifts with Gaussian mixture models
Authors:
P. W. Hatfield,
I. A. Almosallam,
M. J. Jarvis,
N. Adams,
R. A. A. Bowler,
Z. Gomes,
S. J. Roberts,
C. Schreiber
Abstract:
Wide-area imaging surveys are one of the key ways of advancing our understanding of cosmology, galaxy formation physics, and the large-scale structure of the Universe in the coming years. These surveys typically require calculating redshifts for huge numbers (hundreds of millions to billions) of galaxies - almost all of which must be derived from photometry rather than spectroscopy. In this paper we investigate how using statistical models to understand the populations that make up the colour-magnitude distribution of galaxies can be combined with machine learning photometric redshift codes to improve redshift estimates. In particular we combine the use of Gaussian Mixture Models with the high performing machine learning photo-z algorithm GPz and show that modelling and accounting for the different colour-magnitude distributions of training and test data separately can give improved redshift estimates, reduce the bias on estimates by up to a half, and speed up the run-time of the algorithm. These methods are illustrated using deep optical and near-infrared data in two separate deep fields, where training and test data of different colour-magnitude distributions are constructed from the galaxies with known spectroscopic redshifts, derived from several heterogeneous surveys.
Submitted 3 September, 2020;
originally announced September 2020.
-
Improving Photometric Redshift Estimation using GPz: size information, post processing and improved photometry
Authors:
Zahra Gomes,
Matt J. Jarvis,
Ibrahim A. Almosallam,
Stephen J. Roberts
Abstract:
The next generation of large scale imaging surveys (such as those conducted with the Large Synoptic Survey Telescope and Euclid) will require accurate photometric redshifts in order to optimally extract cosmological information. Gaussian Processes for photometric redshift estimation (GPz) is a promising new method that has been proven to provide efficient, accurate photometric redshift estimations with reliable variance predictions. In this paper, we investigate a number of methods for improving the photometric redshift estimations obtained using GPz (but which are also applicable to others). We use spectroscopy from the Galaxy and Mass Assembly Data Release 2 with a limiting magnitude of r<19.4 along with corresponding Sloan Digital Sky Survey visible (ugriz) photometry and the UKIRT Infrared Deep Sky Survey Large Area Survey near-IR (YJHK) photometry. We evaluate the effects of adding near-IR magnitudes and angular size as features for the training, validation and testing of GPz and find that these improve the accuracy of the results by ~15-20 per cent. In addition, we explore a post-processing method of shifting the probability distributions of the estimated redshifts based on their Quantile-Quantile plots and find that it improves the bias by ~40 per cent. Finally, we investigate the effects of using more precise photometry obtained from the Hyper Suprime-Cam Subaru Strategic Program Data Release 1 and find that it produces significant improvements in accuracy, similar to the effect of including additional features.
Submitted 6 December, 2017;
originally announced December 2017.
-
A Framework for Assessing the Performance of Pulsar Search Pipelines
Authors:
E. van Heerden,
A. Karastergiou,
S. J. Roberts
Abstract:
In this paper, we present a framework for assessing the effect of non-stationary Gaussian noise and radio frequency interference (RFI) on the signal to noise ratio, the number of false positives detected per true positive and the sensitivity of standard pulsar search pipelines. The results highlight the necessity to develop algorithms that are able to identify and remove non-stationary variations from the data before RFI excision and searching are performed in order to limit false positive detections. The results also show that the spectrum whitening algorithms currently employed severely affect the efficiency of pulsar search pipelines by reducing their sensitivity to long period pulsars.
Submitted 29 November, 2016;
originally announced November 2016.
-
GPz: Non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts
Authors:
Ibrahim A. Almosallam,
Matt J. Jarvis,
Stephen J. Roberts
Abstract:
The next generation of cosmology experiments will be required to use photometric redshifts rather than spectroscopic redshifts. Obtaining accurate and well-characterized photometric redshift distributions is therefore critical for Euclid, the Large Synoptic Survey Telescope and the Square Kilometre Array. However, determining accurate variance predictions alongside single point estimates is crucial, as they can be used to optimize the sample of galaxies for the specific experiment (e.g. weak lensing, baryon acoustic oscillations, supernovae), trading off between completeness and reliability in the galaxy sample. The various sources of uncertainty in measurements of the photometry and redshifts put a lower bound on the accuracy that any model can hope to achieve. The intrinsic uncertainty associated with estimates is often non-uniform and input-dependent, commonly known in statistics as heteroscedastic noise. However, existing approaches are susceptible to outliers, do not take into account variance induced by non-uniform data density, and in most cases require manual tuning of many parameters. In this paper, we present a Bayesian machine learning approach that jointly optimizes the model with respect to both the predictive mean and variance, which we refer to as Gaussian processes for photometric redshifts (GPz). The predictive variance of the model takes into account both the variance due to data density and photometric noise. Using the SDSS DR12 data, we show that our approach substantially outperforms other machine learning methods for photo-z estimation and their associated variance, such as TPZ and ANNz2. We provide Matlab and Python implementations that are available to download at https://github.com/OxfordML/GPz .
Submitted 16 June, 2016; v1 submitted 12 April, 2016;
originally announced April 2016.
-
Emission-rotation correlation in pulsars: new discoveries with optimal techniques
Authors:
P. R. Brook,
A. Karastergiou,
S. Johnston,
M. Kerr,
R. M. Shannon,
S. J. Roberts
Abstract:
Pulsars are known to display short-term variability. Recently, examples of longer-term emission variability have emerged that are often correlated with changes in the rotational properties of the pulsar. To further illuminate this relationship, we have developed techniques to identify emission and rotation variability in pulsar data, and determine correlations between the two. Individual observations may be too noisy to identify subtle changes in the pulse profile. We use Gaussian process (GP) regression to model noisy observations and produce a continuous map of pulse profile variability. Generally, multiple observing epochs are required to obtain the pulsar spin frequency derivative ($\dot{\nu}$). GP regression is, therefore, also used to obtain $\dot{\nu}$, under the hypothesis that pulsar timing noise is primarily caused by unmodelled changes in $\dot{\nu}$. Our techniques distinguish between two types of variability: changes in the total flux density versus changes in the pulse shape. We have applied these techniques to 168 pulsars observed by the Parkes radio telescope, and see that although variations in flux density are ubiquitous, substantial changes in the shape of the pulse profile are rare. We reproduce previously published results and present examples of profile shape changing in seven pulsars; in particular, a clear new example of correlated changes in profile shape and rotation is found in PSR J1602-5100. In the shape changing pulsars, a more complex picture than the previously proposed two state model emerges. We conclude that our simple assumption that all timing noise can be interpreted as $\dot{\nu}$ variability is insufficient to explain our dataset.
Submitted 18 November, 2015; v1 submitted 17 November, 2015;
originally announced November 2015.
-
Ghost in the time series: no planet for Alpha Cen B
Authors:
Vinesh Rajpaul,
Suzanne Aigrain,
Stephen J. Roberts
Abstract:
We re-analyse the publicly available radial velocity (RV) measurements for Alpha Cen B, a star hosting an Earth-mass planet candidate, Alpha Cen Bb, with 3.24 day orbital period. We demonstrate that the 3.24 d signal observed in the Alpha Cen B data almost certainly arises from the window function (time sampling) of the original data. We show that when stellar activity signals are removed from the RV variations, other significant peaks in the power spectrum of the window function are coincidentally suppressed, leaving behind a spurious yet apparently-significant 'ghost' of a signal that was present in the window function's power spectrum to begin with. Even when fitting synthetic data with time sampling identical to the original data, but devoid of any genuine periodicities close to that of the planet candidate, the original model used to infer the presence of Alpha Cen Bb leads to identical conclusions: viz., the $3\sigma$ detection of a half-a-metre-per-second signal with 3.236 day period. Our analysis underscores the difficulty of detecting weak planetary signals in RV data, and the importance of understanding in detail how every component of an RV data set, including its time sampling, influences final statistical inference.
Submitted 19 October, 2015;
originally announced October 2015.
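Illustration (not from the paper): the abstract above attributes the 3.24 d signal to the window function, i.e. the power spectrum of the observation times alone. A minimal sketch of that quantity follows, assuming the common definition W(f) = |(1/N) Σ exp(-2πi f t_j)|². The function name `window_power` and the toy sampling are hypothetical and are not the authors' pipeline.

```python
import cmath
import math


def window_power(times, frequency):
    """Power of the spectral window at one frequency.

    times: observation epochs (any consistent unit, e.g. days).
    frequency: trial frequency in cycles per that unit.
    Returns |(1/N) * sum_j exp(-2*pi*i*f*t_j)|**2, which is 1 at f = 0.
    """
    if not times:
        raise ValueError("times must contain at least one epoch")
    total = sum(cmath.exp(-2j * math.pi * frequency * t) for t in times)
    return abs(total / len(times)) ** 2


# Toy example: ten nightly observations (strictly regular sampling).
nightly = [float(k) for k in range(10)]
for f in (0.0, 0.5, 1.0):
    print(f, window_power(nightly, f))
```

With strictly nightly sampling the window has full power at one cycle per day as well as at zero frequency, which is why periodicities in the sampling can masquerade as periodicities in the signal.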
-
A Gaussian process framework for modelling stellar activity signals in radial velocity data
Authors:
Vinesh Rajpaul,
Suzanne Aigrain,
Michael A. Osborne,
Steven Reece,
Stephen J. Roberts
Abstract:
To date, the radial velocity (RV) method has been one of the most productive techniques for detecting and confirming extrasolar planetary candidates. Unfortunately, stellar activity can induce RV variations which can drown out or even mimic planetary signals - and it is notoriously difficult to model and thus mitigate the effects of these activity-induced nuisance signals. This is expected to be a major obstacle to using next-generation spectrographs to detect lower mass planets, planets with longer periods, and planets around more active stars. Enter Gaussian processes (GPs) which, we note, have a number of attractive features that make them very well suited to disentangling stellar activity signals from planetary signals. We present here a GP framework we developed to model RV time series jointly with ancillary activity indicators (e.g. bisector velocity spans, line widths, chromospheric activity indices), allowing the activity component of RV time series to be constrained and disentangled from e.g. planetary components. We discuss the mathematical details of our GP framework, and present results illustrating its encouraging performance on both synthetic and real RV datasets, including the publicly-available Alpha Centauri B dataset.
Submitted 24 June, 2015;
originally announced June 2015.
-
A Sparse Gaussian Process Framework for Photometric Redshift Estimation
Authors:
Ibrahim A. Almosallam,
Sam N. Lindsay,
Matt J. Jarvis,
Stephen J. Roberts
Abstract:
Accurate photometric redshifts are a lynchpin for many future experiments to pin down the cosmological model and for studies of galaxy evolution. In this study, a novel sparse regression framework for photometric redshift estimation is presented. Simulated and real data from SDSS DR12 were used to train and test the proposed models. We show that approaches which include careful data preparation and model design offer a significant improvement in comparison with several competing machine learning algorithms. Standard implementations of most regression algorithms have as the objective the minimization of the sum of squared errors. For redshift inference, however, this induces a bias in the posterior mean of the output distribution, which can be problematic. In this paper we directly target minimizing $\Delta z = (z_\textrm{s} - z_\textrm{p})/(1+z_\textrm{s})$ and address the bias problem via a distribution-based weighting scheme, incorporated as part of the optimization objective. The results are compared with other machine learning algorithms in the field such as Artificial Neural Networks (ANN), Gaussian Processes (GPs) and sparse GPs. The proposed framework reaches a mean absolute $\Delta z = 0.0026(1+z_\textrm{s})$, over the redshift range of $0 \le z_\textrm{s} \le 2$ on the simulated data, and $\Delta z = 0.0178(1+z_\textrm{s})$ over the entire redshift range on the SDSS DR12 survey, outperforming the standard ANNz used in the literature. We also investigate how the relative size of the training set affects the photometric redshift accuracy. We find that a training set of >30 per cent of total sample size provides little additional constraint on the photometric redshifts, and note that our GP formalism strongly outperforms ANNz in the sparse data regime for the simulated data set.
Submitted 19 October, 2015; v1 submitted 20 May, 2015;
originally announced May 2015.
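Illustration (not from the paper): the abstract above defines the normalised residual Δz = (z_s - z_p)/(1 + z_s), where z_s is the spectroscopic and z_p the photometric redshift. A minimal sketch of how that quantity and its mean absolute value might be computed follows; the function names and sample values are hypothetical and do not come from the authors' code.

```python
def normalised_residual(z_spec, z_phot):
    """Delta z = (z_s - z_p) / (1 + z_s) for a single galaxy."""
    return (z_spec - z_phot) / (1.0 + z_spec)


def mean_absolute_residual(z_spec, z_phot):
    """Mean of |Delta z| over paired spectroscopic/photometric redshifts."""
    if len(z_spec) != len(z_phot) or not z_spec:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(normalised_residual(s, p)) for s, p in zip(z_spec, z_phot))
    return total / len(z_spec)


# Made-up redshifts purely to show the call pattern.
z_s = [0.10, 0.50, 1.20]
z_p = [0.12, 0.47, 1.25]
print(mean_absolute_residual(z_s, z_p))
```

Dividing by (1 + z_s) keeps errors comparable across redshift, so a fixed absolute error counts for less at high redshift than at low redshift.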
-
Precise time-series photometry for the Kepler-2.0 mission
Authors:
Suzanne Aigrain,
Simon T. Hodgkin,
Michael J. Irwin,
Jim R. Lewis,
Stephen J. Roberts
Abstract:
The recently approved NASA K2 mission has the potential to multiply by an order of magnitude the number of short-period transiting planets found by Kepler around bright and low-mass stars, and to revolutionise our understanding of stellar variability in open clusters. However, the data processing is made more challenging by the reduced pointing accuracy of the satellite, which has only two functioning reaction wheels. We present a new method to extract precise light curves from K2 data, combining list-driven, soft-edged aperture photometry with a star-by-star correction of systematic effects associated with the drift in the roll-angle of the satellite about its boresight. The systematics are modelled simultaneously with the stars' intrinsic variability using a semi-parametric Gaussian process model. We test this method on a week of data collected during an engineering test in January 2014, perform checks to verify that our method does not alter intrinsic variability signals, and compute the precision as a function of magnitude on long-cadence (30-min) and planetary transit (2.5-hour) timescales. In both cases, we reach photometric precisions close to the precision reached during the nominal Kepler mission for stars fainter than 12th magnitude, and between 40 and 80 parts per million for brighter stars. These results confirm the bright prospects for planet detection and characterisation, asteroseismology and stellar variability studies with K2. Finally, we perform a basic transit search on the light curves, detecting 2 bona fide transit-like events, 7 detached eclipsing binaries and 13 classical variables.
Submitted 19 December, 2014;
originally announced December 2014.
-
Evidence of an asteroid encountering a pulsar
Authors:
P. R. Brook,
A. Karastergiou,
S. Buchner,
S. J. Roberts,
M. J. Keith,
S. Johnston,
R. M. Shannon
Abstract:
Debris disks and asteroid belts are expected to form around young pulsars due to fallback material from their original supernova explosions. Disk material may migrate inwards and interact with a pulsar's magnetosphere, causing changes in torque and emission. Long term monitoring of PSR J0738-4042 reveals both effects. The pulse shape changes multiple times between 1988 and 2012. The torque, inferred via the derivative of the rotational period, changes abruptly from September 2005. This change is accompanied by an emergent radio component that drifts with respect to the rest of the pulse. No known intrinsic pulsar processes can explain these timing and radio emission signatures. The data lead us to postulate that we are witnessing an encounter with an asteroid or in-falling debris from a disk.
Submitted 14 November, 2013;
originally announced November 2013.
-
A transient component in the pulse profile of PSR J0738-4042
Authors:
Aris Karastergiou,
Steve J. Roberts,
Simon Johnston,
Hyoung-joo Lee,
Patrick Weltevrede,
Michael Kramer
Abstract:
One of the tenets of the radio pulsar observational picture is that the integrated pulse profiles are constant with time. This assumption underpins much of the fantastic science made possible via pulsar timing. Over the past few years, however, this assumption has come under question with a number of pulsars showing pulse shape changes on a range of timescales. Here, we show the dramatic appearance of a bright component in the pulse profile of PSR J0738-4042 (B0736-40). The component arises on the leading edge of the profile. It was not present in 2004 but strongly present in 2006 and all observations thereafter. A subsequent search through the literature shows the additional component varies in flux density over timescales of decades. We show that the polarization properties of the transient component are consistent with the picture of competing orthogonal polarization modes. Faced with the general problem of identifying and characterising average profile changes, we outline and apply a statistical technique based on a Hidden Markov Model. The value of this technique is established through simulations, and is shown to work successfully in the case of low signal-to-noise profiles.
Submitted 11 March, 2011;
originally announced March 2011.