-
Unsupervised Star Galaxy Classification with Cascade Variational Auto-Encoder
Authors:
Hao Sun,
Jiadong Guo,
Edward J. Kim,
Robert J. Brunner
Abstract:
The increasing amount of data in astronomy poses great challenges for machine learning research. Previously, supervised learning methods achieved satisfactory recognition accuracy for the star-galaxy classification task, based on manually labeled data sets. In this work, we propose a novel unsupervised approach for the star-galaxy recognition task, namely the Cascade Variational Auto-Encoder (CasVAE). Our empirical results show that our method outperforms the baseline model in both accuracy and stability.
Submitted 30 October, 2019;
originally announced October 2019.
-
Extended Isolation Forest
Authors:
Sahand Hariri,
Matias Carrasco Kind,
Robert J. Brunner
Abstract:
We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with the assignment of anomaly scores to given data points. We motivate the problem using heat maps for anomaly scores. These maps suffer from artifacts generated by the criteria for the branching operation of the binary tree. We explain this problem in detail and demonstrate visually the mechanism by which it occurs. We then propose two different approaches for improving the situation. First, we propose transforming the data randomly before the creation of each tree, which averages out the bias. The second, preferred approach is to allow the slicing of the data to use hyperplanes with random slopes. This approach remedies the artifacts seen in the anomaly score heat maps. We show that the robustness of the algorithm is much improved using this method by looking at the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence or in computation time between the standard Isolation Forest and EIF.
Submitted 8 July, 2020; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Vizic: A Jupyter-based Interactive Visualization Tool for Astronomical Catalogs
Authors:
W. Yu,
M. Carrasco Kind,
R. J. Brunner
Abstract:
The ever-growing datasets in observational astronomy have challenged scientists in many aspects, including efficient and interactive data exploration and visualization. Many tools have been developed to confront this challenge. However, they usually focus on displaying the actual images or on visualizing patterns within catalogs in a predefined way. In this paper we introduce Vizic, a Python visualization library that builds the connection between images and catalogs through an interactive map of the sky region. Vizic visualizes catalog data over a custom background canvas using the shape, size and orientation of each object in the catalog. The displayed objects in the map are highly interactive and customizable compared to those in the images. These objects can be filtered or colored by their properties, such as redshift and magnitude. They can also be sub-selected using a lasso-like tool for further analysis using standard Python functions from inside a Jupyter notebook. Furthermore, Vizic allows custom overlays to be appended dynamically on top of the sky map. We have initially implemented several overlays, namely Voronoi, Delaunay, Minimum Spanning Tree and HEALPix grid layers, which are helpful for visualizing large-scale structure. All these overlays can be generated, added or removed interactively with one line of code. The catalog data is stored in a non-relational database, and the interfaces were developed in JavaScript and Python to work within Jupyter Notebook, which allows users to create custom widgets and user-generated scripts to analyze and plot the data selected or displayed in the interactive map. This unique design makes Vizic a very powerful and flexible interactive analysis tool. Vizic can be adopted in a variety of exercises, for example, data inspection, clustering analysis, galaxy alignment studies, outlier identification or simply large-scale visualizations.
Submitted 6 May, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Star-galaxy Classification Using Deep Convolutional Neural Networks
Authors:
Edward J. Kim,
Robert J. Brunner
Abstract:
Most existing star-galaxy classifiers use the reduced summary information from catalogs, requiring careful feature extraction and selection. The latest advances in machine learning that use deep convolutional neural networks allow a machine to automatically learn the features directly from data, minimizing the need for input from human experts. We present a star-galaxy classification framework that uses deep convolutional neural networks (ConvNets) directly on the reduced, calibrated pixel values. Using data from the Sloan Digital Sky Survey (SDSS) and the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS), we demonstrate that ConvNets are able to produce accurate and well-calibrated probabilistic classifications that are competitive with conventional machine learning techniques. Future advances in deep learning may bring more success with current and forthcoming photometric surveys, such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), because deep neural networks require very little manual feature engineering.
Submitted 13 October, 2016; v1 submitted 15 August, 2016;
originally announced August 2016.
-
Teaching Data Science
Authors:
Robert J. Brunner,
Edward J. Kim
Abstract:
We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language with an emphasis on data preparation, processing, and presentation. The course had no prerequisites, and students were not expected to have any programming experience. This introductory course was designed to cover a wide range of topics, from the nature of data, to storage, to visualization, to probability and statistical analysis, to cloud and high performance computing, without becoming overly focused on any one subject. We conclude this article with a discussion of lessons learned and our plans to develop new data science courses.
Submitted 25 April, 2016;
originally announced April 2016.
-
Galaxy clustering with photometric surveys using PDF redshift information
Authors:
J. Asorey,
M. Carrasco Kind,
I. Sevilla-Noarbe,
R. J. Brunner,
J. Thaler
Abstract:
Photometric surveys produce large-area maps of the galaxy distribution, but with less accurate redshift information than is obtained from spectroscopic methods. Modern photometric redshift (photo-z) algorithms use galaxy magnitudes, or colors, that are obtained through multi-band imaging to produce a probability density function (PDF) for each galaxy in the map. We used simulated data to study the effect of using different photo-z estimators to assign galaxies to redshift bins in order to compare their effects on angular clustering and galaxy bias measurements. We found that if we use the entire PDF, rather than a single-point (mean or mode) estimate, the deviations are less biased, especially when using narrow redshift bins. When the redshift bin widths are $Δz=0.1$, the use of the entire PDF reduces the typical measurement bias from 5%, when using single point estimates, to 3%.
Submitted 13 June, 2016; v1 submitted 3 January, 2016;
originally announced January 2016.
-
The Dark Energy Survey: more than dark energy - an overview
Authors:
Dark Energy Survey Collaboration,
T. Abbott,
F. B. Abdalla,
J. Aleksic,
S. Allam,
A. Amara,
D. Bacon,
E. Balbinot,
M. Banerji,
K. Bechtol,
A. Benoit-Levy,
G. M. Bernstein,
E. Bertin,
J. Blazek,
C. Bonnett,
S. Bridle,
D. Brooks,
R. J. Brunner,
E. Buckley-Geer,
D. L. Burke,
G. B. Caminha,
D. Capozzi,
J. Carlsen,
A. Carnero-Rosell,
M. Carollo
, et al. (116 additional authors not shown)
Abstract:
This overview article describes the legacy prospect and discovery potential of the Dark Energy Survey (DES) beyond cosmological studies, illustrating it with examples from the DES early data. DES is using a wide-field camera (DECam) on the 4m Blanco Telescope in Chile to image 5000 sq deg of the sky in five filters (grizY). By its completion the survey is expected to have generated a catalogue of 300 million galaxies with photometric redshifts and 100 million stars. In addition, a time-domain survey search over 27 sq deg is expected to yield a sample of thousands of Type Ia supernovae and other transients. The main goals of DES are to characterise dark energy and dark matter, and to test alternative models of gravity; these goals will be pursued by studying large scale structure, cluster counts, weak gravitational lensing and Type Ia supernovae. However, DES also provides a rich data set which allows us to study many other aspects of astrophysics. In this paper we focus on additional science with DES, emphasizing areas where the survey makes a difference with respect to other current surveys. The paper illustrates, using early data (from 'Science Verification', and from the first, second and third seasons of observations), what DES can tell us about the solar system, the Milky Way, galaxy evolution, quasars, and other topics. In addition, we show that if the cosmological model is assumed to be Lambda + Cold Dark Matter (LCDM) then important astrophysics can be deduced from the primary DES probes. Highlights from DES early data include the discovery of 34 Trans Neptunian Objects, 17 dwarf satellites of the Milky Way, one published z > 6 quasar (and more confirmed) and two published superluminous supernovae (and more confirmed).
Submitted 19 August, 2016; v1 submitted 3 January, 2016;
originally announced January 2016.
-
Observation and Confirmation of Six Strong Lensing Systems in The Dark Energy Survey Science Verification Data
Authors:
B. Nord,
E. Buckley-Geer,
H. Lin,
H. T. Diehl,
J. Helsby,
N. Kuropatkin,
A. Amara,
T. Collett,
S. Allam,
G. Caminha,
C. De Bom,
S. Desai,
H. Dúmet-Montoya,
M. Elidaiana da S. Pereira,
D. A. Finley,
B. Flaugher,
C. Furlanetto,
H. Gaitsch,
M. Gill,
K. W. Merritt,
A. More,
D. Tucker,
E. S. Rykoff,
E. Rozo,
F. B. Abdalla
, et al. (67 additional authors not shown)
Abstract:
We report the observation and confirmation of the first group- and cluster-scale strong gravitational lensing systems found in Dark Energy Survey (DES) data. Through visual inspection of data from the Science Verification (SV) season, we identified 53 candidate systems. We then obtained spectroscopic follow-up of 21 candidates using the Gemini Multi-Object Spectrograph (GMOS) at the Gemini South telescope and the Inamori-Magellan Areal Camera and Spectrograph (IMACS) at the Magellan/Baade telescope. With this follow-up, we confirmed six candidates as gravitational lenses: Three of the systems are newly discovered, and the remaining three were previously known. Of the 21 observed candidates, the remaining 15 were either not detected in spectroscopic observations, were observed and did not exhibit continuum emission (or spectral features), or were ruled out as lensing systems. The confirmed sample consists of one group-scale and five galaxy cluster-scale lenses. The lensed sources range in redshift z ~ 0.80-3.2, and in i-band surface brightness i_{SB} ~ 23-25 mag/sq.-arcsec. (2" aperture). For each of the six systems, we estimate the Einstein radius and the enclosed mass, which have ranges ~ 5.0 - 8.6" and ~ 7.5 x 10^{12} - 6.4 x 10^{13} solar masses, respectively.
Submitted 9 December, 2015;
originally announced December 2015.
-
Creating updated, scientifically-calibrated mosaic images for the RC3 catalogue
Authors:
Jung Lin Lee,
Robert J. Brunner
Abstract:
The Third Reference Catalogue of Bright Galaxies (RC3) is a reasonably complete listing of 23,011 nearby, large, bright galaxies. By using the final imaging data release from the Sloan Digital Sky Survey, we generate scientifically-calibrated FITS mosaics by using the montage program for all SDSS imaging bands for all RC3 galaxies that lie within the survey footprint. We further combine the SDSS g, r, and i band FITS mosaics for these galaxies to create color-composite images by using the STIFF program. We generalized this software framework to make FITS mosaics and color-composite images for an arbitrary catalog and imaging data set. Due to positional inaccuracies inherent in the RC3 catalog, we employ a recursive algorithm in our mosaicking pipeline that first determines the correct location for each galaxy, and subsequently applies the mosaicking procedure. As an additional test of this new software pipeline and to obtain mosaic images of a larger sample of RC3 galaxies, we also applied this pipeline to photographic data taken by the Second Palomar Observatory Sky Survey with $B_J$, $R_F$, and $I_N$ plates. We publicly release all generated data, accessible via a web search form, and the software pipeline to enable others to make galaxy mosaics by using other catalogs or surveys.
Submitted 3 December, 2015;
originally announced December 2015.
-
Machine Learning and Cosmological Simulations II: Hydrodynamical Simulations
Authors:
Harshil M. Kamdar,
Matthew J. Turk,
Robert J. Brunner
Abstract:
We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris Simulation to train and test various sophisticated machine learning algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The successful reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically and physically robust and follows the same fundamental observational constraints. Machine learning offers an intriguing and promising technique to create quick mock galaxy catalogs in the future.
Submitted 12 January, 2016; v1 submitted 26 October, 2015;
originally announced October 2015.
-
Machine Learning and Cosmological Simulations I: Semi-Analytical Models
Authors:
Harshil M. Kamdar,
Matthew J. Turk,
Robert J. Brunner
Abstract:
We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated machine learning algorithms (k-Nearest Neighbors, decision trees, random forests and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of $M>10^{12} M_{\odot}$ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon SAMs and demonstrably place ML as a promising and a computationally efficient tool to study small-scale structure formation.
Submitted 21 October, 2015;
originally announced October 2015.
-
Galaxy clustering, photometric redshifts and diagnosis of systematics in the DES Science Verification data
Authors:
M. Crocce,
J. Carretero,
A. H. Bauer,
A. J. Ross,
I. Sevilla-Noarbe,
T. Giannantonio,
F. Sobreira,
J. Sanchez,
E. Gaztanaga,
M. Carrasco Kind,
C. Sanchez,
C. Bonnett,
A. Benoit-Levy,
R. J. Brunner,
A. Carnero Rosell,
R. Cawthon,
P. Fosalba,
W. Hartley,
E. J. Kim,
B. Leistedt,
R. Miquel,
H. V. Peiris,
W. J. Percival,
R. Rosenfeld,
E. S. Rykoff
, et al. (62 additional authors not shown)
Abstract:
We study the clustering of galaxies detected at $i<22.5$ in the Science Verification observations of the Dark Energy Survey (DES). Two-point correlation functions are measured using $2.3\times 10^6$ galaxies over a contiguous 116 deg$^2$ region in five bins of photometric redshift width $Δz = 0.2$ in the range $0.2 < z < 1.2$. The impact of photometric redshift errors is assessed by comparing results using a template-based photo-$z$ algorithm (BPZ) to a machine-learning algorithm (TPZ). A companion paper (Leistedt et al. 2015) presents maps of several observational variables (e.g. seeing, sky brightness) which could modulate the galaxy density. Here we characterize and mitigate systematic errors on the measured clustering which arise from these observational variables, in addition to others such as Galactic dust and stellar contamination. After correcting for systematic effects we measure galaxy bias over a broad range of linear scales relative to mass clustering predicted from the Planck $Λ$CDM model, finding agreement with CFHTLS measurements with $χ^2$ of 4.0 (8.7) with 5 degrees of freedom for the TPZ (BPZ) redshifts. We test a "linear bias" model, in which the galaxy clustering is a fixed multiple of the predicted non-linear dark-matter clustering. The precision of the data allows us to determine that the linear bias model describes the observed galaxy clustering to $2.5\%$ accuracy down to scales at least $4$ to $10$ times smaller than those on which linear theory is expected to be sufficient.
Submitted 15 December, 2015; v1 submitted 19 July, 2015;
originally announced July 2015.
-
A Hybrid Ensemble Learning Approach to Star-Galaxy Classification
Authors:
Edward J. Kim,
Robert J. Brunner,
Matias Carrasco Kind
Abstract:
There exist a variety of star-galaxy classification techniques, each with their own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template fitting method. Using data from the CFHTLenS survey, we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2, SDSS, VIPERS, and VVDS, and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, strategies that combine the predictions of different classifiers may prove to be optimal in currently ongoing and forthcoming photometric surveys, such as the Dark Energy Survey and the Large Synoptic Survey Telescope.
Submitted 14 July, 2015; v1 submitted 8 May, 2015;
originally announced May 2015.
-
On the Clustering of Compact Galaxy Pairs in Dark Matter Haloes
Authors:
Yiran Wang,
R. J. Brunner
Abstract:
We analyze the clustering of photometrically selected galaxy pairs by using the halo-occupation distribution (HOD) model. We measure the angular two-point auto-correlation function, $ω(θ)$, for galaxies and galaxy pairs in three volume-limited samples and develop an HOD to model their clustering. Our results are successfully fit by these HOD models, and we see the separation of "1-halo" and "2-halo" clustering terms for both single galaxies and galaxy pairs. Our clustering measurements and HOD model fits for the single galaxy samples are consistent with previous results. We find that the galaxy pairs generally have larger clustering amplitudes than single galaxies, and the quantities computed during the HOD fitting, e.g., effective halo mass, $M_{eff}$, and linear bias, $b_{g}$, are also larger for galaxy pairs. We find that the central fractions for galaxy pairs are significantly higher than those for single galaxies, which confirms that galaxy pairs are formed at the center of more massive dark matter haloes. We also model the clustering dependence of the galaxy pair correlation function on redshift, galaxy type, and luminosity. We find that early-early pairs (bright galaxy pairs) cluster more strongly than late-late pairs (dim galaxy pairs), and that the clustering does not depend on the luminosity contrast between the two galaxies in the compact group.
Submitted 30 July, 2014;
originally announced July 2014.
-
Sparse Representation of Photometric Redshift PDFs: Preparing for Petascale Astronomy
Authors:
M. Carrasco Kind,
R. J. Brunner
Abstract:
One of the consequences of entering the era of precision cosmology is the widespread adoption of photometric redshift probability density functions (PDFs). Both current and future photometric surveys are expected to obtain images of billions of distinct galaxies. As a result, storing and analyzing all of these PDFs will be non-trivial, and the problem becomes even more severe if a survey plans to compute and store multiple different PDFs. In this paper we propose the use of a sparse basis representation to fully represent individual photo-$z$ PDFs. By using an Orthogonal Matching Pursuit algorithm and a combination of Gaussian and Voigt basis functions, we demonstrate how our approach is superior to a multi-Gaussian fitting, as we require approximately half of the parameters for the same fitting accuracy with the additional advantage that an entire PDF can be stored by using a 4-byte integer per basis function, and we can achieve better accuracy by increasing the number of bases. By using data from the CFHTLenS, we demonstrate that only ten to twenty points per galaxy are sufficient to reconstruct both the individual PDFs and the ensemble redshift distribution, $N(z)$, to an accuracy of 99.9% when compared to the one built using the original PDFs computed with a resolution of $δz = 0.01$, reducing the required storage of two hundred original values by a factor of ten to twenty. Finally, we demonstrate how this basis representation can be directly extended to a cosmological analysis, thereby increasing computational performance without losing resolution or accuracy.
Submitted 25 April, 2014;
originally announced April 2014.
-
Exhausting the Information: Novel Bayesian Combination of Photometric Redshift PDFs
Authors:
M. Carrasco Kind,
R. J. Brunner
Abstract:
The estimation and utilization of photometric redshift probability density functions (photo-$z$ PDFs) has become increasingly important over the last few years, and there currently exists a wide variety of algorithms to compute photo-$z$'s, each with their own strengths and weaknesses. In this paper, we present a novel and efficient Bayesian framework that combines the results from different photo-$z$ techniques into a more powerful and robust estimate by maximizing the information from the photometric data. To demonstrate this we use a supervised machine learning technique based on random forest, an unsupervised method based on self-organizing maps, and a standard template fitting method, but the framework can be easily extended to other existing techniques. We use data from the DEEP2 and the SDSS surveys to explore different methods for combining the predictions from these techniques. By using different performance metrics, we demonstrate that we can improve the accuracy of our final photo-$z$ estimate over the best input technique, that the fraction of outliers is reduced, and that the identification of outliers is significantly improved when we apply a Naïve Bayes Classifier to this combined information. Our more robust and accurate photo-$z$ PDFs will allow even more precise cosmological constraints to be made by using current and future photometric surveys. These improvements are crucial as we move to analyze photometric data that push to or even past the limits of the available training data, which will be the case with the Large Synoptic Survey Telescope.
Submitted 4 June, 2014; v1 submitted 28 February, 2014;
originally announced March 2014.
-
SOMz: photometric redshift PDFs with self organizing maps and random atlas
Authors:
M. Carrasco Kind,
R. J. Brunner
Abstract:
In this paper we explore the applicability of the unsupervised machine learning technique of Self Organizing Maps (SOM) to estimate galaxy photometric redshift probability density functions (PDFs). This technique takes a spectroscopic training set, and maps the photometric attributes, but not the redshifts, to a two dimensional surface by using a process of competitive learning where neurons compete to more closely resemble the training data multidimensional space. The key feature of a SOM is that it retains the topology of the input set, revealing correlations between the attributes that are not easily identified. We test three different 2D topological mappings: rectangular, hexagonal, and spherical, by using data from the DEEP2 survey. We also explore different implementations and boundary conditions on the map and also introduce the idea of a random atlas where a large number of different maps are created and their individual predictions are aggregated to produce a more robust photometric redshift PDF. We also introduce a new metric, the $I$-score, which efficiently incorporates different metrics, making it easier to compare different results (from different parameters or different photometric redshift codes). We find that by using a spherical topology mapping we obtain a better representation of the underlying multidimensional topology, which provides more accurate results that are comparable to other, state-of-the-art machine learning algorithms. Our results illustrate that unsupervised approaches have great potential for many astronomical problems, and in particular for the computation of photometric redshifts.
Submitted 18 December, 2013;
originally announced December 2013.
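The random atlas idea in the abstract above, many maps whose individual predictions are aggregated into one PDF, can be sketched for the aggregation step alone. The example assumes each map has already returned the training-set redshifts stored in the cell that best matches a given galaxy. The function name, the equal weighting of maps, and the toy redshifts are assumptions of this sketch; the abstract does not say how the maps are weighted.

```python
def aggregate_atlas(per_map_redshifts, z_min, z_max, n_bins):
    """Pool the redshifts predicted by every map into one histogram PDF.

    per_map_redshifts: one list per map, holding the training redshifts
    found in that map's best-matching cell (hypothetical input format).
    """
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for cell_redshifts in per_map_redshifts:
        for z in cell_redshifts:
            if z_min <= z < z_max:
                # Clamp the index to guard against floating-point round-up
                # for values just below z_max.
                index = min(int((z - z_min) / width), n_bins - 1)
                counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no redshifts fall inside the requested range")
    # Divide by the bin width so the histogram integrates to 1.
    return [c / (total * width) for c in counts]

# Toy atlas of three maps for a single galaxy (made-up redshifts).
per_map = [
    [0.41, 0.45, 0.43],
    [0.44, 0.47],
    [0.48, 0.42, 0.46, 0.93],
]
n_bins = 10
width = 1.0 / n_bins
pdf = aggregate_atlas(per_map, 0.0, 1.0, n_bins)
peak_bin = pdf.index(max(pdf))
```

Pooling across maps dilutes the single outlying redshift from the third map, which is the kind of robustness the aggregation is meant to provide.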
-
Narrow absorption line variability in repeat quasar observations from the Sloan Digital Sky Survey
Authors:
Troy L. Hacker,
Robert J. Brunner,
Britt F. Lundgren,
Donald G. York
Abstract:
We present the results from a time domain study of absorption lines detected in quasar spectra with repeat observations from the Sloan Digital Sky Survey Data Release 7 (SDSS DR7). Beginning with over 4500 unique time separation baselines of various absorption line species identified in the SDSS DR7 quasar spectra, we create a catalogue of 2522 quasar absorption line systems with two to eight repeat observations, representing the largest collection of unbiased and homogeneous multi-epoch absorption systems ever published. To investigate these systems for time variability of narrow absorption lines, we refine this sample based on the reliability of the system detection, the proximity of pixels with bright sky contamination to individual absorption lines, and the quality of the continuum fit. Variability measurements of this sub-sample based on the absorption line equivalent widths yield a total of 33 systems with indications of significantly variable absorption strengths on time-scales ranging from one day to several years in the rest frame of the absorption system. Of these, at least 10 are from a class known as intervening absorption systems caused by foreground galaxies along the line of sight to the background quasar. This is the first evidence of possible absorption line variability detected in intervening systems, and their short time-scale variations suggest that small-scale structures (~10-100 au) are likely to exist in their host foreground galaxies.
Submitted 30 July, 2013;
originally announced July 2013.
-
Dark energy with gravitational lens time delays
Authors:
T. Treu,
P. J. Marshall,
F. -Y. Cyr-Racine,
C. D. Fassnacht,
C. R. Keeton,
E. V. Linder,
L. A. Moustakas,
M. Bradac,
E. Buckley-Geer,
T. Collett,
F. Courbin,
G. Dobler,
D. A. Finley,
J. Hjorth,
C. S. Kochanek,
E. Komatsu,
L. V. E. Koopmans,
G. Meylan,
P. Natarajan,
M. Oguri,
S. H. Suyu,
M. Tewes,
K. C. Wong,
A. I. Zabludoff,
D. Zaritsky
, et al. (13 additional authors not shown)
Abstract:
Strong lensing gravitational time delays are a powerful and cost effective probe of dark energy. Recent studies have shown that a single lens can provide a distance measurement with 6-7% accuracy (including random and systematic uncertainties), provided sufficient data are available to determine the time delay and reconstruct the gravitational potential of the deflector. Gravitational time delays are a low redshift (z~0-2) probe and thus allow one to break degeneracies in the interpretation of data from higher-redshift probes like the cosmic microwave background in terms of the dark energy equation of state. Current studies are limited by the size of the sample of known lensed quasars, but this situation is about to change. Even in this decade, wide field imaging surveys are likely to discover thousands of lensed quasars, enabling the targeted study of ~100 of these systems and resulting in substantial gains in the dark energy figure of merit. In the next decade, a further order of magnitude improvement will be possible with the 10000 systems expected to be detected and measured with LSST and Euclid. To fully exploit these gains, we identify three priorities. First, support for the development of software required for the analysis of the data. Second, in this decade, small robotic telescopes (1-4m in diameter) dedicated to monitoring of lensed quasars will transform the field by delivering accurate time delays for ~100 systems. Third, in the 2020's, LSST will deliver 1000's of time delays; the bottleneck will instead be the acquisition and analysis of high resolution imaging follow-up. Thus, the top priority for the next decade is to support fast high resolution imaging capabilities, such as those enabled by the James Webb Space Telescope and next generation adaptive optics systems on large ground based telescopes.
Submitted 5 June, 2013;
originally announced June 2013.
-
TPZ : Photometric redshift PDFs and ancillary information by using prediction trees and random forests
Authors:
M. Carrasco Kind,
R. J. Brunner
Abstract:
With the growth of large photometric surveys, accurately estimating photometric redshifts, preferably as a probability density function (PDF), and fully understanding the implicit systematic uncertainties in this process have become increasingly important. In this paper, we present a new, publicly available, parallel, machine learning algorithm that generates photometric redshift PDFs by using prediction trees and random forest techniques, which we have named TPZ. This new algorithm incorporates measurement errors into the calculation while also dealing efficiently with missing values in the data. In addition, our implementation of this algorithm provides supplementary information regarding the data being analyzed, including unbiased estimates of the accuracy of the technique without resorting to a validation data set, identification of poor photometric redshift areas within the parameter space occupied by the spectroscopic training data, a quantification of the relative importance of the variables used to construct the PDF, and a robust identification of outliers. This extra information can be used to optimally target new spectroscopic observations and to improve the overall efficacy of the redshift estimation. We have tested TPZ on galaxy samples drawn from the SDSS main galaxy sample and from the DEEP2 survey, obtaining excellent results in each case. We have also tested our implementation by participating in the PHAT1 project, which is a blind photometric redshift contest, finding that TPZ performs comparably to, if not better than, other empirical photometric redshift algorithms. Finally, we discuss the various parameters that control the operation of TPZ, the specific limitations of this approach, and an application of photometric redshift PDFs.
Submitted 28 March, 2013;
originally announced March 2013.
-
The SDSS Galaxy Angular Two-Point Correlation Function
Authors:
Y. Wang,
R. J. Brunner,
J. C. Dolence
Abstract:
We present the galaxy two-point angular correlation function for galaxies selected from the seventh data release of the Sloan Digital Sky Survey. The galaxy sample was selected with $r$-band apparent magnitudes between 17 and 21, and we measure the correlation function for the full sample as well as for four magnitude ranges: 17-18, 18-19, 19-20, and 20-21. We update the flag criteria to select a clean galaxy catalog and detail specific tests that we perform to characterize systematic effects, including the effects of seeing, Galactic extinction, and the overall survey uniformity. Notably, we find that optimally we can use observed regions with seeing $< 1.5''$ and $r$-band extinction $< 0.13$ magnitudes, smaller than previously published results. Furthermore, we confirm that the uniformity of the SDSS photometry is minimally affected by the stripe geometry. We find that, overall, the two-point angular correlation function can be described by a power law, $ω(θ) = A_ω θ^{(1-γ)}$ with $γ\simeq 1.72$, over the range $0.005^\circ$--$10^\circ$. We also find similar relationships for the four magnitude subsamples, but the amplitude within the same angular interval for the four subsamples is found to decrease with fainter magnitudes, in agreement with previous results. We find that the systematic signals are well below the galaxy angular correlation function for angles less than approximately $5^\circ$, which limits the modeling of galaxy angular correlations on larger scales. Finally, we present our custom, highly parallelized two-point correlation code that we used in this analysis.
Submitted 12 March, 2013; v1 submitted 11 March, 2013;
originally announced March 2013.
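The power-law form quoted in the abstract above, $ω(θ) = A_ω θ^{(1-γ)}$, becomes a straight line in log-log space, so $γ$ and $A_ω$ can be recovered with an ordinary least-squares line fit. The sketch below is a minimal illustration under simplifying assumptions: it ignores measurement errors and the covariance between angular bins, the function name is hypothetical, and the amplitude used to generate the synthetic data is arbitrary. Only $γ = 1.72$ and the angular range come from the abstract.

```python
import math

def fit_power_law(theta, omega):
    """Fit omega = A * theta**(1 - gamma) by least squares in log-log space.

    Unweighted fit: a real analysis would account for the errors and
    covariance of the correlation-function bins (not done here).
    """
    x = [math.log10(t) for t in theta]
    y = [math.log10(w) for w in omega]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope of the line equals 1 - gamma
    intercept = mean_y - slope * mean_x  # intercept equals log10(A)
    return 10 ** intercept, 1.0 - slope

# Synthetic, noise-free data over roughly 0.005 to 10 degrees.
true_amplitude = 0.02  # arbitrary value chosen for the example
true_gamma = 1.72      # value quoted in the abstract
theta = [0.005 * 10 ** (0.3 * k) for k in range(12)]
omega = [true_amplitude * t ** (1.0 - true_gamma) for t in theta]

amplitude_fit, gamma_fit = fit_power_law(theta, omega)
```

Because the synthetic data are noise-free, the fit returns the input parameters to within floating-point precision; with real measurements the scatter and bin-to-bin correlations would dominate the uncertainty.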
-
Evolution of the Clustering of Photometrically Selected SDSS Galaxies
Authors:
Ashley J. Ross,
Will J. Percival,
Robert J. Brunner
Abstract:
We measure the angular auto-correlation functions (w) of SDSS galaxies selected to have photometric redshifts 0.1 < z < 0.4 and absolute r-band magnitudes Mr < -21.2. We split these galaxies into five overlapping redshift shells of width 0.1 and measure w in each subsample in order to investigate the evolution of SDSS galaxies. We find that the bias increases substantially with redshift - much more so than one would expect for a passively evolving sample. We use halo-model analysis to determine the best-fit halo-occupation-distribution (HOD) for each subsample, and the best-fit models allow us to interpret the change in bias physically. In order to properly interpret our best-fit HODs, we convert each halo mass to its z = 0 passively evolved bias (bo), enabling a direct comparison of the best-fit HODs at different redshifts. We find that the minimum halo bo required to host a galaxy decreases as the redshift decreases, suggesting that galaxies with Mr < -21.2 are forming in halos at the low-mass end of the HODs over our redshift range. We use the best-fit HODs to determine the change in occupation number divided by the change in mass of halos with constant bo and we find a sharp peak at bo ~ 0.9 - corresponding to an average halo mass of ~ 10^12Msol/h. We thus present the following scenario: the bias of galaxies with Mr < -21.2 decreases as the Universe evolves because these galaxies form in halos of mass ~ 10^12Msol/h (independent of redshift), and the bias of these halos naturally decreases as the Universe evolves.
Submitted 26 April, 2010; v1 submitted 8 February, 2010;
originally announced February 2010.
-
LSST Science Book, Version 2.0
Authors:
LSST Science Collaboration,
Paul A. Abell,
Julius Allison,
Scott F. Anderson,
John R. Andrew,
J. Roger P. Angel,
Lee Armus,
David Arnett,
S. J. Asztalos,
Tim S. Axelrod,
Stephen Bailey,
D. R. Ballantyne,
Justin R. Bankert,
Wayne A. Barkhouse,
Jeffrey D. Barr,
L. Felipe Barrientos,
Aaron J. Barth,
James G. Bartlett,
Andrew C. Becker,
Jacek Becla,
Timothy C. Beers,
Joseph P. Bernstein,
Rahul Biswas,
Michael R. Blanton,
Joshua S. Bloom
, et al. (223 additional authors not shown)
Abstract:
A survey that can cover the sky in optical bands over wide fields to faint magnitudes with a fast cadence will enable many of the exciting science opportunities of the next decade. The Large Synoptic Survey Telescope (LSST) will have an effective aperture of 6.7 meters and an imaging camera with field of view of 9.6 deg^2, and will be devoted to a ten-year imaging survey over 20,000 deg^2 south of +15 deg. Each pointing will be imaged 2000 times with fifteen second exposures in six broad bands from 0.35 to 1.1 microns, to a total point-source depth of r~27.5. The LSST Science Book describes the basic parameters of the LSST hardware, software, and observing plans. The book discusses educational and outreach opportunities, then goes on to describe a broad range of science that LSST will revolutionize: mapping the inner and outer Solar System, stellar populations in the Milky Way and nearby galaxies, the structure of the Milky Way disk and halo and other objects in the Local Volume, transient and variable objects both at low and high redshift, and the properties of normal and active galaxies at low and high redshift. It then turns to far-field cosmological topics, exploring properties of supernovae to z~1, strong and weak lensing, the large-scale distribution of galaxies and baryon oscillations, and how these different probes may be combined to constrain cosmological models and the physics of dark energy.
Submitted 1 December, 2009;
originally announced December 2009.
-
Halo-model Analysis of the Clustering of Photometrically Selected Galaxies from SDSS
Authors:
Ashley J Ross,
Robert J. Brunner
Abstract:
We measure the angular 2-point correlation functions of galaxies in a volume limited, photometrically selected galaxy sample from the fifth data release of the Sloan Digital Sky Survey. We split the sample both by luminosity and galaxy type and use a halo-model analysis to find halo-occupation distributions that can simultaneously model the clustering of all, early-, and late-type galaxies in a given sample. Our results for the full galaxy sample are generally consistent with previous results using the SDSS spectroscopic sample, taking the differences between the median redshifts of the photometric and spectroscopic samples into account. We find that our early- and late-type measurements cannot be fit by a model that allows early- and late-type galaxies to be well-mixed within halos. Instead, we introduce a new model that segregates early- and late-type galaxies into separate halos to the maximum allowed extent. We determine that, in all cases, it provides a good fit to our data and thus provides a new statistical description of the manner in which early- and late-type galaxies occupy halos.
Submitted 26 June, 2009;
originally announced June 2009.
-
Data Mining and Machine Learning in Astronomy
Authors:
Nicholas M. Ball,
Robert J. Brunner
Abstract:
We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
Submitted 10 August, 2010; v1 submitted 11 June, 2009;
originally announced June 2009.
-
Clustering of Low-Redshift (z <= 2.2) Quasars from the Sloan Digital Sky Survey
Authors:
Nicholas P. Ross,
Yue Shen,
Michael A. Strauss,
Daniel E. Vanden Berk,
Andrew J. Connolly,
Gordon T. Richards,
Donald P. Schneider,
David H. Weinberg,
Patrick B. Hall,
Neta A. Bahcall,
Robert J. Brunner
Abstract:
We present measurements of the quasar two-point correlation function, ξ_{Q}, over the redshift range z=0.3-2.2 based upon data from the SDSS. Using a homogeneous sample of 30,239 quasars with spectroscopic redshifts from the DR5 Quasar Catalogue, our study represents the largest sample used for this type of investigation to date. With this redshift range and an areal coverage of approx 4,000 deg^2, we sample over 25 h^-3 Gpc^3 (comoving) assuming the current LCDM cosmology. Over this redshift range, we find that the redshift-space correlation function, xi(s), is adequately fit by a single power-law, with s_{0}=5.95+/-0.45 h^-1 Mpc and γ_{s}=1.16+0.11-0.16 when fit over s=1-25 h^-1 Mpc. Using the projected correlation function we calculate the real-space correlation length, r_{0}=5.45+0.35-0.45 h^-1 Mpc and γ=1.90+0.04-0.03, over scales of rp=1-130 h^-1 Mpc. Dividing the sample into redshift slices, we find very little, if any, evidence for the evolution of quasar clustering, with the redshift-space correlation length staying roughly constant at s_{0} ~ 6-7 h^-1 Mpc at z<2.2 (and only increasing at redshifts greater than this). Comparing our clustering measurements to those reported for X-ray selected AGN at z=0.5-1, we find reasonable agreement in some cases but significantly lower correlation lengths in others. We find that the linear bias evolves from b~1.4 at z=0.5 to b~3 at z=2.2, with b(z=1.27)=2.06+/-0.03 for the full sample. We compare our data to analytical models and infer that quasars inhabit dark matter haloes of constant mass M ~2 x 10^12 h^-1 M_Sol from redshifts z~2.5 (the peak of quasar activity) to z~0. [ABRIDGED]
Submitted 18 March, 2009;
originally announced March 2009.
-
A Cross-Correlation Analysis of Mg II Absorption Line Systems and Luminous Red Galaxies from the SDSS DR5
Authors:
Britt F. Lundgren,
Robert J. Brunner,
Donald G. York,
Ashley J. Ross,
Jean M. Quashnock,
Adam D. Myers,
Donald P. Schneider,
Yusra AlSayyad,
Neta Bahcall
Abstract:
We analyze the cross-correlation of 2,705 unambiguously intervening Mg II (2796,2803A) quasar absorption line systems with 1,495,604 luminous red galaxies (LRGs) from the Fifth Data Release of the Sloan Digital Sky Survey within the redshift range 0.36<=z<=0.8. We confirm with high precision a previously reported weak anti-correlation of equivalent width and dark matter halo mass, measuring the average masses to be log M_h(M_[solar]h^-1)=11.29 [+0.36,-0.62] and log M_h(M_[solar]h^-1)=12.70 [+0.53,-1.16] for systems with W[2796A]>=1.4A and 0.8A<=W[2796A]<1.4A, respectively. Additionally, we investigate the significance of a number of potential sources of bias inherent in absorber-LRG cross-correlation measurements, including absorber velocity distributions and the weak lensing of background quasars, which we determine is capable of producing a 20-30% bias in angular cross-correlation measurements on scales less than 2'. We measure the Mg II - LRG cross-correlation for 719 absorption systems with v<60,000 km s^-1 in the quasar rest frame and find that these associated absorbers typically reside in dark matter haloes that are ~10-100 times more massive than those hosting unambiguously intervening Mg II absorbers. Furthermore, we find evidence for evolution of the redshift number density, dN/dz, with 2-sigma significance for the strongest (W>2.0A) absorbers in the DR5 sample. This width-dependent dN/dz evolution does not significantly affect the recovered equivalent width-halo mass anti-correlation and adds to existing evidence that the strongest Mg II absorption systems are correlated with an evolving population of field galaxies at z<0.8, while the non-evolving dN/dz of the weakest absorbers more closely resembles that of the LRG population.
Submitted 26 May, 2009; v1 submitted 23 February, 2009;
originally announced February 2009.
-
The 2dF-SDSS LRG and QSO Survey: The spectroscopic QSO catalogue
Authors:
Scott M. Croom,
Gordon T. Richards,
Tom Shanks,
Brian J. Boyle,
Robert G. Sharp,
Joss Bland-Hawthorn,
Terry Bridges,
Robert J. Brunner,
Russell Cannon,
Daniel Carson,
Kuenley Chiu,
Matthew Colless,
Warrick Couch,
Roberto De Propris,
Michael J. Drinkwater,
Alastair Edge,
Stephen Fine,
Jon Loveday,
Lance Miller,
Adam D. Myers,
Robert C. Nichol,
Phil Outram,
Kevin Pimbblet,
Isaac Roseboom,
Nicholas Ross
, et al. (5 additional authors not shown)
Abstract:
We present the final spectroscopic QSO catalogue from the 2dF-SDSS LRG and QSO (2SLAQ) Survey. This is a deep, 18<g<21.85 (extinction corrected), sample aimed at probing in detail the faint end of the broad line AGN luminosity distribution at z<2.6. The candidate QSOs were selected from SDSS photometry and observed spectroscopically with the 2dF spectrograph on the Anglo-Australian Telescope. This sample covers an area of 191.9 deg^2 and contains new spectra of 16326 objects, of which 8764 are QSOs, and 7623 are newly discovered (the remainder were previously identified by the 2QZ and SDSS surveys). The full QSO sample (including objects previously observed in the SDSS and 2QZ surveys) contains 12702 QSOs. The new 2SLAQ spectroscopic data set also contains 2343 Galactic stars, including 362 white dwarfs, and 2924 narrow emission line galaxies with a median redshift of z=0.22. We present detailed completeness estimates for the survey, based on modelling of QSO colours, including host galaxy contributions. This calculation shows that at g~21.85 QSO colours are significantly affected by the presence of a host galaxy up to redshift z~1 in the SDSS ugriz bands. In particular we see a significant reddening of the objects in g-i towards fainter g-band magnitudes. This reddening is consistent with the QSO host galaxies being dominated by a stellar population of age at least 2-3 Gyr. The full catalogue, including completeness estimates, is available on-line at http://www.2slaq.info/
Submitted 27 October, 2008;
originally announced October 2008.
-
Quasar Clustering from SDSS DR5: Dependences on Physical Properties
Authors:
Yue Shen,
Michael A. Strauss,
Nicholas P. Ross,
Patrick B. Hall,
Yen-Ting Lin,
Gordon T. Richards,
Donald P. Schneider,
David H. Weinberg,
Andrew J. Connolly,
Xiaohui Fan,
Joseph F. Hennawi,
Francesco Shankar,
Daniel E. Vanden Berk,
Neta A. Bahcall,
Robert J. Brunner
Abstract:
Using a homogeneous sample of 38,208 quasars with a sky coverage of $4000 {\rm deg^2}$ drawn from the SDSS Data Release Five quasar catalog, we study the dependence of quasar clustering on luminosity, virial black hole mass, quasar color, and radio loudness. At $z<2.5$, quasar clustering depends weakly on luminosity and virial black hole mass, with typical uncertainty levels $\sim 10\%$ for the measured correlation lengths. These weak dependences are consistent with models in which substantial scatter between quasar luminosity, virial black hole mass and the host dark matter halo mass has diluted any clustering difference, where halo mass is assumed to be the relevant quantity that best correlates with clustering strength. However, the most luminous and most massive quasars are more strongly clustered (at the $\sim 2σ$ level) than the remainder of the sample, which we attribute to the rapid increase of the bias factor at the high-mass end of host halos. We do not observe a strong dependence of clustering strength on quasar colors within our sample. On the other hand, radio-loud quasars are more strongly clustered than are radio-quiet quasars matched in redshift and optical luminosity (or virial black hole mass), consistent with local observations of radio galaxies and radio-loud type 2 AGN. Thus radio-loud quasars reside in more massive and denser environments in the biased halo clustering picture. Using the Sheth et al. (2001) formula for the linear halo bias, the estimated host halo mass for radio-loud quasars is $\sim 10^{13} h^{-1}M_\odot$, compared to $\sim 2\times 10^{12} h^{-1}M_\odot$ for radio-quiet quasar hosts at $z\sim 1.5$.
Submitted 13 December, 2008; v1 submitted 22 October, 2008;
originally announced October 2008.
-
Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection
Authors:
Gordon T. Richards,
Rajesh P. Deo,
Mark Lacy,
Adam D. Myers,
Robert C. Nichol,
Nadia L. Zakamska,
Robert J. Brunner,
W. N. Brandt,
Alexander G. Gray,
John K. Parejko,
Andrew Ptak,
Donald P. Schneider,
Lisa J. Storrie-Lombardi,
Alexander S. Szalay
Abstract:
We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 um depth of 56 uJy over an area of ~24 sq. deg; ~70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. Our selection recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5<z<5 quasars. Even using only the two shortest wavelength IRAC bandpasses, it is possible to use our Bayesian techniques to select quasars with 97% completeness and as little as 10% contamination. This sample has a photometric redshift accuracy of 93.6% (Delta Z +/-0.3), remaining roughly constant when the two reddest MIR bands are excluded. While our methods are designed to find type 1 (unobscured) quasars, as many as 1200 of the objects are type 2 (obscured) quasar candidates. Coupling deep optical imaging data with deep mid-IR data could enable selection of quasars in significant numbers past the peak of the quasar luminosity function (QLF) to at least z~4. Such a sample would constrain the shape of the QLF and enable quasar clustering studies over the largest range of redshift and luminosity to date, yielding significant gains in our understanding of quasars and the evolution of galaxies.
Submitted 25 February, 2009; v1 submitted 20 October, 2008;
originally announced October 2008.
-
Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: II. ~1,000,000 Quasars from Data Release Six
Authors:
Gordon T. Richards,
Adam D. Myers,
Alexander G. Gray,
Ryan N. Riegel,
Robert C. Nichol,
Robert J. Brunner,
Alexander S. Szalay,
Donald P. Schneider,
Scott F. Anderson
Abstract:
We present a catalog of 1,172,157 quasar candidates selected from the photometric imaging data of the Sloan Digital Sky Survey (SDSS). The objects are all point sources to a limiting magnitude of i=21.3 from 8417 sq. deg. of imaging from SDSS Data Release 6 (DR6). This sample extends our previous catalog by using the latest SDSS public release data and probing both UV-excess and high-redshift quasars. While the addition of high-redshift candidates reduces the overall efficiency (quasars:quasar candidates) of the catalog to ~80%, it is expected to contain no fewer than 850,000 bona fide quasars -- ~8 times the number of our previous sample, and ~10 times the size of the largest spectroscopic quasar catalog. Cross-matching between our photometric catalog and spectroscopic quasar catalogs from both the SDSS and 2dF Surveys yields 88,879 spectroscopically confirmed quasars. For judicious selection of the most robust UV-excess sources (~500,000 objects in all), the efficiency is nearly 97% -- more than sufficient for detailed statistical analyses. The catalog's completeness to type 1 (broad-line) quasars is expected to be no worse than 70%, with most missing objects occurring at z<0.7 and 2.5<z<3.0. In addition to classification information, we provide photometric redshift estimates (typically good to Delta z +/- 0.3 [2 sigma]) and cross-matching with radio, X-ray, and proper motion catalogs. Finally, we consider the catalog's utility for determining the optical luminosity function of quasars and are able to confirm the flattening of the bright-end slope of the quasar luminosity function at z~4 as compared to z~2.
Submitted 23 September, 2008;
originally announced September 2008.
-
Mitrion-C Application Development on SGI Altix 350/RC100
Authors:
Volodymyr V. Kindratenko,
Robert J. Brunner,
Adam D. Myers
Abstract:
This paper provides an evaluation of SGI RASC™ RC100 technology from a computational science software developer's perspective. A brute force implementation of a two-point angular correlation function is used as a test case application. The computational kernel of this test case algorithm is ported to the Mitrion-C programming language and compiled, targeting the RC100 hardware. We explore several code optimization techniques and report performance results for different designs. We conclude the paper with an analysis of this system based on our observations while implementing the test case. Overall, the hardware platform and software development tools were found to be satisfactory for accelerating computationally intensive applications; however, several system improvements are desirable.
Submitted 14 May, 2008;
originally announced May 2008.
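The brute-force test case mentioned in this abstract can be illustrated with a minimal Python sketch of angular pair counting, the core loop of a two-point angular correlation function. This is not the paper's Mitrion-C kernel: the function names, the (RA, Dec) tuple layout, and the bin edges are assumptions made for illustration only.

```python
import math
from itertools import combinations


def angular_separation(p, q):
    """Angle in radians between two (ra, dec) points, both given in radians."""
    ra1, dec1 = p
    ra2, dec2 = q
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_sep)))


def pair_counts(points, bin_edges):
    """Count unique pairs per angular bin by visiting every pair, O(N^2)."""
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        sep = angular_separation(p, q)
        for i in range(len(counts)):
            if bin_edges[i] <= sep < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    print(pair_counts(sample, [0.0, 0.12, 0.2]))
```

The quadratic cost of visiting every pair is what makes this kind of kernel a natural candidate for hardware acceleration.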
-
Robust Machine Learning Applied to Terascale Astronomical Datasets
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers
Abstract:
We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to-date, and further issues for the transition to petascale. In particular, disk I/O will become a major limiting factor unless the necessary infrastructure is implemented.
Submitted 21 April, 2008;
originally announced April 2008.
-
Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers,
Natalie E. Strand,
Stacey L. Alberts,
David Tcheng
Abstract:
We apply machine learning in the form of a nearest neighbor instance-based algorithm (NN) to generate full photometric redshift probability density functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky Survey (SDSS DR5). We use a conceptually simple but novel application of NN to generate the PDFs - perturbing the object colors by their measurement error - and using the resulting instances of nearest neighbor distributions to generate numerous individual redshifts. When the redshifts are compared to existing SDSS spectroscopic data, we find that the mean value of each PDF has a dispersion between the photometric and spectroscopic redshift consistent with other machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The PDFs allow the selection of subsets with improved statistics. For quasars, the improvement is dramatic: for those with a single peak in their probability distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010, and the photometric redshift is within 0.3 of the spectroscopic redshift for 99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can virtually eliminate 'catastrophic' photometric redshift estimates. In addition to the SDSS sample, we incorporate ultraviolet photometry from the Third Data Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3) to create PDFs for objects seen in both surveys. For quasars, the increased coverage of the observed frame UV of the SED results in significant improvement over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that this improvement is genuine. [Abridged]
Submitted 21 April, 2008;
originally announced April 2008.
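One possible reading of the perturbation scheme described in this abstract is sketched below in Python: the object's colors are redrawn from Gaussians whose widths are the measurement errors, and each draw is assigned the redshift of its nearest training neighbor. The function names, the single-neighbor choice, the Euclidean metric, and the toy training set are all assumptions for illustration, not the authors' implementation.

```python
import random


def nearest_redshift(colors, training):
    """Redshift of the training object closest in (Euclidean) color space.

    `training` is a list of (colors, redshift) pairs.
    """
    best = min(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(colors, item[0])),
    )
    return best[1]


def redshift_samples(colors, errors, training, n_draws=100, seed=0):
    """Perturb colors by their errors and collect one redshift per draw."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        perturbed = [rng.gauss(c, e) for c, e in zip(colors, errors)]
        samples.append(nearest_redshift(perturbed, training))
    return samples


if __name__ == "__main__":
    toy_training = [([0.0, 0.0], 0.5), ([1.0, 1.0], 2.0)]  # hypothetical data
    draws = redshift_samples([0.4, 0.4], [0.3, 0.3], toy_training, n_draws=20)
    print(draws)
```

A histogram of the returned samples would then serve as an approximate redshift PDF for the object.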
-
Normalization of the Matter Power Spectrum via Higher-Order Angular Correlations of Luminous Red Galaxies
Authors:
Ashley J. Ross,
Robert J. Brunner,
Adam D. Myers
Abstract:
We present a novel technique to measure $σ_8$, by measuring the dependence of the second-order bias of a density field on $σ_8$ using two separate techniques. Each technique employs area-averaged angular correlation functions ($\barω_N$), one relying on the shape of $\barω_2$, the other relying on the amplitude of $s_3$ ($s_3 =\barω_3/\barω_2^2$). We confirm the validity of the method by testing it on a mock catalog drawn from Millennium Simulation data and finding $σ_8^{measured}- σ_8^{true} = -0.002 \pm 0.062$. We create a catalog of photometrically selected LRGs from SDSS DR5 and separate it into three distinct data sets by photometric redshift, with median redshifts of 0.47, 0.53, and 0.61. Measurements of $c_2$ and $σ_8$ are made for each data set, assuming flat geometry and WMAP3 best-fit priors on $Ω_m$, $h$, and $Γ$. We find, with increasing redshift, $c_2 = 0.09 \pm 0.04$, $0.09 \pm 0.05$, and $0.09 \pm 0.03$ and $σ_8 = 0.78 \pm 0.08$, $0.80 \pm 0.09$, and $0.80 \pm 0.09$. We combine these three consistent $σ_8$ measurements to produce the result $σ_8 = 0.79 \pm 0.05$. Allowing the parameters $Ω_m$, $h$, and $Γ$ to vary within their WMAP3 1$σ$ error, we find that the best-fit $σ_8$ does not change by more than 8% and we are thus confident our measurement is accurate to within 10%. We anticipate that future surveys, such as Pan-STARRS, DES, and LSST, will be able to employ this method to measure $σ_8$ to great precision, and that the method will serve as an important complementary check on the values determined via more established methods.
Submitted 21 April, 2008;
originally announced April 2008.
-
AGN Environments in the Sloan Digital Sky Survey I: Dependence on Type, Redshift, and Luminosity
Authors:
Natalie E. Strand,
Robert J. Brunner,
Adam D. Myers
Abstract:
We explore how the local environment is related to the redshift, type, and luminosity of active galactic nuclei (AGN). Recent simulations and observations are converging on the view that the extreme luminosity of quasars is fueled in major mergers of gas-rich galaxies. In such a picture, quasars are expected to be located in regions with a higher density of galaxies on small scales where mergers are more likely to take place. However, in this picture, the activity observed in low-luminosity AGN is due to secular processes that are less dependent on the local galaxy density. To test this hypothesis, we compare the local photometric galaxy density on kiloparsec scales around spectroscopic Type I and Type II quasars to the local density around lower luminosity spectroscopic Type I and Type II AGN. To minimize projection effects and evolution in the photometric galaxy sample we use to characterize AGN environments, we place our random control sample at the same redshift as our AGN and impose a narrow redshift window around both the AGN and control targets. We find that higher luminosity AGN have more overdense environments compared to lower luminosity AGN on all scales out to our $2\,h_{70}^{-1}\,\mathrm{Mpc}$ limit. Additionally, in the range $0.3\leqslant z\leqslant 0.6$, Type II quasars have similarly overdense environments to those of bright Type I quasars on all scales out to our $2\,h_{70}^{-1}\,\mathrm{Mpc}$ limit, while the environment of dimmer Type I quasars appears to be less overdense than the environment of Type II quasars. We see increased overdensity for Type II AGN compared to Type I AGN on scales out to our limit of $2\,h_{70}^{-1}\,\mathrm{Mpc}$ in overlapping redshift ranges. We also detect marginal evidence for evolution in the number of galaxies within $2\,h_{70}^{-1}\,\mathrm{Mpc}$ of a quasar with redshift.
Submitted 24 July, 2008; v1 submitted 14 December, 2007;
originally announced December 2007.
-
On the variability of quasars: a link between Eddington ratio and optical variability?
Authors:
B. C. Wilhite,
R. J. Brunner,
C. J. Grier,
D. P. Schneider,
D. E. Vanden Berk
Abstract:
Repeat scans by the Sloan Digital Sky Survey (SDSS) of a 278 square degree stripe along the Celestial equator have yielded an average of over 10 observations each for nearly 8,000 spectroscopically confirmed quasars. Over 2500 of these quasars are in the redshift range such that the CIV emission line is visible in the SDSS spectrum. Utilising the width of these CIV lines and the luminosity of the nearby continuum, we estimate black hole masses for these objects. In an effort to isolate the effects of black hole mass and luminosity on the photometric variability of our dataset, we create several subsamples by binning in these two physical parameters. By comparing the ensemble structure functions of the quasars in these bins, we are able to reproduce the well-known anticorrelation between luminosity and variability, now showing that this anticorrelation is independent of the black hole mass. In addition, we find a correlation between variability and the mass of the central black hole. By combining these two relations, we identify the Eddington ratio as a possible driver of quasar variability, most likely due to differences in accretion efficiency.
Submitted 29 November, 2007;
originally announced November 2007.
-
Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware
Authors:
Robert J. Brunner,
Volodymyr V. Kindratenko,
Adam D. Myers
Abstract:
The objective of our research is to demonstrate the practical usage and orders of magnitude speedup of real-world applications by using alternative technologies to support high performance computing. Currently, the main barrier to the widespread adoption of this technology is the lack of development tools and case studies, which typically impedes non-specialists who might otherwise develop applications that could leverage these technologies. By partnering with the Innovative Systems Laboratory at the National Center for Supercomputing Applications, we have obtained access to several novel technologies, including several Field-Programmable Gate Array (FPGA) systems, NVidia Graphics Processing Units (GPUs), and the STI Cell BE platform. Our goal is to not only demonstrate the capabilities of these systems, but to also serve as guides for others to follow in our path. To date, we have explored the efficacy of the SRC-6 MAP-C and MAP-E and SGI RASC Athena and RC100 reconfigurable computing platforms in supporting a two-point correlation function which is used in a number of different scientific domains. In a brute force test, the FPGA based single-processor system has achieved an almost two orders of magnitude speedup over a single-processor CPU system. We are now developing implementations of this algorithm on other platforms, including one using a GPU. Given the considerable efforts of the cosmology community in optimizing these classes of algorithms, we are currently working to implement an optimized version of the basic family of correlation functions by using tree-based data structures. Finally, we are also exploring other algorithms, such as instance-based classifiers, power spectrum estimators, and higher-order correlation functions that are also commonly used in a wide range of scientific disciplines.
Submitted 21 November, 2007;
originally announced November 2007.
-
Dynamic load-balancing on multi-FPGA systems: a case study
Authors:
Volodymyr V. Kindratenko,
Robert J. Brunner,
Adam D. Myers
Abstract:
In this case study, we investigate the impact of workload balance on the performance of multi-FPGA codes. We start with an application in which two distinct kernels run in parallel on two SRC-6 MAP processors. We observe that one of the MAP processors is idle 18% of the time while the other processor is fully utilized. We investigate a task redistribution schema which serializes the execution of the two kernels, yet parallelizes execution of each individual kernel by spreading the workload between two MAP processors. This implementation results in a near 100% utilization of both MAP processors and the overall application performance is improved by 9%.
Submitted 13 November, 2007;
originally announced November 2007.
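The two figures quoted in this abstract (18% idle time before rebalancing, 9% improvement after) are consistent with a simple idealized model, sketched below in Python. The model assumes the total work can be split perfectly evenly across the two processors with no added overhead, and it reads "improved by 9%" as a 9% reduction in runtime; both are assumptions, and the function name is hypothetical.

```python
def balanced_runtime_fraction(idle_fraction):
    """Runtime after perfect rebalancing, relative to the unbalanced runtime.

    Before: one processor is busy for the whole run (1.0), the other is busy
    for (1 - idle_fraction) of it. After: the combined work is shared equally.
    """
    busy_long = 1.0
    busy_short = 1.0 - idle_fraction
    return (busy_long + busy_short) / 2.0


if __name__ == "__main__":
    new_runtime = balanced_runtime_fraction(0.18)
    print(f"relative runtime: {new_runtime:.2f}")
    print(f"runtime reduction: {1.0 - new_runtime:.0%}")
```

Under these assumptions the best achievable gain is half the idle fraction, which is why an 18% idle time translates into roughly a 9% improvement.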
-
Robust Machine Learning Applied to Terascale Astronomical Datasets
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers
Abstract:
We present recent results from the Laboratory for Cosmological Data Mining (http://lcdm.astro.uiuc.edu) at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.
Submitted 24 October, 2007;
originally announced October 2007.
-
Quasar Clustering at $25\,h^{-1}\,\mathrm{kpc}$ from a Complete Sample of Binaries
Authors:
Adam D. Myers,
Gordon T. Richards,
Robert J. Brunner,
Donald P. Schneider,
Natalie E. Strand,
Patrick B. Hall,
Jeffrey A. Blomquist,
Donald G. York
Abstract:
We present spectroscopy of binary quasar candidates selected from Data Release 4 of the Sloan Digital Sky Survey (SDSS DR4) using Kernel Density Estimation (KDE). We present 27 new sets of observations, 10 of which are binary quasars, roughly doubling the number of known $g < 21$ binaries with component separations of 3 to 6". Only 3 of 49 spectroscopically identified objects are non-quasars, confirming that the quasar selection efficiency of the KDE technique is $\sim95$%. Several of our observed binaries are wide-separation lens candidates that merit additional higher-resolution observations. One interesting pair may be an M star binary, or an M star-binary quasar superposition. Our candidates are initially selected by UV-excess ($u-g < 1$), but are otherwise selected irrespective of the relative colors of the quasar pair, and we thus use them to suggest optimal color similarity and photometric redshift approaches for targeting binary quasars, or projected quasar pairs. From a sample that is complete on proper scales of $23.7 < R_{prop} < 29.7\,h^{-1}\,\mathrm{kpc}$, we determine the projected quasar correlation function to be $W_p=24.0 \pm^{16.9}_{10.8}$, which is $2σ$ lower than recent estimates. We argue that our low $W_p$ estimates may indicate redshift evolution in the quasar correlation function from $z\sim1.9$ to $z\sim1.4$ on scales of $R_{prop} \sim25\,h^{-1}\,\mathrm{kpc}$. The size of this evolution broadly tracks quasar clustering on larger scales, consistent with merger-driven models of quasar origin. Although our sample alone is insufficient to detect evolution in quasar clustering on small scales, an $i$-selected DR6 KDE quasar catalog, which will contain several hundred $z \lesssim 5$ binary quasars, could easily constrain any clustering evolution at $R_{prop} \sim25\,h^{-1}\,\mathrm{kpc}$.
Submitted 21 September, 2007;
originally announced September 2007.
-
The Sloan Digital Sky Survey Quasar Lens Search. II. Statistical Lens Sample from the Third Data Release
Authors:
Naohisa Inada,
Masamune Oguri,
Robert H. Becker,
Min-Su Shin,
Gordon T. Richards,
Joseph F. Hennawi,
Richard L. White,
Bartosz Pindor,
Michael A. Strauss,
Christopher S. Kochanek,
David E. Johnston,
Michael D. Gregg,
Issha Kayo,
Daniel Eisenstein,
Patrick B. Hall,
Francisco J. Castander,
Alejandro Clocchiatti,
Scott F. Anderson,
Donald P. Schneider,
Donald G. York,
Robert Lupton,
Kuenley Chiu,
Yozo Kawano,
Ryan Scranton,
Joshua A. Frieman
, et al. (9 additional authors not shown)
Abstract:
We report the first results of our systematic search for strongly lensed quasars using the spectroscopically confirmed quasars in the Sloan Digital Sky Survey (SDSS). Among 46,420 quasars from the SDSS Data Release 3 (~4188 deg^2), we select a subsample of 22,683 quasars that are located at redshifts between 0.6 and 2.2 and are brighter than the Galactic extinction corrected i-band magnitude of 19.1. We identify 220 lens candidates from the quasar subsample, for which we conduct extensive and systematic follow-up observations in optical and near-infrared wavebands, in order to construct a complete lensed quasar sample at image separations between 1'' and 20'' and flux ratios of faint to bright lensed images larger than 10^{-0.5}. We construct a statistical sample of 11 lensed quasars. Ten of these are galaxy-scale lenses with small image separations (~1''-2'') and one is a large separation (15'') system which is produced by a massive cluster of galaxies, representing the first statistical sample of lensed quasars including both galaxy- and cluster-scale lenses. The Data Release 3 spectroscopic quasars contain an additional 11 lensed quasars outside the statistical sample.
Submitted 30 October, 2007; v1 submitted 7 August, 2007;
originally announced August 2007.
-
The Sloan Digital Sky Survey Quasar Lens Search. III. Constraints on Dark Energy from the Third Data Release Quasar Lens Catalog
Authors:
Masamune Oguri,
Naohisa Inada,
Michael A. Strauss,
Christopher S. Kochanek,
Gordon T. Richards,
Donald P. Schneider,
Robert H. Becker,
Masataka Fukugita,
Michael D. Gregg,
Patrick B. Hall,
Joseph F. Hennawi,
David E. Johnston,
Issha Kayo,
Charles R. Keeton,
Bartosz Pindor,
Min-Su Shin,
Edwin L. Turner,
Richard L. White,
Donald G. York,
Scott F. Anderson,
Neta A. Bahcall,
Robert J. Brunner,
Scott Burles,
Francisco J. Castander,
Kuenley Chiu
, et al. (9 additional authors not shown)
Abstract:
We present cosmological results from the statistics of lensed quasars in the Sloan Digital Sky Survey (SDSS) Quasar Lens Search. By taking proper account of the selection function, we compute the expected number of quasars lensed by early-type galaxies and their image separation distribution assuming a flat universe, which is then compared with 7 lenses found in the SDSS Data Release 3 to derive constraints on dark energy under strictly controlled criteria. For a cosmological constant model (w=-1) we obtain Ω_Λ=0.74^{+0.11}_{-0.15}(stat.)^{+0.13}_{-0.06}(syst.). Allowing w to be a free parameter we find Ω_M=0.26^{+0.07}_{-0.06}(stat.)^{+0.03}_{-0.05}(syst.) and w=-1.1\pm0.6(stat.)^{+0.3}_{-0.5}(syst.) when combined with the constraint from the measurement of baryon acoustic oscillations in the SDSS luminous red galaxy sample. Our results are in good agreement with earlier lensing constraints obtained using radio lenses, and provide additional confirmation of the presence of dark energy consistent with a cosmological constant, derived independently of type Ia supernovae.
Submitted 30 October, 2007; v1 submitted 7 August, 2007;
originally announced August 2007.
-
The Effect of Variability on the Estimation of Quasar Black Hole Masses
Authors:
B. C. Wilhite,
R. J. Brunner,
D. P. Schneider,
D. E. Vanden Berk
Abstract:
We investigate the time-dependent variations of ultraviolet (UV) black hole mass estimates of quasars in the Sloan Digital Sky Survey (SDSS). From SDSS spectra of 615 high-redshift (1.69 < z < 4.75) quasars with spectra from two epochs, we estimate black hole masses, using a single-epoch technique which employs an additional, automated night-sky-line removal, and relies on UV continuum luminosity and CIV (1549A) emission line dispersion. Mass estimates show variations between epochs at about the 30% level for the sample as a whole. We determine that, for our full sample, measurement error in the line dispersion likely plays a larger role than the inherent variability, in terms of contributing to variations in mass estimates between epochs. However, we use the variations in quasars with r-band spectral signal-to-noise ratio greater than 15 to estimate that the contribution to these variations from inherent variability is roughly 20%. We conclude that these differences in black hole mass estimates between epochs indicate variability is not a large contributor to the current factor of two scatter between mass estimates derived from low- and high-ionization emission lines.
Submitted 31 July, 2007;
originally announced August 2007.
-
Higher-Order Angular Galaxy Correlations in the SDSS: Redshift and Color Dependence of non-Linear Bias
Authors:
Ashley J. Ross,
Robert J. Brunner,
Adam D. Myers
Abstract:
We present estimates of the N-point galaxy, area-averaged, angular correlation functions $\barω_{N}$($θ$) for $N$ = 2,...,7 for galaxies from the fifth data release of the Sloan Digital Sky Survey. Our parent sample is selected from galaxies with $18 \leq r < 21$, and is the largest ever used to study higher-order correlations. We subdivide this parent sample into two volume limited samples using photometric redshifts, and these two samples are further subdivided by magnitude, redshift, and color (producing early- and late-type galaxy samples) to determine the dependence of $\barω_{N}$($θ$) on luminosity, redshift, and galaxy-type. We measure $\barω_{N}$($θ$) using oversampling techniques and use them to calculate the projected, $s_{N}$. Using models derived from theoretical power-spectra and perturbation theory, we measure the bias parameters $b_1$ and $c_2$, finding that the large differences in both bias parameters ($b_1$ and $c_2$) between early- and late-type galaxies are robust against changes in redshift, luminosity, and $σ_8$, and that both terms are consistently smaller for late-type galaxies. By directly comparing their higher-order correlation measurements, we find large differences in the clustering of late-type galaxies at redshifts lower than 0.3 and those at redshifts higher than 0.3, both at large scales ($c_2$ is larger by $\sim0.5$ at $z > 0.3$) and small scales (large amplitudes are measured at small scales only for $z > 0.3$, suggesting much more merger driven star formation at $z > 0.3$). Finally, our measurements of $c_2$ suggest both that $σ_8 < 0.8$ and $c_2$ is negative.
Submitted 19 April, 2007;
originally announced April 2007.
-
The Sloan Digital Sky Survey Quasar Catalog IV. Fifth Data Release
Authors:
Donald P. Schneider,
Patrick B. Hall,
Gordon T. Richards,
Michael A. Strauss,
Daniel E. Vanden Berk,
Scott F. Anderson,
W. N. Brandt,
Xiaohui Fan,
Sebastian Jester,
Jim Gray,
James E. Gunn,
Mark U. SubbaRao,
Anirudda R. Thakar,
Chris Stoughton,
Alexander S. Szalay,
Brian Yanny,
Donald G. York,
Neta A. Bahcall,
J. Barentine,
Michael R. Blanton,
Howard Brewington,
J. Brinkmann,
Robert J. Brunner,
Francisco J. Castander,
Istvan Csabai
, et al. (19 additional authors not shown)
Abstract:
We present the fourth edition of the Sloan Digital Sky Survey (SDSS) Quasar Catalog. The catalog contains 77,429 objects; this is an increase of over 30,000 entries since the previous edition. The catalog consists of the objects in the SDSS Fifth Data Release that have luminosities larger than M_i = -22.0 (in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7), have at least one emission line with FWHM larger than 1000 km/s, or have interesting/complex absorption features, are fainter than i=15.0, and have highly reliable redshifts. The area covered by the catalog is 5740 sq. deg. The quasar redshifts range from 0.08 to 5.41, with a median value of 1.48; the catalog includes 891 quasars at redshifts greater than four, of which 36 are at redshifts greater than five. Approximately half of the catalog quasars have i < 19; nearly all have i < 21. For each object the catalog presents positions accurate to better than 0.2 arcsec. rms per coordinate, five-band (ugriz) CCD-based photometry with typical accuracy of 0.03 mag, and information on the morphology and selection method. The catalog also contains basic radio, near-infrared, and X-ray emission properties of the quasars, when available, from other large-area surveys. The calibrated digital spectra cover the wavelength region 3800--9200A at a spectral resolution of ~2000. The spectra can be retrieved from the public database using the information provided in the catalog. The average SDSS colors of quasars as a function of redshift, derived from the catalog entries, are presented in tabular form. Approximately 96% of the objects in the catalog were discovered by the SDSS.
Submitted 5 April, 2007;
originally announced April 2007.
-
Robust Machine Learning Applied to Astronomical Datasets II: Quantifying Photometric Redshifts for Quasars Using Instance-Based Learning
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers,
Natalie E. Strand,
Stacey L. Alberts,
David Tcheng,
Xavier Llorà
Abstract:
We apply instance-based machine learning in the form of a k-nearest neighbor algorithm to the task of estimating photometric redshifts for 55,746 objects spectroscopically classified as quasars in the Fifth Data Release of the Sloan Digital Sky Survey. We compare the results obtained to those from an empirical color-redshift relation (CZR). In contrast to previously published results using CZRs, we find that the instance-based photometric redshifts are assigned with no regions of catastrophic failure. Remaining outliers are simply scattered about the ideal relation, in a similar manner to the pattern seen in the optical for normal galaxies at redshifts z < ~1. The instance-based algorithm is trained on a representative sample of the data and pseudo-blind-tested on the remaining unseen data. The variance between the photometric and spectroscopic redshifts is sigma^2 = 0.123 +/- 0.002 (compared to sigma^2 = 0.265 +/- 0.006 for the CZR), and 54.9 +/- 0.7%, 73.3 +/- 0.6%, and 80.7 +/- 0.3% of the objects are within delta z < 0.1, 0.2, and 0.3 respectively. We also match our sample to the Second Data Release of the Galaxy Evolution Explorer legacy data and the resulting 7,642 objects show a further improvement, giving a variance of sigma^2 = 0.054 +/- 0.005, and 70.8 +/- 1.2%, 85.8 +/- 1.0%, and 90.8 +/- 0.7% of objects within delta z < 0.1, 0.2, and 0.3. We show that the improvement is indeed due to the extra information provided by GALEX, by training on the same dataset using purely SDSS photometry, which has a variance of sigma^2 = 0.090 +/- 0.007. Each set of results represents a realistic standard for application to further datasets for which the spectra are representative.
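The instance-based estimate described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `knn_photoz` and `summarize`, the unweighted mean over neighbors, the Euclidean distance in color space, and the reading of sigma^2 as the mean squared difference between photometric and spectroscopic redshifts are all assumptions made for illustration.

```python
import numpy as np


def knn_photoz(train_colors, train_z, query_colors, k=5):
    """Hypothetical helper: estimate each query redshift as the mean
    spectroscopic redshift of its k nearest training objects in color space."""
    train_colors = np.asarray(train_colors, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    query_colors = np.asarray(query_colors, dtype=float)
    # Squared Euclidean distances, shape (n_query, n_train).
    diff = query_colors[:, None, :] - train_colors[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # Indices of the k smallest distances per query (unordered within the k).
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    return train_z[nearest].mean(axis=1)


def summarize(z_phot, z_spec):
    """Hypothetical helper: mean squared difference (assumed meaning of
    sigma^2) and the fraction of objects within delta z of 0.1, 0.2, 0.3."""
    dz = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    stats = {"variance": float(np.mean(dz ** 2))}
    for tol in (0.1, 0.2, 0.3):
        stats[f"frac_within_{tol}"] = float(np.mean(np.abs(dz) < tol))
    return stats


# Toy usage: one color, three training objects, one query.
toy_colors = np.array([[0.0], [1.0], [10.0]])
toy_z = np.array([0.0, 1.0, 5.0])
toy_estimate = knn_photoz(toy_colors, toy_z, np.array([[0.4]]), k=2)
```

In the toy usage the two nearest training objects have redshifts 0 and 1, so the estimate is their mean. A real application would use the SDSS (and optionally GALEX) colors as features and hold out unseen objects for testing, as the abstract describes.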
Submitted 22 March, 2007; v1 submitted 17 December, 2006;
originally announced December 2006.
-
The 2dF-SDSS LRG and QSO Survey: QSO clustering and the L-z degeneracy
Authors:
J. da Angela,
T. Shanks,
S. M. Croom,
P. Weilbacher,
R. J. Brunner,
W. J. Couch,
L. Miller,
A. D. Myers,
R. C. Nichol,
K. A. Pimbblet,
R. de Propris,
G. T. Richards,
N. P. Ross,
D. P. Schneider,
D. A. Wake
Abstract:
We combine the QSO samples from the 2dF QSO Redshift Survey (2QZ) and the 2dF-SDSS LRG and QSO Survey (2SLAQ) in order to investigate the clustering of z~1.4 QSOs and measure the correlation function. The clustering signal in z-space, projected along the sky direction, is similar to that previously obtained from 2QZ alone. By fitting the z-space correlation function and lifting the degeneracy between beta and Omega_m_0 by using linear theory predictions, we obtain beta(z=1.4) = 0.60+-0.12 and Omega_m_0=0.25+-0.08, implying a value for the QSO bias, b(z=1.4)=1.5+-0.2. We further find that QSO clustering does not depend strongly on luminosity at fixed redshift. This result is inconsistent with the expectation of simple `high peaks' biasing models where more luminous, rare QSOs are assumed to inhabit higher mass haloes. The data are more consistent with models which predict that QSOs of different luminosities reside in haloes of similar mass. We find that halo mass does not evolve strongly with redshift nor depend on QSO luminosity. We finally investigate how black hole mass correlates with luminosity and redshift and ascertain the relation between Eddington efficiency and black hole mass. Our results suggest that QSOs of different luminosities may contain black holes of similar mass.
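The step from beta and Omega_m_0 to the QSO bias can be checked with a short calculation. This is a sketch under stated assumptions rather than the authors' procedure: it assumes a flat universe with a cosmological constant and the common linear-theory approximation beta = Omega_m(z)^0.6 / b; the function names and the `growth_index` parameter are hypothetical.

```python
def omega_m_at_z(omega_m0, z):
    """Matter density parameter at redshift z, assuming a flat universe
    with a cosmological constant (an assumption, not stated in the abstract)."""
    matter = omega_m0 * (1.0 + z) ** 3
    return matter / (matter + (1.0 - omega_m0))


def bias_from_beta(beta, omega_m0, z, growth_index=0.6):
    """Linear bias from beta = Omega_m(z)**growth_index / b.
    The exponent 0.6 is the usual approximation and is assumed here."""
    return omega_m_at_z(omega_m0, z) ** growth_index / beta


# Central values quoted in the abstract: beta = 0.60, Omega_m_0 = 0.25, z = 1.4.
b_qso = bias_from_beta(0.60, 0.25, 1.4)
print(round(b_qso, 2))
```

With the quoted central values this evaluates to roughly 1.5, in line with the b(z=1.4) reported in the abstract. The sketch does not propagate the quoted uncertainties.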
Submitted 14 December, 2006;
originally announced December 2006.
-
Clustering Analyses of 300,000 Photometrically Classified Quasars--II. The Excess on Very Small Scales
Authors:
Adam D. Myers,
Robert J. Brunner,
Gordon T. Richards,
Robert C. Nichol,
Donald P. Schneider,
Neta A. Bahcall
Abstract:
We study quasar clustering on small scales, modeling clustering amplitudes using halo-driven dark matter descriptions. From 91 pairs on scales <35 kpc/h, we detect only a slight excess in quasar clustering over our best-fit large-scale model. Integrated across all redshifts, the implied quasar bias is b_Q = 4.21+/-0.98 (b_Q = 3.93+/-0.71) at ~18 kpc/h (~28 kpc/h). Our best-fit (real-space) power index is ~-2 (i.e., $ξ(r) \propto r^{-2}$), implying steeper halo profiles than currently found in simulations. Alternatively, quasar binaries with separation <35 kpc/h may trace merging galaxies, with typical dynamical merger times t_d~(610+/-260)m^{-1/2} Myr/h, for quasars of host halo mass m x 10^{12} Msolar/h. We find UVX quasars at ~28 kpc/h cluster >5 times higher at z > 2, than at z < 2, at the $2.0σ$ level. However, as the space density of quasars declines as z increases, an excess of quasar binaries (over expectation) at z > 2 could be consistent with reduced merger rates at z > 2 for the galaxies forming UVX quasars. Comparing our clustering at ~28 kpc/h to a $ξ(r)=(r/4.8\,h^{-1}\,\mathrm{Mpc})^{-1.53}$ power-law, we find an upper limit on any excess of a factor of 4.3+/-1.3, which, noting some caveats, differs from large excesses recently measured for binary quasars, at $2.2σ$. We speculate that binary quasar surveys that are biased to z > 2 may find inflated clustering excesses when compared to models fit at z < 2. We provide details of 111 photometrically classified quasar pairs with separations <0.1'. Spectroscopy of these pairs could significantly constrain quasar dynamics in merging galaxies.
Submitted 7 December, 2006;
originally announced December 2006.
-
Clustering Analyses of 300,000 Photometrically Classified Quasars--I. Luminosity and Redshift Evolution in Quasar Bias
Authors:
Adam D. Myers,
Robert J. Brunner,
Robert C. Nichol,
Gordon T. Richards,
Donald P. Schneider,
Neta A. Bahcall
Abstract:
Using ~300,000 photometrically classified quasars, by far the largest quasar sample ever used for such analyses, we study the redshift and luminosity evolution of quasar clustering on scales of ~50 kpc/h to ~20 Mpc/h from redshifts of z~0.75 to z~2.28. We parameterize our clustering amplitudes using realistic dark matter models, and find that a LCDM power spectrum provides a superb fit to our data with a redshift-averaged quasar bias of b_Q = 2.41+/-0.08 ($P_{<χ^2}=0.847$) for $σ_8=0.9$. This represents a better fit than the best-fit power-law model ($ω = (0.0493\pm0.0064)\,θ^{-0.928\pm0.055}$; $P_{<χ^2}=0.482$). We find b_Q increases with redshift. This evolution is significant at >99.6% using our data set alone, increasing to >99.9999% if stellar contamination is not explicitly parameterized. We measure the quasar classification efficiency across our full sample as $a = 95.6^{+4.4}_{-1.9}\%$, a star-quasar separation comparable with the star-galaxy separation in many photometric studies of galaxy clustering. We derive the mean mass of the dark matter halos hosting quasars as M_DMH = (5.2+/-0.6)x10^{12} M_solar/h. At z~1.9 we find a $1.5σ$ deviation from luminosity-independent quasar clustering; this suggests that increasing our sample size by a factor of 1.8 could begin to constrain any luminosity dependence in quasar bias at z~2. Our results agree with recent studies of quasar environments at z < 0.4, which detected little luminosity dependence to quasar clustering on proper scales >50 kpc/h. At z < 1.6, our analysis suggests that b_Q is constant with luminosity to within ~0.6, and that, for g < 21, angular quasar autocorrelation measurements are unlikely to have sufficient statistical power at z < 1.6 to detect any luminosity dependence in quasars' clustering.
Submitted 7 December, 2006;
originally announced December 2006.