-
Optimized sampling of SDSS-IV MaStar spectra for stellar classification using supervised models
Authors:
R. I. El-Kholy,
Z. M. Hayman
Abstract:
Supervised machine learning models are increasingly being used for solving the problem of stellar classification of spectroscopic data. However, training such models requires a large number of labelled instances, the collection of which is usually costly in both time and expertise. Active learning algorithms minimize training dataset sizes by keeping only the most informative instances. This paper explores the application of active learning to sampling stellar spectra using data from a highly class-imbalanced dataset. We utilize the MaStar library from the SDSS DR17 along with its associated stellar parameter catalogue. A preprocessing pipeline that includes feature selection, scaling, and dimensionality reduction is applied to the data. Using different active learning algorithms, we iteratively query the instances for which the model or committee of models exhibits the highest uncertainty or disagreement, respectively. We assess the effectiveness of the sampling techniques by comparing several performance metrics of supervised-learning models trained on the queried samples with randomly sampled counterparts. Evaluation metrics include specificity, sensitivity, and the area under the curve, in addition to the Matthews correlation coefficient, which accounts for class imbalance. We apply this procedure to effective temperature, surface gravity, and iron metallicity, separately. Our results demonstrate the effectiveness of active learning algorithms in selecting samples that produce performance metrics superior to random sampling and even stratified samples, with fewer training instances. Active learning is recommended for prioritizing instance labelling of astronomical-survey data by experts or crowdsourcing to mitigate the high time cost. Its effectiveness can be further exploited in selection of targets for follow-up observations in automated astronomical surveys.
Submitted 29 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
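The query step and the imbalance-aware metric described in this abstract can be illustrated with a minimal Python sketch. It assumes a binary problem and a least-confidence uncertainty measure; the function names `least_confident` and `matthews_corrcoef` are hypothetical and not taken from the paper, whose actual query strategies and multi-class setup may differ.

```python
import math


def least_confident(probabilities, n_queries=1):
    """Return indices of the instances the model is least sure about.

    probabilities: one list of predicted class probabilities per
    unlabelled instance (assumed to sum to 1).
    Uncertainty is taken as 1 - max class probability (least confidence).
    """
    uncertainty = [1.0 - max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: uncertainty[i], reverse=True)
    return ranked[:n_queries]


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Three unlabelled instances; the second and third are the most uncertain.
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
to_label = least_confident(pool, n_queries=2)

# A classifier that always predicts the majority class on a 95/5 split
# reaches 95% accuracy but an MCC of zero.
always_positive = matthews_corrcoef(tp=95, tn=0, fp=5, fn=0)
```

The second call shows why a metric of this kind is useful on class-imbalanced data: accuracy alone would rate the trivial classifier highly.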
-
Global Estimation of Range Resolved Thermodynamic Profiles from MicroPulse Differential Absorption Lidar
Authors:
Matthew Hayman,
Robert A. Stillwell,
Adam Karboski,
Willem J. Marais,
Scott M. Spuler
Abstract:
We demonstrate thermodynamic profile estimation with data obtained using the MicroPulse DIAL such that the retrieval is entirely self-contained. The only external input is surface meteorological variables obtained from a weather station installed on the instrument. The estimator provides products of temperature, absolute humidity and backscatter ratio such that cross dependencies between the lidar data products and raw observations are accounted for and the final products are self-consistent. The method described here is applied to a combined oxygen DIAL, potassium HSRL, water vapor DIAL system operating at two pairs of wavelengths (nominally centered at 770 and 828 nm). We perform regularized maximum likelihood estimation through the Poisson Total Variation technique to suppress noise and improve the range of the observations. A comparison to 119 radiosondes indicates that this new processing method produces improved temperature retrievals, reducing total errors to less than 2 K below 3 km altitude and extending the maximum altitude of temperature retrievals to 5 km with less than 3 K error. The results of this work definitively demonstrate the potential for measuring temperature through the oxygen DIAL technique and, furthermore, that this can be accomplished with low-power semiconductor-based lidar sensors.
Submitted 27 February, 2024;
originally announced February 2024.
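As a rough illustration of regularized maximum likelihood estimation with a Poisson noise model and a total variation penalty, the sketch below evaluates a generic one-dimensional objective. It is only an assumed textbook form (Poisson negative log-likelihood plus a weighted sum of absolute differences); the function name `poisson_tv_objective`, the penalty weight, and the absence of any lidar forward model are simplifications and do not reproduce the paper's estimator.

```python
import math


def poisson_tv_objective(counts, estimate, tv_weight):
    """Poisson negative log-likelihood plus a total variation penalty.

    counts:    observed photon counts per range bin
    estimate:  candidate mean photon counts (must be strictly positive)
    tv_weight: hypothetical regularization strength
    The log(k!) term is dropped because it does not depend on the estimate.
    """
    nll = sum(x - y * math.log(x) for y, x in zip(counts, estimate))
    tv = sum(abs(b - a) for a, b in zip(estimate, estimate[1:]))
    return nll + tv_weight * tv


counts = [3, 5, 4, 6, 2]
raw = [float(c) for c in counts]          # unregularized fit to the data
flat = [sum(counts) / len(counts)] * 5    # fully smoothed estimate

# Without regularization the raw counts fit best; with a penalty,
# the smooth profile is preferred.
unregularized_prefers_raw = (poisson_tv_objective(counts, raw, 0.0)
                             < poisson_tv_objective(counts, flat, 0.0))
regularized_prefers_flat = (poisson_tv_objective(counts, flat, 1.0)
                            < poisson_tv_objective(counts, raw, 1.0))
```

In practice the estimate would be found by minimizing such an objective rather than comparing two fixed candidates; the comparison only shows how the penalty trades data fidelity for noise suppression.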
-
2D Signal Estimation for Sparse Distributed Target Photon Counting Data
Authors:
Matthew Hayman,
Robert A. Stillwell,
Josh Carnes,
Grant J. Kirchhoff,
Scott M. Spuler,
Jeffrey P. Thayer
Abstract:
In this study, we explore the utilization of maximum likelihood estimation for the analysis of sparse photon counting data obtained from distributed target lidar systems. Specifically, we adapt the Poisson Total Variation processing technique to cater to this application. By assuming a Poisson noise model for the photon count observations, our approach yields denoised estimates of backscatter photon flux and related parameters. This facilitates the processing of raw photon counting signals with exceptionally high temporal and range resolutions (demonstrated here to 50 Hz and 75 cm resolutions), including data acquired through time-correlated single photon counting, without significant sacrifice of resolution. Through examination involving both simulated and real-world 2D atmospheric data, our method consistently demonstrates superior accuracy in signal recovery compared to the conventional histogram-based approach commonly employed in distributed target lidar applications.
Submitted 29 November, 2023;
originally announced November 2023.
-
Mimicking non-ideal instrument behavior for hologram processing using neural style translation
Authors:
John S. Schreck,
Matthew Hayman,
Gabrielle Gantos,
Aaron Bansemer,
David John Gagne
Abstract:
Holographic cloud probes provide unprecedented information on cloud particle density, size and position. Each laser shot captures particles within a large volume, where images can be computationally refocused to determine particle size and shape. However, processing these holograms, either with standard methods or with machine learning (ML) models, requires considerable computational resources, time and occasional human intervention. ML models are trained on simulated holograms obtained from the physical model of the probe since real holograms have no absolute truth labels. Using another processing method to produce labels would be subject to errors that the ML model would subsequently inherit. Models perform well on real holograms only when image corruption is performed on the simulated images during training, thereby mimicking non-ideal conditions in the actual probe (Schreck et al., 2022). Optimizing image corruption requires a cumbersome manual labeling effort.
Here we demonstrate the application of the neural style translation approach (Gatys et al., 2016) to the simulated holograms. With a pre-trained convolutional neural network (VGG-19), the simulated holograms are "stylized" to resemble the real ones obtained from the probe, while at the same time preserving the simulated image "content" (e.g., the particle locations and sizes). Two image similarity metrics concur that the stylized images are more like real holograms than the synthetic ones. With an ML model trained to predict particle locations and shapes on the stylized data sets, we observed comparable performance on both simulated and real holograms, obviating the need to perform manual labeling. The described approach is not specific to hologram images and could be applied in other domains for capturing noise and imperfections in observational instruments to make simulated data more like real-world observations.
Submitted 6 January, 2023;
originally announced January 2023.
-
Neural network processing of holographic images
Authors:
John S. Schreck,
Gabrielle Gantos,
Matthew Hayman,
Aaron Bansemer,
David John Gagne
Abstract:
HOLODEC, an airborne cloud particle imager, captures holographic images of a fixed volume of cloud to characterize the types and sizes of cloud particles, such as water droplets and ice crystals. Cloud particle properties include position, diameter, and shape. We present a hologram processing algorithm, HolodecML, that utilizes a neural segmentation model, GPUs, and computational parallelization. HolodecML is trained using synthetically generated holograms based on a model of the instrument, and predicts masks around particles found within reconstructed images. From these masks, the position and size of the detected particles can be characterized in three dimensions. In order to successfully process real holograms, we find we must apply a series of image corrupting transformations and noise to the synthetic images used in training.
In this evaluation, HolodecML had comparable position and size estimation performance to the standard processing method, but improved particle detection by nearly 20% on several thousand manually labeled HOLODEC images. However, the improvement only occurred when image corruption was performed on the simulated images during training, thereby mimicking non-ideal conditions in the actual probe. The trained model also learned to differentiate artifacts and other impurities in the HOLODEC images from the particles, even though no such objects were present in the training data set, while the standard processing method struggled to separate particles from artifacts. The novelty of the training approach, which leveraged noise as a means for parameterizing non-ideal aspects of the HOLODEC detector, could be applied in other domains where the theoretical model is incapable of fully describing the real-world operation of the instrument and accurate truth data required for supervised learning cannot be obtained from real-world observations.
Submitted 18 March, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
X-ray properties of X-CLASS-redMaPPer galaxy cluster sample: The luminosity-temperature relation
Authors:
Mona Molham,
Nicolas Clerc,
Ali Takey,
Tatyana Sadibekova,
A. B. Morcos,
Shahinaz Yousef,
Z. M. Hayman,
Maggie Lieu,
Somak Raychaudhury,
Evelina R. Gaynullina
Abstract:
This paper presents results of a spectroscopic analysis of the X-CLASS-redMaPPer (XC1-RM) galaxy cluster sample. X-CLASS is a serendipitous search for clusters in the X-ray wavebands based on the XMM-Newton archive, whereas redMaPPer is an optical cluster catalogue derived from the Sloan Digital Sky Survey (SDSS). The present sample comprises 92 X-ray extended sources identified in optical images within 1 arcmin separation. The area covered by the cluster sample is $\sim$27 deg$^{2}$. The clusters span a wide redshift range ($0.05 < z < 0.6$) and 88 clusters benefit from spectroscopically confirmed redshifts using data from SDSS Data Release 14. We present an automated pipeline to derive the X-ray properties of the clusters in three distinct apertures: $R_{500}$ (at fixed mass overdensity), $R_{\mathrm{fit}}$ (at fixed signal-to-noise ratio), and $R_{300\,\mathrm{kpc}}$ (fixed physical radius). The sample extends over wide temperature and luminosity ranges: from 1 to 10 keV and from $6\times10^{42}$ to $11\times10^{44}$ erg s$^{-1}$, respectively. We investigate the luminosity-temperature (L-T) relation of the XC1-RM sample and find a slope equal to $3.03 \pm 0.26$. It is steeper than predicted by self-similar assumptions, in agreement with independent studies. A simplified approach is developed to estimate the amount and impact of selection biases which might be affecting our recovered L-T parameters. The result of this simulation process suggests that the measured L-T relation is biased to a steeper slope and higher normalization.
Submitted 10 March, 2020;
originally announced March 2020.
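The slope of a luminosity-temperature relation of the form L ∝ T^slope is commonly read off as the gradient of a straight line in log-log space. The sketch below does this with ordinary least squares on synthetic, noise-free data. The helper name `fit_power_law` and the choice of unweighted least squares are assumptions made for illustration; the abstract does not state which fitting method or error treatment the authors used.

```python
import math


def fit_power_law(temperatures_kev, luminosities):
    """Fit log10(L) = slope * log10(T) + intercept by ordinary least squares.

    Measurement errors and selection effects are ignored here, so this is
    not a substitute for a full regression with uncertainties.
    """
    xs = [math.log10(t) for t in temperatures_kev]
    ys = [math.log10(lum) for lum in luminosities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic clusters generated with an exact cubic relation (made-up values).
temps = [1.0, 2.0, 4.0, 8.0]                 # keV
lums = [1e43 * t ** 3 for t in temps]        # erg/s
slope, intercept = fit_power_law(temps, lums)
```

On this synthetic input the recovered slope is the exponent used to generate the data, which is a quick sanity check before applying such a fit to real measurements.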
-
Color similarity study among small galaxy groups members
Authors:
Ahmed M. Fouad,
Z. Awad,
A. A. Shaker,
Z. M. Hayman
Abstract:
We applied a membership test based on the color similarity of group members to detect the discordant galaxies in small groups (quintets) that had been determined by the Friends-of-Friends (FoF) algorithm. Our method depends on the similarity of the color indices (u-g) and (g-r) of the group members. The chosen sample of quintets was extracted from the "Flux- and volume-limited groups for SDSS galaxies" catalog, which is a spectroscopic sample of galaxies originally taken from the Sloan Digital Sky Survey - Data Release 10 (SDSS-DR10). The sample included 282 quintets with a total number of 1410 galaxies. The similarity measure used in this study is the Euclidean distance. The calculations showed that 73.4% of the group samples (207 out of 282 quintet groups) have galaxies with similar colors (u-g) and (g-r). Each of the remaining groups (75 systems) has an interloper galaxy with colors different from those of the other members, and hence they became quartets. We found that group members tend to be more luminous than outliers. We conclude that using the similarities in the color indices between group members gives better identification of group membership.
Submitted 27 December, 2018;
originally announced December 2018.
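A Euclidean-distance membership test in the (u-g, g-r) plane might look like the following sketch. The ranking by mean distance to the other members, the function name `most_discordant`, and the example colors are all hypothetical; the abstract does not give the exact criterion or threshold used to declare a galaxy discordant.

```python
import math


def most_discordant(colors):
    """Return (index, mean distance) of the member farthest from the rest.

    colors: list of (u-g, g-r) pairs, one per group member.
    Each member is scored by its mean Euclidean distance to the others.
    Requires Python 3.8+ for math.dist.
    """
    n = len(colors)
    mean_distance = [
        sum(math.dist(colors[i], colors[j]) for j in range(n) if j != i)
        / (n - 1)
        for i in range(n)
    ]
    worst = max(range(n), key=lambda i: mean_distance[i])
    return worst, mean_distance[worst]


# A made-up quintet: four red members and one much bluer galaxy.
quintet = [(1.80, 0.75), (1.70, 0.80), (1.75, 0.78), (1.85, 0.77),
           (1.00, 0.30)]
index, distance = most_discordant(quintet)
```

A cut on the returned distance would then decide whether the flagged galaxy is treated as an interloper; the value of that cut is left open here because it is not stated in the source.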
-
Fundamental parameters of isolated galaxy triplets in the local Universe: Statistical study
Authors:
Amira A. Tawfeek,
Gamal B. Ali,
Ali Takey,
Zainab Awad,
Z. M. Hayman
Abstract:
Understanding the dynamics of galaxy triplet systems is one of the significant ways of obtaining insight into the dynamics of large galaxy clusters. Toward that aim, we present a detailed study of all isolated triplet systems (total of 315) taken from the `SDSS-based catalogue of Isolated Triplets' (SIT). In addition, we compared our results with those obtained for a sample of triplets from the Local Supercluster (LS), SDSS-triplets, Tully's catalogue, Wide (W) and Compact (K)-triplets. We also examined the correlation between the dynamical parameters and the Large Scale Structure (LSS). Interestingly, we found no correlation between the mean projected separation of the triplet systems and either the LSS or its dynamical parameters. Furthermore, we found that only 3 percent of these systems can be considered as compact since the mean harmonic separation (rh) is more than 0.4 Mpc for 97 percent of the population. Thus, we may conclude that mergers might not have played a dominant role in their evolution.
Submitted 1 November, 2018;
originally announced November 2018.
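The compactness criterion quoted in this abstract rests on the mean harmonic separation of a triplet. The sketch below assumes the usual definition as the harmonic mean of the three pairwise projected separations, and treats a system as compact when that value does not exceed the 0.4 Mpc figure mentioned in the abstract. The function names and the handling of the boundary case are illustrative assumptions, not the authors' code.

```python
def harmonic_mean_separation(separations_mpc):
    """Harmonic mean of pairwise projected separations (in Mpc).

    For a triplet, separations_mpc holds the three galaxy-galaxy
    separations r_12, r_13 and r_23, all strictly positive.
    """
    inverse_sum = sum(1.0 / r for r in separations_mpc)
    return len(separations_mpc) / inverse_sum


def is_compact(separations_mpc, limit_mpc=0.4):
    """Flag a system as compact if its harmonic separation is <= limit.

    The 0.4 Mpc default is read from the abstract; treating equality
    as compact is an arbitrary choice made for this sketch.
    """
    return harmonic_mean_separation(separations_mpc) <= limit_mpc


wide_triplet = [0.5, 0.5, 0.5]    # made-up separations in Mpc
tight_triplet = [0.2, 0.4, 0.4]

r_h_wide = harmonic_mean_separation(wide_triplet)
r_h_tight = harmonic_mean_separation(tight_triplet)
```

Because the harmonic mean is dominated by the smallest separation, a single close pair pulls the value down more strongly than it would an arithmetic mean.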
-
A Photometric Study of Four Recently Discovered Contact Binaries: 1SWASP J064501.21+342154.9, 1SWASP J155822.10-025604.8, 1SWASP J212808.86+151622.0 and UCAC4 436-062932
Authors:
G. Djurašević,
A. Essam,
O. Latković,
A. Cséki,
M. A. El-Sadek,
M. S. Abo-Elala,
Z. M. Hayman
Abstract:
We present new, high-quality multicolor observations of four recently discovered contact binaries: 1SWASP J064501.21+342154.9, 1SWASP J155822.10-025604.8, 1SWASP J212808.86+151622.0, and UCAC4 436-062932, and analyze their light curves to determine orbital and physical parameters using the modeling program of G. Djurašević. In the absence of spectroscopic observations, the effective temperatures of the brighter components are estimated from the color indices, and the mass ratios are determined with the q-search method. The analysis shows that all four systems are W UMa type binaries in shallow contact configurations, consisting of late-type main-sequence primaries and evolved secondaries with active surface regions (dark or bright spots) resulting from magnetic activity or ongoing transfer of thermal energy between the components. We compare the derived orbital and stellar parameters for these four variables with a large sample of previously analyzed W UMa stars and find that our results fit it well.
Submitted 23 August, 2016;
originally announced August 2016.