-
Hunting for exocomet transits in the TESS database using the Random Forest method
Authors:
D. V. Dobrycheva,
M. Yu. Vasylenko,
I. V. Kulyk,
Ya. V. Pavlenko,
O. S. Shubina,
I. V. Luk'yanyk,
P. P. Korsun
Abstract:
This study introduces an approach to detecting exocomet transits in the dataset of the Transiting Exoplanet Survey Satellite (TESS), specifically within its Sector 1. Given the limited number of exocomet transits detected in the observed light curves, creating a sufficient training sample for the machine learning method was challenging. We developed a unique training sample by injecting simulated asymmetric transit profiles into observed light curves, thereby creating realistic data for the model training. To analyze these light curves, we employed the TSFresh software to extract key features, which were then used to train our Random Forest model. Considering that cometary transits typically exhibit a small depth, less than 1% of the star's brightness, we chose to limit our sample by the CDPP parameter. Our study focused on two target samples: light curves with a CDPP of less than 40 ppm and light curves with a CDPP of up to 150 ppm. Each sample was accompanied by a corresponding training set. This methodology achieved an accuracy of approximately 96%, with both precision and recall rates exceeding 95% and a balanced F1-score of around 96%. This level of accuracy was effective in distinguishing between 'exocomet candidate' and 'non-candidate' classifications for light curves with a CDPP of less than 40 ppm, and our model identified 12 potential exocomet candidates. However, when applying machine learning to less precise light curves (CDPP up to 150 ppm), we noticed a significant increase in curves that could not be confidently classified; even in this case, our model identified 20 potential exocomet candidates. These promising results within Sector 1 motivate us to extend our analysis across all TESS sectors to detect and study comet-like activity in extrasolar planetary systems.
Submitted 29 February, 2024;
originally announced February 2024.
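The abstract above describes building a training sample by placing simulated asymmetric transit profiles into observed light curves. A minimal sketch of that idea follows. The dip shape (sharp ingress, slower exponential recovery), the time scales, the 30-minute cadence, the 100 ppm noise level, and the names `asymmetric_dip` and `inject` are all illustrative assumptions, not the authors' model or code.

```python
import math
import random

def asymmetric_dip(t, t0, depth, rise, decay):
    """Relative flux drop at time t for a comet-like dip (hypothetical shape).

    Sharp ingress before t0 (time scale `rise`) and a slower exponential
    recovery after t0 (time scale `decay`), mimicking a trailing dust tail.
    """
    scale = rise if t < t0 else decay
    return depth * math.exp(-abs(t - t0) / scale)

def inject(times, flux, t0, depth, rise=0.05, decay=0.3):
    """Return a copy of `flux` with the dip multiplied in."""
    return [f * (1.0 - asymmetric_dip(t, t0, depth, rise, decay))
            for t, f in zip(times, flux)]

# Stand-in for an observed, normalized light curve (assumed values):
# 27 days at 30-minute cadence with 100 ppm Gaussian noise.
rng = random.Random(0)
times = [i * 0.5 / 24.0 for i in range(27 * 48)]
flux = [1.0 + rng.gauss(0.0, 1e-4) for _ in times]

# Inject a 0.05% dip (well below the 1% depth quoted in the abstract).
injected = inject(times, flux, t0=13.0, depth=5e-4)
```

In a real pipeline the stand-in curve would be replaced by an observed light curve, and the injected curves would be labeled as positives before feature extraction.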
-
Machine learning technique for morphological classification of galaxies from the SDSS. III. Image-based inference of detailed features
Authors:
V. Khramtsov,
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
O. V. Melnyk,
A. A. Elyiv,
V. S. Akhmetov,
A. M. Dmytrenko
Abstract:
This paper continues our series of works on the applicability of various machine learning methods to morphological galaxy classification (Vavilova et al., 2021, 2022). We used the sample of 315776 SDSS DR9 galaxies with absolute stellar magnitudes of $-24^{m}<M_{r}<-19.4^{m}$ at 0.003<z<0.1 as a target data set for the CNN classifier based on DenseNet-201. Because it overlaps closely with the Galaxy Zoo 2 (GZ2) sample, we use these annotated data as the training data set to classify galaxies into 34 detailed features. Because the visual parameters of galaxies in the GZ2 training data set differ markedly from those of galaxies without known morphological parameters, we applied novel procedures that allowed us, for the first time, to remove this difference for smaller and fainter SDSS galaxies.
We describe in detail the adversarial validation technique as well as how we chose the optimal train-test split of galaxies from the training data set. We have also found optimal galaxy image transformations to increase the classifier's generalization ability. This can be considered another way to reduce the human bias for those galaxy images that had a poor vote classification in the GZ project. Such an approach, akin to auto-immunization, in which the CNN classifier trained on very good images is able to reclassify bad images from the same homogeneous sample, can be considered on a par with other methods of combating the human bias.
The accuracy of the CNN classifier is in the range of 83.3-99.4 percent depending on 32 features. As a result, for the first time, we assigned a detailed morphological classification to more than 140K low-redshift galaxies, especially at the fainter end. We highlight the typical problem points of CNN-based galaxy image classification from the astronomical point of view. The catalogs will be available through VizieR.
Submitted 25 September, 2022;
originally announced September 2022.
-
Machine learning technique for morphological classification of galaxies from SDSS. II. The image-based morphological catalogs of galaxies at 0.02<z<0.1
Authors:
I. B. Vavilova,
V. Khramtsov,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk
Abstract:
We applied the image-based approach with a convolutional neural network model to the sample of low-redshift galaxies with $-24^{m}<M_{r}<-19.4^{m}$ from the SDSS DR9. We divided it into two subsamples, the SDSS DR9 galaxy dataset and the Galaxy Zoo 2 (GZ2) dataset, considering them as the inference and training datasets, respectively. As a result, we created the morphological catalog of 315782 galaxies at 0.02<z<0.1, where five morphological classes and 34 detailed features (bar, rings, number of spiral arms, mergers, etc.) were first defined for 216148 galaxies (inference dataset) by the image-based CNN classifier. For the rest of the galaxies, the initial morphological classification was re-assigned as in the GZ2 project.
Our method shows promising performance in morphological classification, attaining more than 93 % accuracy for the five-class morphology prediction, except for the cigar-shaped (75 %) and completely rounded (83 %) galaxies. The main results are presented in the catalog of 19468 completely rounded, 27321 rounded in-between, 3235 cigar-shaped, 4099 edge-on, 18615 spiral, and 72738 general low-redshift galaxies of the studied SDSS sample. As for the classification of galaxies by their detailed structural morphological features, our CNN model gives an accuracy in the range of 92-99 % depending on the feature, the number of galaxies with the given feature in the inference dataset, and the galaxy image quality. We demonstrate that applying the CNN model with adversarial validation and adversarial image data augmentation improves the classification of smaller and fainter SDSS galaxies with $m_{r}<17.7$.
Submitted 12 March, 2022;
originally announced March 2022.
-
New exocomets of $β$ Pic
Authors:
Ya. Pavlenko,
I. Kulyk,
O. Shubina,
M. Vasylenko,
D. Dobrycheva,
P. Korsun
Abstract:
Aims. The aim of our work is to analyze the light curves of $β$ Pic recently observed by TESS in sectors 32, 33, and 34, searching for the signatures of exocomet transits. Methods. We process the $β$ Pic light curves from the MAST database, applying frequency analysis to remove harmonic signals due to the star's pulsations, and use a simple 1-D model to fit the profiles of the events found. Results. We recover events previously found by other authors in sectors 5 and 6 and find five new distinct aperiodic dipping events with asymmetric shapes resembling the profiles expected from the passage of a comet-like body across the stellar disk. These dips are rather shallow, with a flux drop at a level of 0.03\% and a duration of less than 1 day. No periodic transits were found in the sectors investigated. Conclusions. The depth and duration of the identified dips are similar to those of the recently discovered transits in the $β$ Pic light curves from sector 5 of the TESS observations, as well as to those found in the light curves of KIC 354116 and KIC 1108472 from the Kepler database. This indicates that aperiodic shallow dips are not likely an exceptional phenomenon, at least for the $β$ Pic system.
Submitted 27 February, 2022;
originally announced February 2022.
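The abstract above mentions removing harmonic signals due to stellar pulsations before searching for dips. One common way to do this, sketched below under stated assumptions, is a linear least-squares fit of a sine, a cosine, and a constant at a known frequency, followed by subtraction of the fitted oscillation. The single frequency, the sampling, and the names `solve3` and `remove_harmonic` are hypothetical; the paper's actual frequency analysis is not described in the abstract and may differ.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def remove_harmonic(times, flux, freq):
    """Fit a*sin + b*cos + c at one known frequency and subtract the oscillation.

    Returns the cleaned flux (constant level kept) and the fitted (a, b, c).
    """
    basis = [(math.sin(2 * math.pi * freq * t),
              math.cos(2 * math.pi * freq * t),
              1.0) for t in times]
    # Normal equations of the linear least-squares problem.
    A = [[sum(u[i] * u[j] for u in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(u[i] * y for u, y in zip(basis, flux)) for i in range(3)]
    a, b, c = solve3(A, rhs)
    cleaned = [y - a * u[0] - b * u[1] for u, y in zip(basis, flux)]
    return cleaned, (a, b, c)

# Synthetic check with an arbitrary illustrative frequency (cycles per day).
freq = 47.4
times = [i * 0.002 for i in range(2000)]
flux = [1.0 + 1e-3 * math.sin(2 * math.pi * freq * t + 0.7) for t in times]
cleaned, coeffs = remove_harmonic(times, flux, freq)
```

For a multi-periodic pulsator the same fit would be repeated, or extended to several frequencies at once, before inspecting the residuals for shallow dips.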
-
The Voronoi tessellation method in astronomy
Authors:
Irina Vavilova,
Andrii Elyiv,
Daria Dobrycheva,
Olga Melnyk
Abstract:
The Voronoi tessellation is a natural way of space segmentation, which has many applications in various fields of science and technology, as well as in social sciences and visual art. Varieties of the Voronoi tessellation method are commonly used in computational fluid dynamics, computational geometry, geolocation and logistics, game development, cartography, engineering, liquid crystal electronic technology, machine learning, etc. Highly innovative results have been obtained in astronomy, namely for the large-scale galaxy distribution and cosmic web pattern, for revealing the quasi-periodicity in a pencil-beam survey, for a description of constraints on the isotropic cosmic microwave background and of explosion scenarios such as supernova events, for image processing, adaptive smoothing, and segmentation, for signal-to-noise ratio balancing, for spectrographic data analysis, as well as in moving-mesh cosmological simulations. We briefly describe these results, paying more attention to the practical application of the Voronoi tessellation to the spatial large-scale galaxy distribution.
Submitted 16 December, 2020;
originally announced December 2020.
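As a small illustration of the space segmentation the abstract above refers to, the following sketch assigns each point of a regular grid to its nearest seed and estimates the Voronoi cell areas by counting grid points. This brute-force grid approach and the names `nearest_seed` and `voronoi_cell_areas` are assumptions chosen for brevity; they are not taken from the paper, and production code would normally use a dedicated computational-geometry library.

```python
import math

def nearest_seed(point, seeds):
    """Index of the seed closest to `point` (Euclidean metric)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))

def voronoi_cell_areas(seeds, size=1.0, n=200):
    """Approximate each cell's area in a [0, size]^2 box by counting grid points."""
    counts = [0] * len(seeds)
    step = size / n
    for ix in range(n):
        for iy in range(n):
            # Use the center of each grid square as the sample point.
            p = ((ix + 0.5) * step, (iy + 0.5) * step)
            counts[nearest_seed(p, seeds)] += 1
    return [c * step * step for c in counts]

# Two seeds placed symmetrically: each cell should cover half of the unit box.
seeds = [(0.25, 0.5), (0.75, 0.5)]
areas = voronoi_cell_areas(seeds)
```

When the seeds are galaxy positions, the inverse of a cell's area (or volume in 3-D) gives a simple estimate of the local number density around that galaxy.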
-
Machine-learning computation of distance modulus for local galaxies
Authors:
A. Elyiv,
O. Melnyk,
I. Vavilova,
D. Dobrycheva,
V. Karachentseva
Abstract:
Rapidly growing computing facilities and an increasing number of extragalactic observations encourage the application of data-driven approaches to uncover hidden relations in astronomical data. In this work we address the problem of distance reconstruction for a large number of galaxies from available extensive observations. We propose a new data-driven approach for computing distance moduli for local galaxies based on machine-learning regression as an alternative to physically oriented methods. We use key observable parameters for a large number of galaxies as input explanatory variables for training: magnitudes in U, B, I, and K bands, corresponding colour indices, surface brightness, angular size, radial velocity, and coordinates. We performed detailed tests of five machine-learning regression techniques for inference of $m-M$: linear, polynomial, k-nearest neighbours, gradient boosting, and artificial neural network regression. As a test set we selected 91 760 galaxies at $z<0.2$ from the NASA/IPAC extragalactic database with distance moduli measured by different independent redshift methods. We find that the most effective and precise is the neural network regression model with two hidden layers. The obtained root-mean-square error of 0.35 mag, which corresponds to a relative error of 16\%, does not depend on the distance to the galaxy and is comparable with methods based on the Tully-Fisher and Fundamental Plane relations. The proposed model shows a 0.44 mag (20\%) error when spectroscopic redshifts are absent and is complementary to existing photometric redshift methodologies. Our approach has great potential for obtaining distance moduli for around 250 000 galaxies at $z<0.2$ for which the above-mentioned parameters are already observed.
Submitted 2 March, 2020; v1 submitted 16 October, 2019;
originally announced October 2019.
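The correspondence quoted in the abstract above between an error in distance modulus and a relative distance error can be checked with the standard definition $m-M = 5\log_{10}(d/10\,\mathrm{pc})$. Differentiating gives a first-order relative error of $(\ln 10/5)\,\sigma$. The short sketch below applies this; the function names are illustrative, and the first-order propagation is an assumption about how the quoted percentages were obtained.

```python
import math

def distance_mpc(mu):
    """Distance in Mpc from the distance modulus mu = m - M = 5*log10(d / 10 pc)."""
    return 10.0 ** (mu / 5.0 + 1.0) / 1.0e6

def relative_distance_error(sigma_mu):
    """First-order relative distance error for a distance-modulus error in mag."""
    return math.log(10.0) / 5.0 * sigma_mu

print(round(relative_distance_error(0.35), 3))  # 0.161, i.e. about 16%
print(round(relative_distance_error(0.44), 3))  # 0.203, i.e. about 20%
```

Both values agree with the 16\% and 20\% figures stated in the abstract.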
-
Machine learning technique for morphological classification of galaxies from the SDSS. I. Photometry-based approach
Authors:
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk,
V. Khramtsov
Abstract:
Methods. We used different galaxy classification techniques: human labeling, multi-photometry diagrams, Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, k-Nearest Neighbors, and k-fold validation. Results. We present results of a binary automated morphological classification of galaxies conducted by human labeling, multi-photometry, and supervised Machine Learning methods. We applied them to the sample of galaxies from the SDSS DR9 with 0.02 < z < 0.1 and $-24^{m}<M_{r}<-19.4^{m}$. To study the classifier, we used the absolute magnitudes Mu, Mg, Mr, Mi, Mz, the color indices Mu-Mr, Mg-Mi, Mu-Mg, Mr-Mz, and the inverse concentration index to the center R50/R90. Using the Support Vector Machine classifier and the data on color indices, absolute magnitudes, and inverse concentration index of galaxies with visual morphological types, we were able to classify 316 031 galaxies from the SDSS DR9 with unknown morphological types. Conclusions. The Support Vector Machine and Random Forest methods, as implemented in the Scikit-learn machine learning library for Python, provide the highest accuracy for the binary galaxy morphological classification: 96.4% correctly classified (96.1% early E and 96.9% late L types) and 95.5% correctly classified (96.7% early E and 92.8% late L types), respectively. Applying the Support Vector Machine to the sample of 316 031 galaxies from the SDSS DR9 at z < 0.1, we found 141 211 E and 174 820 L types among them.
Submitted 8 June, 2021; v1 submitted 24 December, 2017;
originally announced December 2017.
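The abstract above lists k-fold validation among the techniques used. A minimal sketch of how k-fold index splitting works is given below, with a simple accuracy helper. The function names are hypothetical and the folds are contiguous for clarity; this is not the authors' pipeline, which according to the abstract relied on Scikit-learn.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def accuracy(true_labels, predicted_labels):
    """Fraction of labels predicted correctly."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)

folds = list(k_fold_indices(10, 3))
```

In practice the sample would be shuffled (or stratified by class) before splitting, and the accuracy would be averaged over the k held-out folds.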