-
Machine learning technique for morphological classification of galaxies from the SDSS. III. Image-based inference of detailed features
Authors:
V. Khramtsov,
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
O. V. Melnyk,
A. A. Elyiv,
V. S. Akhmetov,
A. M. Dmytrenko
Abstract:
This paper continues a series of our works on the applicability of various machine learning methods to morphological galaxy classification (Vavilova et al., 2021, 2022). We used a sample of 315776 SDSS DR9 galaxies with absolute stellar magnitudes of $-24^{m}<M_{r}<-19.4^{m}$ at 0.003<z<0.1 as the target data set for a CNN classifier based on DenseNet-201. Because this sample closely overlaps with the Galaxy Zoo 2 (GZ2) sample, we use the annotated GZ2 data as the training data set to classify galaxies by 34 detailed features. Because the visual parameters of galaxies in the GZ2 training data set differ markedly from those of galaxies without known morphological parameters, we applied novel procedures that allowed us, for the first time, to remove this difference for smaller and fainter SDSS galaxies.
We describe in detail the adversarial validation technique and how we chose the optimal train-test split of galaxies from the training data set. We also found optimal galaxy image transformations that increase the generalization ability of the classifier. This can be considered another way to reduce the human bias for those galaxy images that had a poor vote classification in the GZ project. Such an approach, like auto-immunization, in which a CNN classifier trained on very good images is able to retrain bad images from the same homogeneous sample, can be considered co-planar to other methods of combating the human bias.
The accuracy of the CNN classifier ranges from 83.3 to 99.4 percent across 32 features. As a result, for the first time, we assigned a detailed morphological classification to more than 140K low-redshift galaxies, especially at the fainter end. We highlight the typical problem points of CNN-based galaxy image classification from the astronomical point of view. The catalogs will be available through VizieR.
Submitted 25 September, 2022;
originally announced September 2022.
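The abstract above names adversarial validation without spelling it out. What follows is a minimal sketch of the general idea, not the authors' pipeline: train a classifier to tell training-set galaxies from target-set galaxies, read its ROC AUC as a measure of how different the two sets are (0.5 means indistinguishable), and hold out the training galaxies that look most target-like as the validation split. Everything concrete here is an assumption made for illustration: the two synthetic features (r-band magnitude offset and log size), the sample sizes, and the small logistic regression standing in for an image-based domain classifier.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
# Hypothetical features: (r magnitude - 16.6, log10 angular size).
train = [(rng.gauss(16.0, 0.5) - 16.6, rng.gauss(0.15, 0.25)) for _ in range(200)]
target = [(rng.gauss(17.2, 0.5) - 16.6, rng.gauss(-0.15, 0.25)) for _ in range(200)]

X = train + target
y = [0] * len(train) + [1] * len(target)   # 0 = training set, 1 = target set
w, b = fit_logistic(X, y)

scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
auc = roc_auc(scores, y)   # in-sample AUC; a real check would cross-validate

# Validation split: the 20% of training galaxies that look most target-like.
train_scores = scores[:len(train)]
order = sorted(range(len(train)), key=lambda i: train_scores[i], reverse=True)
val_idx = order[:len(train) // 5]
fit_idx = order[len(train) // 5:]

print(f"domain-classifier AUC = {auc:.3f}")
print(f"{len(val_idx)} validation / {len(fit_idx)} fitting galaxies")
```

An AUC well above 0.5 signals that the training and target sets differ. Validating on the most target-like training galaxies then gives a more honest estimate of how the classifier will behave on the target set.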
-
Machine learning technique for morphological classification of galaxies from SDSS. II. The image-based morphological catalogs of galaxies at 0.02<z<0.1
Authors:
I. B. Vavilova,
V. Khramtsov,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk
Abstract:
We applied an image-based approach with a convolutional neural network model to a sample of low-redshift galaxies with $-24^{m}<M_{r}<-19.4^{m}$ from the SDSS DR9. We divided it into two subsamples, the SDSS DR9 galaxy dataset and the Galaxy Zoo 2 (GZ2) dataset, treating them as the inference and training datasets, respectively. As a result, we created a morphological catalog of 315782 galaxies at 0.02<z<0.1, in which five morphological classes and 34 detailed features (bar, rings, number of spiral arms, mergers, etc.) were defined for the first time for 216148 galaxies (inference dataset) by the image-based CNN classifier. For the rest of the galaxies, the initial morphological classification was re-assigned as in the GZ2 project.
Our method shows promising performance in morphological classification, attaining more than 93 % accuracy for the five-class morphology prediction, except for the cigar-shaped (75 %) and completely rounded (83 %) galaxies. The main results are presented in the catalog of 19468 completely rounded, 27321 rounded in-between, 3235 cigar-shaped, 4099 edge-on, 18615 spiral, and 72738 general low-redshift galaxies of the studied SDSS sample. As for the classification of galaxies by their detailed structural morphological features, our CNN model gives an accuracy in the range of 92-99 %, depending on the feature, the number of galaxies with the given feature in the inference dataset, and the galaxy image quality. We demonstrate that applying the CNN model with adversarial validation and adversarial image data augmentation improves the classification of smaller and fainter SDSS galaxies with $m_{r}<17.7$.
Submitted 12 March, 2022;
originally announced March 2022.
-
Machine learning technique for morphological classification of galaxies from the SDSS. I. Photometry-based approach
Authors:
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk,
V. Khramtsov
Abstract:
Methods. We used different galaxy classification techniques: human labeling, multi-photometry diagrams, Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, k-Nearest Neighbors, and k-fold validation. Results. We present results of a binary automated morphological classification of galaxies conducted by human labeling, multi-photometry, and supervised Machine Learning methods. We applied them to the sample of galaxies from the SDSS DR9 with 0.02 < z < 0.1 and $-24^{m}<M_{r}<-19.4^{m}$. To train the classifier, we used the absolute magnitudes $M_u$, $M_g$, $M_r$, $M_i$, $M_z$, the color indices $M_u-M_r$, $M_g-M_i$, $M_u-M_g$, $M_r-M_z$, and the inverse concentration index to the center, $R50/R90$. Using the Support Vector Machine classifier and the data on color indices, absolute magnitudes, and inverse concentration index of galaxies with visual morphological types, we were able to classify 316 031 galaxies from the SDSS DR9 with unknown morphological types. Conclusions. The Support Vector Machine and Random Forest methods, implemented with the Scikit-learn machine learning library in Python, provide the highest accuracy for the binary galaxy morphological classification: 96.4% correctly classified (96.1% early E and 96.9% late L types) and 95.5% correctly classified (96.7% early E and 92.8% late L types), respectively. Applying the Support Vector Machine to the sample of 316 031 galaxies from the SDSS DR9 at z < 0.1, we found 141 211 E and 174 820 L types among them.
Submitted 8 June, 2021; v1 submitted 24 December, 2017;
originally announced December 2017.
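Two of the methods listed in the abstract above, k-Nearest Neighbors and k-fold validation, can be illustrated together in a few lines. The sketch below is a standard-library stand-in, not the authors' Scikit-learn setup: it classifies synthetic galaxies as early (E) or late (L) type from a colour index and the inverse concentration index, and estimates accuracy over five folds. The feature means and scatters are hypothetical values chosen only so that the two classes overlap a little; they are not taken from the paper.

```python
import random

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def k_fold_accuracy(X, y, n_folds=5, k=5):
    """Mean held-out accuracy over n_folds interleaved folds."""
    indices = list(range(len(X)))
    accuracies = []
    for f in range(n_folds):
        held_out = indices[f::n_folds]
        held_set = set(held_out)
        fit = [i for i in indices if i not in held_set]
        fit_X, fit_y = [X[i] for i in fit], [y[i] for i in fit]
        hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / n_folds

rng = random.Random(1)
# Hypothetical (colour index M_u - M_r, inverse concentration index R50/R90) pairs.
early = [(rng.gauss(2.6, 0.2), rng.gauss(0.32, 0.03)) for _ in range(150)]
late = [(rng.gauss(1.8, 0.3), rng.gauss(0.43, 0.04)) for _ in range(150)]

samples = [(x, "E") for x in early] + [(x, "L") for x in late]
rng.shuffle(samples)   # mix the classes before splitting into folds
X = [x for x, _ in samples]
y = [label for _, label in samples]

accuracy = k_fold_accuracy(X, y)
print(f"5-fold k-NN accuracy: {accuracy:.3f}")
```

The features are left unscaled here, so the colour index dominates the distance; with real photometry one would normally standardize each feature first.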
-
Low Density Structures in the Local Universe. I. Diffuse Agglomerates of Galaxies
Authors:
I. D. Karachentsev,
V. E. Karachentseva,
O. V. Melnyk,
A. A. Elyiv,
D. I. Makarov
Abstract:
This paper is the first of a series considering the properties of the distribution of nearby galaxies in low-density regions. Among 7596 galaxies with radial velocities $V_{LG}<3500$ km/s, absolute magnitudes $M_K<-18.4^{m}$, and Galactic latitudes $|b|>15^{\circ}$, there are 3168 field galaxies (i.e. 42%) that do not belong to pairs, groups or clusters in the Local universe. Applying the percolation method with a radius of $r_0=2.8$ Mpc to this sample, we found 226 diffuse agglomerates with $n\geq4$ members. The structures of the eight most populated objects among them ($n\geq25$) are discussed. These non-virialized agglomerates are characterized by a median dispersion of radial velocities of about 170 km/s, a linear size of around 6 Mpc, an integral K-band luminosity of $3\times10^{11} L_{\odot}$, and a formal virial-mass-to-luminosity ratio of about 700 $M_{\odot}/L_{\odot}$. The mean density contrast for the considered agglomerates is only $\langle\Delta n/\bar{n}\rangle\sim5$, and their crossing time is about 30-40 Gyr.
Submitted 30 October, 2012; v1 submitted 24 October, 2012;
originally announced October 2012.
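The percolation step in the abstract above (linking radius $r_0$, agglomerates with at least four members) can be sketched as a friends-of-friends grouping. This is a generic illustration under stated assumptions, not the authors' code: positions are taken as Cartesian coordinates in Mpc, two galaxies are linked when their separation does not exceed `r0`, and the function name `percolation_groups` and the toy coordinates are hypothetical.

```python
import math
from collections import defaultdict

def percolation_groups(points, r0, min_members=4):
    """Link points closer than r0 and return connected groups with >= min_members."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r0:
                parent[find(i)] = find(j)   # merge the two groups

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    kept = [members for members in groups.values() if len(members) >= min_members]
    return sorted(kept, key=len, reverse=True)

# Toy positions in Mpc: a chain of five galaxies, a close pair, one isolated galaxy.
points = [
    (0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0), (8, 0, 0),
    (20, 0, 0), (21, 0, 0),
    (40, 40, 40),
]
groups = percolation_groups(points, r0=2.8)
print(groups)
```

The chain shows why such structures can be diffuse: its end members are 8 Mpc apart, far more than `r0`, yet they belong to one agglomerate because each neighbouring pair is linked. The close pair is dropped because it has fewer than four members.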
-
The Structure of the Local Supercluster of Galaxies Revealed by the Three-Dimensional Voronoi's Tessellation Method
Authors:
O. V. Melnyk,
A. A. Elyiv,
I. B. Vavilova
Abstract:
The 3D Voronoi's tessellation method was applied for the first time to identify groups of galaxies in the structure of a supercluster. The sample under consideration consists of more than 7000 galaxies of the Local Supercluster (LS) with radial velocities up to 3100 km/s. Because of the essential inhomogeneity of the LS catalogue, it was proposed to rescale distances in an "artificial" way such that the concentration of galaxies varies with increasing distance as a power-law function with the same exponent beta as for the full homogeneous catalogue. Various clustering parameters were taken into account: alpha (0.01, 0.1, 1%), the fraction of galaxies whose relative Voronoi's cell volume is smaller than the critical one for the random distribution; beta = 0, which corresponds to the random galaxy distribution; and beta = 0.7, which is close to the pancake galaxy distribution. It is shown that the Voronoi's tessellation method depends weakly on the beta parameter, and that the number of galaxies in rich structures grows faster than in poor ones as the alpha parameter increases. A comparison of the derived groups with the groups obtained by Karachentsev's dynamical method shows that the fraction of groups that coincide in all their members is 22%. On the whole, the dynamical method is preferable for identifying sparsely populated galaxy groups, whereas the 3D Voronoi's tessellation method is preferable for more populated ones.
Submitted 8 December, 2007;
originally announced December 2007.