-
Galaxy spectroscopy without spectra: Galaxy properties from photometric images with conditional diffusion models
Authors:
Lars Doorenbos,
Eva Sextl,
Kevin Heng,
Stefano Cavuoti,
Massimo Brescia,
Olena Torbaniuk,
Giuseppe Longo,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Modern spectroscopic surveys can only target a small fraction of the vast amount of photometrically cataloged sources in wide-field surveys. Here, we report the development of a generative AI method capable of predicting optical galaxy spectra from photometric broad-band images alone. This method draws from the latest advances in diffusion models in combination with contrastive networks. We pass multi-band galaxy images into the architecture to obtain optical spectra. From these, robust values for galaxy properties can be derived with any methods in the spectroscopic toolbox, such as standard population synthesis techniques and Lick indices. When trained and tested on 64x64-pixel images from the Sloan Digital Sky Survey, the global bimodality of star-forming and quiescent galaxies in photometric space is recovered, as well as a mass-metallicity relation of star-forming galaxies. The comparison between the observed and the artificially created spectra shows good agreement in overall metallicity, age, Dn4000, stellar velocity dispersion, and E(B-V) values. Photometric redshift estimates of our generative algorithm can compete with other current, specialized deep-learning techniques. Moreover, this work is the first attempt in the literature to infer velocity dispersion from photometric images. Additionally, we can predict the presence of an active galactic nucleus up to an accuracy of 82%. With our method, scientifically interesting galaxy properties, normally requiring spectroscopic inputs, can be obtained in future data sets from large-scale photometric surveys alone. The spectra prediction via AI can further assist in creating realistic mock catalogs.
Submitted 26 June, 2024;
originally announced June 2024.
-
Euclid: Identification of asteroid streaks in simulated images using deep learning
Authors:
M. Pöntinen,
M. Granvik,
A. A. Nucita,
L. Conversi,
B. Altieri,
B. Carry,
C. M. O'Riordan,
D. Scott,
N. Aghanim,
A. Amara,
L. Amendola,
N. Auricchio,
M. Baldi,
D. Bonino,
E. Branchini,
M. Brescia,
S. Camera,
V. Capobianco,
C. Carbone,
J. Carretero,
M. Castellano,
S. Cavuoti,
A. Cimatti,
R. Cledassou,
G. Congedo
, et al. (92 additional authors not shown)
Abstract:
Up to 150000 asteroids will be visible in the images of the ESA Euclid space telescope, and the instruments of Euclid offer multiband visual to near-infrared photometry and slitless spectra of these objects. Most asteroids will appear as streaks in the images. Due to the large number of images and asteroids, automated detection methods are needed. A non-machine-learning approach based on the StreakDet software was previously tested, but the results were not optimal for short and/or faint streaks. We set out to improve the capability to detect asteroid streaks in Euclid images by using deep learning.
We built, trained, and tested a three-step machine-learning pipeline with simulated Euclid images. First, a convolutional neural network (CNN) detected streaks and their coordinates in full images, aiming to maximize the completeness (recall) of detections. Then, a recurrent neural network (RNN) merged snippets of long streaks detected in several parts by the CNN. Lastly, gradient-boosted trees (XGBoost) linked detected streaks between different Euclid exposures to reduce the number of false positives and improve the purity (precision) of the sample.
The deep-learning pipeline surpasses the completeness and reaches a similar level of purity of a non-machine-learning pipeline based on the StreakDet software. Additionally, the deep-learning pipeline can detect asteroids 0.25-0.5 magnitudes fainter than StreakDet. The deep-learning pipeline could result in a 50% increase in the number of detected asteroids compared to the StreakDet software. There is still scope for further refinement, particularly in improving the accuracy of streak coordinates and enhancing the completeness of the final stage of the pipeline, which involves linking detections across multiple exposures.
Submitted 5 October, 2023;
originally announced October 2023.
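The abstract above evaluates its pipeline in terms of completeness (recall) and purity (precision). As a minimal sketch of how these two quantities are usually computed, the snippet below uses a hypothetical toy catalog of streak identifiers; the function name and the data are assumptions for illustration and are not taken from the paper.

```python
def completeness_and_purity(true_ids, detected_ids):
    """Return (completeness, purity) for a set of detections.

    completeness (recall)  = true positives / all real objects
    purity (precision)     = true positives / all reported detections
    """
    truth = set(true_ids)
    detected = set(detected_ids)
    true_positives = len(truth & detected)
    completeness = true_positives / len(truth) if truth else 0.0
    purity = true_positives / len(detected) if detected else 0.0
    return completeness, purity


# Hypothetical toy catalog: 4 real streaks, 5 reported detections, 3 of them correct.
real_streaks = ["a1", "a2", "a3", "a4"]
reported = ["a1", "a2", "a3", "x1", "x2"]
completeness, purity = completeness_and_purity(real_streaks, reported)
print(f"completeness={completeness:.2f}, purity={purity:.2f}")
```

In this toy case, a first stage tuned for completeness would accept the false positives `x1` and `x2`, and a later linking stage would aim to remove them and raise the purity.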
-
ULISSE: A Tool for One-shot Sky Exploration and its Application to Active Galactic Nuclei Detection
Authors:
Lars Doorenbos,
Olena Torbaniuk,
Stefano Cavuoti,
Maurizio Paolillo,
Giuseppe Longo,
Massimo Brescia,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Modern sky surveys are producing ever larger amounts of observational data, which makes the application of classical approaches for the classification and analysis of objects challenging and time-consuming. However, this issue may be significantly mitigated by the application of automatic machine and deep learning methods. We propose ULISSE, a new deep learning tool that, starting from a single prototype object, is capable of identifying objects sharing the same morphological and photometric properties, and hence of creating a list of candidate sosia. In this work, we focus on applying our method to the detection of AGN candidates in a Sloan Digital Sky Survey galaxy sample, since the identification and classification of Active Galactic Nuclei (AGN) in the optical band still remains a challenging task in extragalactic astronomy. Intended for the initial exploration of large sky surveys, ULISSE directly uses features extracted from the ImageNet dataset to perform a similarity search. The method is capable of rapidly identifying a list of candidates, starting from only a single image of a given prototype, without the need for any time-consuming neural network training. Our experiments show ULISSE is able to identify AGN candidates based on a combination of host galaxy morphology, color and the presence of a central nuclear source, with a retrieval efficiency ranging from 21% to 65% (including composite sources) depending on the prototype, where the random guess baseline is 12%. We find ULISSE to be most effective in retrieving AGN in early-type host galaxies, as opposed to prototypes with spiral- or late-type properties. Based on the results described in this work, ULISSE can be a promising tool for selecting different types of astrophysical objects in current and future wide-field surveys (e.g. Euclid, LSST etc.) that target millions of sources every single night.
Submitted 23 August, 2022;
originally announced August 2022.
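The one-shot similarity search described in the ULISSE abstract can be illustrated with a small sketch. It assumes that feature vectors have already been extracted by some pretrained network and ranks catalog objects by cosine similarity to a single prototype. The choice of cosine similarity, the function names, and the three-dimensional toy features are all assumptions for illustration; the abstract does not specify the distance measure used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_lookalikes(prototype, catalog, k):
    """Return the names of the k catalog objects most similar to the prototype."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(prototype, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def retrieval_efficiency(retrieved, positives):
    """Fraction of retrieved objects that belong to the class of interest."""
    if not retrieved:
        return 0.0
    return sum(1 for name in retrieved if name in positives) / len(retrieved)


# Hypothetical precomputed features (real ones would have many more dimensions).
features = {
    "gal_1": [0.9, 0.1, 0.0],
    "gal_2": [0.8, 0.2, 0.1],
    "gal_3": [0.0, 1.0, 0.0],
    "gal_4": [0.1, 0.0, 0.9],
}
known_agn = {"gal_1"}          # hypothetical labels, used only for evaluation
prototype = [1.0, 0.0, 0.0]    # features of the single prototype image

top = retrieve_lookalikes(prototype, features, k=2)
print(top, retrieval_efficiency(top, known_agn))
```

No training step appears anywhere in the sketch, which mirrors the abstract's point that a candidate list can be produced from a single prototype without training a network.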
-
Machine learning based data mining for Milky Way filamentary structures reconstruction
Authors:
Giuseppe Riccio,
Stefano Cavuoti,
Eugenio Schisano,
Massimo Brescia,
Amata Mercurio,
Davide Elia,
Milena Benedettini,
Stefano Pezzuto,
Sergio Molinari,
Anna Maria Di Giorgio
Abstract:
We present an innovative method called FilExSeC (Filaments Extraction, Selection and Classification), a data mining tool developed to investigate the possibility of refining and optimizing the shape reconstruction of filamentary structures detected with a consolidated method based on flux derivative analysis, through the column-density maps computed from Herschel infrared Galactic Plane Survey (Hi-GAL) observations of the Galactic plane. The methodology is based on a feature extraction module followed by a machine learning model (Random Forest) dedicated to selecting features and classifying the pixels of the input images. From tests on both simulations and real observations, the method appears reliable and robust with respect to the variability of shape and distribution of filaments. In the cases of highly defined filament structures, the presented method is able to bridge the gaps among the detected fragments, thus improving their shape reconstruction. From a preliminary "a posteriori" analysis of derived filament physical parameters, the method appears potentially able to contribute to completing and refining the filament reconstruction.
Submitted 11 October, 2016; v1 submitted 25 May, 2015;
originally announced May 2015.
-
Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation
Authors:
Sabina Tangaro,
Nicola Amoroso,
Massimo Brescia,
Stefano Cavuoti,
Andrea Chincarini,
Rosangela Errico,
Paolo Inglese,
Giuseppe Longo,
Rosalia Maglietta,
Andrea Tateo,
Giuseppe Riccio,
Roberto Bellotti
Abstract:
Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer's disease and other neurological and psychiatric diseases. However, it requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods usually follow a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) an embedded method based on the Random Forest Classifier, on a set of 10 T1-weighted brain MRIs, and tested them on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 features for each voxel (Sequential Backward Elimination) we obtained state-of-the-art performance comparable to that of the standard tool FreeSurfer.
Submitted 16 January, 2015;
originally announced January 2015.
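Of the four feature selection techniques named in the abstract above, Sequential Backward Elimination is simple to sketch: start from all features and repeatedly drop the one whose removal hurts a scoring function least. The snippet below is a generic greedy version under stated assumptions. The scoring function is a hypothetical stand-in (a sum of made-up per-feature usefulness values), whereas a real wrapper method would score each subset by training and validating a classifier. The feature names are invented for illustration.

```python
def sequential_backward_elimination(features, score, n_keep):
    """Greedily remove features until only n_keep remain.

    At each step, every remaining feature is tentatively dropped and the
    subset with the highest score is kept for the next round.
    """
    if n_keep < 0:
        raise ValueError("n_keep must be non-negative")
    selected = list(features)
    while len(selected) > n_keep:
        best_subset = None
        best_score = float("-inf")
        for candidate in selected:
            subset = [f for f in selected if f != candidate]
            subset_score = score(subset)
            if subset_score > best_score:
                best_score = subset_score
                best_subset = subset
        selected = best_subset
    return selected


# Hypothetical per-feature usefulness; a real score would come from a classifier.
usefulness = {"intensity": 0.9, "gradient": 0.6, "position": 0.4, "noise": -0.2}


def toy_score(subset):
    """Stand-in for validation accuracy: sum of the usefulness values."""
    return sum(usefulness[name] for name in subset)


kept = sequential_backward_elimination(usefulness, toy_score, n_keep=2)
print(kept)
```

Each round evaluates one candidate subset per remaining feature, so with hundreds of features and a classifier-based score the procedure is considerably more expensive than a filter method.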
-
Genetic Algorithm Modeling with GPU Parallel Computing Technology
Authors:
Stefano Cavuoti,
Mauro Garofalo,
Massimo Brescia,
Antonio Pescapé,
Giuseppe Longo,
Giorgio Ventre
Abstract:
We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, which had already been scientifically tested and validated on massive astrophysical data classification problems through a web application resource (DAMEWARE) specialized in data mining based on Machine Learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm made it possible to exploit the internal training features of the model, permitting a strong optimization in terms of processing performance and scalability.
Submitted 23 November, 2012;
originally announced November 2012.
-
DAME: A Distributed Data Mining & Exploration Framework within the Virtual Observatory
Authors:
M. Brescia,
S. Cavuoti,
R. D'Abrusco,
O. Laurino,
G. Longo
Abstract:
Nowadays, many scientific areas share the same broad requirements of being able to deal with massive and distributed datasets while, when possible, being integrated with services and applications. In order to close the growing gap between the incremental generation of data and our understanding of it, it is necessary to know how to access, retrieve, analyze, mine and integrate data from disparate sources. One of the fundamental aspects of any new generation of data mining software tool or package that aims to become a service for the community is the possibility of using it within complex workflows which each user can fine-tune in order to match the specific demands of their scientific goal. These workflows often need to access different resources (data, providers, computing facilities and packages) and require strict interoperability on (at least) the client side. The project DAME (DAta Mining & Exploration) arises from these requirements by providing a distributed WEB-based data mining infrastructure specialized in Massive Data Sets exploration with Soft Computing methods. Originally designed to deal with astrophysical use cases, where first scientific application examples have demonstrated its effectiveness, the DAME Suite is a multi-disciplinary, platform-independent tool compliant with modern KDD (Knowledge Discovery in Databases) requirements and Information & Communication Technology trends.
Submitted 4 December, 2011;
originally announced December 2011.
-
The DAME/VO-Neural Infrastructure: an Integrated Data Mining System Support for the Science Community
Authors:
M. Brescia,
A. Corazza,
S. Cavuoti,
G. d'Angelo,
R. D'Abrusco,
C. Donalek,
S. G. Djorgovski,
N. Deniskina,
M. Fiore,
M. Garofalo,
O. Laurino,
G. Longo,
A. Mahabal,
F. Manna,
A. Nocella,
B. Skordovski
Abstract:
Astronomical data are gathered through a very large number of heterogeneous techniques and stored in very diversified and often incompatible data repositories. Moreover, in the e-science environment, services must be integrated across distributed, heterogeneous, dynamic "virtual organizations" formed by different resources within a single enterprise and/or external resource sharing and service provider relationships. The DAME/VO-Neural project, run jointly by the University Federico II, INAF (National Institute of Astrophysics) Astronomical Observatories of Napoli and the California Institute of Technology, aims at creating a single, sustainable, distributed e-infrastructure for data mining and exploration in massive data sets, to be offered to the astronomical community, among others, as a web application. The framework makes use of distributed computing environments (e.g. S.Co.P.E.) and matches the international IVOA standards and requirements. The integration process is technically challenging due to the need to achieve a specific quality of service when running on top of different native platforms. In these terms, the result of the DAME/VO-Neural project effort will be a service-oriented architecture, obtained by using appropriate standards and incorporating Grid paradigms and RESTful Web services frameworks where needed, whose main target will be the integration of interdisciplinary distributed systems within and across organizational domains.
Submitted 4 December, 2011;
originally announced December 2011.
-
DAME: A Web Oriented Infrastructure for Scientific Data Mining & Exploration
Authors:
Massimo Brescia,
Giuseppe Longo,
George S. Djorgovski,
Stefano Cavuoti,
Raffaele D'Abrusco,
Ciro Donalek,
Alessandro Di Guido,
Michelangelo Fiore,
Mauro Garofalo,
Omar Laurino,
Ashish Mahabal,
Francesco Manna,
Alfonso Nocella,
Giovanni d'Angelo,
Maurizio Paolillo
Abstract:
Nowadays, many scientific areas share the same need of being able to deal with massive and distributed datasets and to perform complex knowledge extraction tasks on them. This simple consideration is behind the international efforts to build virtual organizations such as, for instance, the Virtual Observatory (VObs). DAME (DAta Mining & Exploration) is an innovative, general purpose, Web-based, VObs compliant, distributed data mining infrastructure specialized in Massive Data Sets exploration with machine learning methods. Initially fine-tuned to deal with astronomical data only, DAME has evolved into a general purpose platform which has also found applications in other domains of human endeavor. We present the products and a short outline of a science case, together with a detailed description of the main features available in the newly released beta version of the web application.
Submitted 7 December, 2010; v1 submitted 23 October, 2010;
originally announced October 2010.
-
Astrophysics in S.Co.P.E
Authors:
M. Brescia,
S. Cavuoti,
G. D'Angelo,
R. D'Abrusco,
C. Donalek,
N. Deniskina,
O. Laurino,
G. Longo
Abstract:
S.Co.P.E. is one of the four projects funded by the Italian Government in order to provide Southern Italy with a distributed computing infrastructure for fundamental science. Besides building the infrastructure, S.Co.P.E. is also actively pursuing research in several areas, among them astrophysics and observational cosmology. We briefly summarize the most significant results obtained in the first two years of the project related to the development of middleware and Data Mining tools for the Virtual Observatory.
Submitted 7 July, 2008;
originally announced July 2008.
-
GRID-Launcher v.1.0
Authors:
N. Deniskina,
M. Brescia,
S. Cavuoti,
G. d'Angelo,
O. Laurino,
G. Longo
Abstract:
GRID-launcher-1.0 was built within the VO-Tech framework as a software interface between the UK-ASTROGRID and generic GRID infrastructures, in order to allow any ASTROGRID user to launch computing-intensive tasks on the GRID from the ASTROGRID Workbench or Desktop. Although of general application, the Grid-Launcher has so far been tested on a few selected software packages (VONeural-MLP, VONeural-SVM, Sextractor and SWARP) and on the SCOPE-GRID.
Submitted 6 June, 2008;
originally announced June 2008.
-
The VO-Neural project: recent developments and some applications
Authors:
M. Brescia,
S. Cavuoti,
G. d'Angelo,
R. D'Abrusco,
N. Deniskina,
M. Garofalo,
O. Laurino,
G. Longo,
A. Nocella,
B. Skordovski
Abstract:
VO-Neural is the natural evolution of the Astroneural project, which was started in 1994 with the aim of implementing a suite of neural tools for data mining in massive astronomical data sets. Unlike its ancestor, which was implemented under Matlab, VO-Neural is written in object-oriented C++ and is specifically tailored to work in distributed computing architectures. We discuss the current status of implementation of VO-Neural, present an application to the classification of Active Galactic Nuclei, and outline the ongoing work to improve the functionalities of the package.
Submitted 5 June, 2008;
originally announced June 2008.