-
NLP-based classification of software tools for metagenomics sequencing data analysis into EDAM semantic annotation
Authors:
Kaoutar Daoud Hiri,
Matjaž Hren,
Tomaž Curk
Abstract:
Motivation: The rapid growth of metagenomics sequencing data makes metagenomics increasingly dependent on computational and statistical methods for fast and efficient analysis. Consequently, novel analysis tools for big-data metagenomics are constantly emerging. One of the biggest challenges for researchers occurs in the analysis planning stage: selecting the most suitable metagenomics software tool to gain valuable insights from sequencing data. Building data analysis pipelines is often laborious and time-consuming, since it requires a deep and critical understanding of how to apply a particular tool to complete a specified metagenomics task.
Results: We have addressed this challenge by using machine learning methods to develop a system that classifies metagenomics software tools into 13 classes (11 EDAM semantic annotations and two virus-specific classes) based on the tools' descriptions. We trained three classifiers (Naive Bayes, Logistic Regression, and Random Forest) using 15 text feature extraction techniques (TF-IDF, GloVe, BERT-based models, and others). The manually curated dataset includes 224 software tools and contains text from the abstracts and methods sections of the tools' publications. The best classification performance, an Area Under the Precision-Recall Curve (AUPRC) score of 0.85, is achieved using Logistic Regression with BioBERT text embeddings on text from abstracts only. The proposed system provides accurate and unified identification of metagenomics data analysis tools and tasks, which is a crucial step in the construction of metagenomics data analysis pipelines.
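Classifier quality in this abstract is reported as an area under the precision-recall curve. The abstract does not state which estimator or multi-class averaging was used, so the following is only a minimal sketch, assuming the common step-wise average-precision estimator applied one-vs-rest to each class and then macro-averaged (an assumed aggregation). The function names are hypothetical, and ties in scores are broken by input order for simplicity.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for the positive class, 0 otherwise.
    scores: classifier scores, higher meaning "more likely positive".
    Tied scores keep their input order (a simplifying assumption).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score (sorted() is stable).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Recall rises by 1/n_pos here, so add the precision at this rank.
            precision_sum += hits / rank
    return precision_sum / n_pos


def macro_average_precision(
    label_sets: Sequence[Sequence[int]], score_sets: Sequence[Sequence[float]]
) -> float:
    """Unweighted mean of per-class (one-vs-rest) average precision."""
    per_class = [average_precision(y, s) for y, s in zip(label_sets, score_sets)]
    return sum(per_class) / len(per_class)
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` ranks the second positive third, giving (1/1 + 2/3) / 2 ≈ 0.833. A perfect ranking scores 1.0, so a score of 0.85 means positives are mostly, but not always, ranked above negatives.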
Submitted 18 October, 2022; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Photometric survey, modelling, and scaling of long-period and low-amplitude asteroids
Authors:
A. Marciniak,
P. Bartczak,
T. Müller,
J. J. Sanabria,
V. Alí-Lagoa,
P. Antonini,
R. Behrend,
L. Bernasconi,
M. Bronikowska,
M. Butkiewicz - Bąk,
A. Cikota,
R. Crippa,
R. Ditteon,
G. Dudziński,
R. Duffard,
K. Dziadura,
S. Fauvaud,
S. Geier,
R. Hirsch,
J. Horbowicz,
M. Hren,
L. Jerosimic,
K. Kamiński,
P. Kankiewicz,
I. Konstanciak
, et al. (18 additional authors not shown)
Abstract:
The available set of spin and shape modelled asteroids is strongly biased against slowly rotating targets and those with low lightcurve amplitudes. As a consequence of these selection effects, the current picture of asteroid spin axis distribution, rotation rates, or radiometric properties might be affected too.
To counteract these selection effects, we are running a photometric campaign on a large sample of main belt asteroids omitted in most previous studies. We determined synodic rotation periods and verified previous determinations. When the dataset for a given target was sufficiently large and varied, we performed spin and shape modelling with two different methods.
We used the convex inversion method and the non-convex SAGE algorithm, applied to the same datasets of dense lightcurves. Unlike convex inversion, the SAGE method allows valleys and indentations on the shapes, based on lightcurves alone.
We obtained detailed spin and shape models for the first five targets of our sample: (159) Aemilia, (227) Philosophia, (329) Svea, (478) Tergeste, and (487) Venetia. Comparison with stellar occultation chords gave our models an absolute size scale and confirmed major topographic features of the shape models. When applied to thermophysical modelling, the models provided a very good fit to the infrared data and allowed the targets' sizes, albedos, and thermal inertias to be determined.
Convex and non-convex shape models provide comparable fits to lightcurves. However, some non-convex models give notably better fits to stellar occultation chords and to infrared data in sophisticated thermophysical modelling (TPM). In some cases TPM showed a strong preference for one of the spin and shape solutions. We also confirmed that slowly rotating asteroids tend to have higher-than-average values of thermal inertia.
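The campaign begins by determining synodic rotation periods from dense lightcurves. The abstract does not say which period-search method was used, so the block below is only a minimal sketch of one standard technique swapped in for illustration: phase dispersion minimization (PDM), which folds the data at each trial period and prefers the period whose phase bins have the smallest pooled variance relative to the total variance. All function names, the bin count, and the synthetic lightcurve are hypothetical. Because asteroid lightcurves typically show two maxima and two minima per rotation, the test curve gives the two halves different amplitudes so that the half-period alias does not fold as cleanly as the true period.

```python
import math
from typing import List, Sequence


def phase_dispersion(times: Sequence[float], mags: Sequence[float],
                     period: float, n_bins: int = 10) -> float:
    """PDM statistic: pooled within-bin variance / total variance (lower is better)."""
    n = len(mags)
    mean = sum(mags) / n
    total_var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for t, m in zip(times, mags):
        phase = (t / period) % 1.0  # fold the time series at the trial period
        bins[min(int(phase * n_bins), n_bins - 1)].append(m)
    within_ss = 0.0  # sum of squared deviations from each bin's own mean
    dof = 0          # pooled degrees of freedom, sum of (n_j - 1)
    for b in bins:
        if len(b) > 1:
            b_mean = sum(b) / len(b)
            within_ss += sum((m - b_mean) ** 2 for m in b)
            dof += len(b) - 1
    if dof == 0:
        return float("inf")
    return (within_ss / dof) / total_var


def best_period(times: Sequence[float], mags: Sequence[float],
                trial_periods: Sequence[float], n_bins: int = 10) -> float:
    """Trial period with the lowest phase dispersion."""
    return min(trial_periods, key=lambda p: phase_dispersion(times, mags, p, n_bins))


def synthetic_lightcurve(times: Sequence[float], period: float,
                         a2: float = 0.1, a1: float = 0.05) -> List[float]:
    """Hypothetical noise-free, double-peaked lightcurve (magnitudes) for testing."""
    return [a2 * math.cos(4 * math.pi * t / period) + a1 * math.cos(2 * math.pi * t / period)
            for t in times]
```

For instance, sampling `synthetic_lightcurve` with a 10-hour period every 0.37 hours and scanning trial periods from 5 to 15 hours should recover a period close to 10 hours. Real data would also need light-time correction, noise handling, and a check of competing aliases.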
Submitted 6 November, 2017;
originally announced November 2017.
-
New and updated convex shape models of asteroids based on optical data from a large collaboration network
Authors:
J. Hanuš,
J. Ďurech,
D. A. Oszkiewicz,
R. Behrend,
B. Carry,
M. Delbo',
O. Adam,
V. Afonina,
R. Anquetin,
P. Antonini,
L. Arnold,
M. Audejean,
P. Aurard,
M. Bachschmidt,
B. Badue,
E. Barbotin,
P. Barroy,
P. Baudouin,
L. Berard,
N. Berger,
L. Bernasconi,
J-G. Bosch,
S. Bouley,
I. Bozhinova,
J. Brinsfield
, et al. (144 additional authors not shown)
Abstract:
Asteroid modeling efforts in the last decade resulted in a comprehensive dataset of almost 400 convex shape models and their rotation states. This sample has already provided deep insight into the physical properties of main-belt asteroids and large collisional families. We aim to increase the number of asteroid shape models and rotation states. Such results are an important input for various further studies, such as analysis of asteroid physical properties in different populations (including smaller collisional families), thermophysical modeling, and scaling of shape models by disk-resolved images or stellar occultation data. In combination with known masses, scaled models provide bulk density estimates, and they also constrain theoretical collisional and evolutionary models of the Solar System. We use all available disk-integrated optical data (i.e., classical dense-in-time photometry obtained from public databases and through a large collaboration network, as well as sparse-in-time individual measurements from a few sky surveys) as input for the convex inversion method, and derive 3D shape models of asteroids together with their rotation periods and the orientations of their rotation axes. The key ingredient is the support of more than one hundred observers who submit their optical data to publicly available databases. We present updated shape models for 36 asteroids whose masses are either already estimated in the literature or will most likely be determined from their gravitational influence on smaller bodies, whose orbital deflections will be observed by the ESA Gaia astrometric mission. The updates were achieved by using additional optical data from recent apparitions in the shape optimization. Moreover, we present new shape model determinations for 250 asteroids, including 13 Hungarias and 3 near-Earth asteroids.
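The abstract notes that scaled shape models, combined with known masses, yield bulk density estimates. As a minimal sketch, assuming the scaled model is summarized by a volume-equivalent sphere diameter D and a mass M is known, the bulk density is ρ = M / (πD³/6). Because ρ scales as D⁻³, first-order propagation of independent errors gives σ_ρ/ρ = sqrt((σ_M/M)² + (3σ_D/D)²), which is why accurate size scaling matters so much. The function names and example numbers are hypothetical placeholders, not values from the paper.

```python
import math


def bulk_density(mass_kg: float, diameter_km: float) -> float:
    """Bulk density in g/cm^3 from a mass and a volume-equivalent diameter.

    Treats the body as a sphere of the same volume as the scaled shape model
    (an assumption made for this sketch).
    """
    diameter_cm = diameter_km * 1e5
    volume_cm3 = math.pi * diameter_cm ** 3 / 6.0
    return mass_kg * 1e3 / volume_cm3


def relative_density_uncertainty(rel_mass_err: float, rel_diameter_err: float) -> float:
    """First-order relative error of rho = M / (pi D^3 / 6), assuming independent errors."""
    return math.sqrt(rel_mass_err ** 2 + (3.0 * rel_diameter_err) ** 2)
```

With hypothetical inputs M = 1.0e18 kg and D = 100 km, `bulk_density(1.0e18, 100.0)` gives 6/π ≈ 1.91 g/cm³. A 5% diameter error alone contributes a 15% density error, more than a 10% mass error does.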
Submitted 26 October, 2015;
originally announced October 2015.