-
Explainable Graph Neural Networks Under Fire
Authors:
Zhong Li,
Simon Geisler,
Yuhang Wang,
Stephan Günnemann,
Matthijs van Leeuwen
Abstract:
Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and practical utility of post-hoc explanation methods for GNNs. To be able to attack GNN explanation models, we devise a novel attack method dubbed \textit{GXAttack}, the first \textit{optimization-based} adversarial attack method for post-hoc GNN explanations under such settings. Due to the devastating effectiveness of our attack, we call for an adversarial evaluation of future GNN explainers to demonstrate their robustness.
Submitted 10 June, 2024;
originally announced June 2024.
-
Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology
Authors:
Frank A. Ruis,
Alma M. Liezenga,
Friso G. Heslinga,
Luca Ballan,
Thijs A. Eker,
Richard J. M. den Hollander,
Martin C. van Leeuwen,
Judith Dijk,
Wyke Huizinga
Abstract:
Collecting and annotating real-world data for the development of object detection models is a time-consuming and expensive process. In the military domain in particular, data collection can also be dangerous or infeasible. Training models on synthetic data may provide a solution for cases where access to real-world training data is restricted. However, bridging the reality gap between synthetic and real data remains a challenge. Existing methods usually build on top of baseline Convolutional Neural Network (CNN) models that have been shown to perform well when trained on real data, but have limited ability to perform well when trained on synthetic data. For example, some architectures allow for fine-tuning with the expectation of large quantities of training data and are prone to overfitting on synthetic data. Related work usually ignores various best practices from object detection on real data, e.g. by training on synthetic data from a single environment with relatively little variation. In this paper we propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data. Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images. Based on the state of the art, we incorporate data augmentation methods and a Transformer backbone. Besides reaching relatively strong performance without any specialized synthetic data transfer methods, we show that our methods improve the state of the art on synthetic data trained object detection for the RarePlanes and DGTA-VisDrone datasets, and reach near-perfect performance on an in-house vehicle detection dataset.
Submitted 30 May, 2024;
originally announced May 2024.
-
Discovery of a dormant 33 solar-mass black hole in pre-release Gaia astrometry
Authors:
Gaia Collaboration,
P. Panuzzo,
T. Mazeh,
F. Arenou,
B. Holl,
E. Caffau,
A. Jorissen,
C. Babusiaux,
P. Gavras,
J. Sahlmann,
U. Bastian,
Ł. Wyrzykowski,
L. Eyer,
N. Leclerc,
N. Bauchet,
A. Bombrun,
N. Mowlavi,
G. M. Seabroke,
D. Teyssier,
E. Balbinot,
A. Helmi,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne
, et al. (390 additional authors not shown)
Abstract:
Gravitational waves from black-hole merging events have revealed a population of extra-galactic BHs residing in short-period binaries with masses that are higher than expected based on most stellar evolution models - and also higher than known stellar-origin black holes in our Galaxy. It has been proposed that those high-mass BHs are the remnants of massive metal-poor stars. Gaia astrometry is expected to uncover many Galactic wide-binary systems containing dormant BHs, which may not have been detected before. The study of this population will provide new information on the BH-mass distribution in binaries and shed light on their formation mechanisms and progenitors. As part of the validation efforts in preparation for the fourth Gaia data release (DR4), we analysed the preliminary astrometric binary solutions, obtained by the Gaia Non-Single Star pipeline, to verify their significance and to minimise false-detection rates in high-mass-function orbital solutions. The astrometric binary solution of one source, Gaia BH3, implies the presence of a 32.70 \pm 0.82 M\odot BH in a binary system with a period of 11.6 yr. Gaia radial velocities independently validate the astrometric orbit. Broad-band photometric and spectroscopic data show that the visible component is an old, very metal-poor giant of the Galactic halo, at a distance of 590 pc. The BH in the Gaia BH3 system is more massive than any other Galactic stellar-origin BH known thus far. The low metallicity of the star companion supports the scenario that metal-poor massive stars are progenitors of the high-mass BHs detected by gravitational-wave telescopes. The Galactic orbit of the system and its metallicity indicate that it might belong to the Sequoia halo substructure. Alternatively, and more plausibly, it could belong to the ED-2 stream, which likely originated from a globular cluster that had been disrupted by the Milky Way.
Submitted 19 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Probabilistic Truly Unordered Rule Sets
Authors:
Lincen Yang,
Matthijs van Leeuwen
Abstract:
Rule set learning has recently been frequently revisited because of its interpretability. Existing methods have several shortcomings though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class target is understudied, as most existing methods focus on binary classification or multi-class classification via the ``one-versus-rest" approach.
To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we empirically show that rules in our model are ``independent" and hence truly unordered.
Submitted 18 January, 2024;
originally announced January 2024.
-
The Radiation Balance for a semi-gray Atmosphere
Authors:
J. M. J. van Leeuwen
Abstract:
The equations governing the radiation balance are discussed and exactly solved for the model of the semi-gray atmosphere, i.e. an atmosphere that has regions of transparent frequencies and regions with the same absorption coefficient.
Submitted 18 January, 2024;
originally announced January 2024.
-
Quantum Evolution as a Square Root of the Master Equation
Authors:
J. M. J. van Leeuwen
Abstract:
The analogy between the quantum evolution and that of the master equation is explored. By stressing the stochastic nature of quantum evolution a number of conceptual difficulties in the interpretation of quantum mechanics are avoided.
Submitted 5 October, 2023;
originally announced October 2023.
-
Gaia Focused Product Release: Sources from Service Interface Function image analysis -- Half a million new sources in omega Centauri
Authors:
Gaia Collaboration,
K. Weingrill,
A. Mints,
J. Castañeda,
Z. Kostrzewa-Rutkowska,
M. Davidson,
F. De Angeli,
J. Hernández,
F. Torra,
M. Ramos-Lerate,
C. Babusiaux,
M. Biermann,
C. Crowley,
D. W. Evans,
L. Lindegren,
J. M. Martín-Fleitas,
L. Palaversa,
D. Ruz Mieres,
K. Tisanić,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
A. Barbier
, et al. (378 additional authors not shown)
Abstract:
Gaia's readout window strategy is challenged by very dense fields in the sky. Therefore, in addition to standard Gaia observations, full Sky Mapper (SM) images were recorded for nine selected regions in the sky. A new software pipeline exploits these Service Interface Function (SIF) images of crowded fields (CFs), making use of the availability of the full two-dimensional (2D) information. This new pipeline produced half a million additional Gaia sources in the region of the omega Centauri ($ω$ Cen) cluster, which are published with this Focused Product Release. We discuss the dedicated SIF CF data reduction pipeline, validate its data products, and introduce their Gaia archive table. Our aim is to improve the completeness of the {\it Gaia} source inventory in a very dense region in the sky, $ω$ Cen. An adapted version of {\it Gaia}'s Source Detection and Image Parameter Determination software located sources in the 2D SIF CF images. We validated the results by comparing them to the public {\it Gaia} DR3 catalogue and external Hubble Space Telescope data. With this Focused Product Release, 526\,587 new sources have been added to the {\it Gaia} catalogue in $ω$ Cen. Apart from positions and brightnesses, the additional catalogue contains parallaxes and proper motions, but no meaningful colour information. While SIF CF source parameters generally have a lower precision than nominal {\it Gaia} sources, in the cluster centre they increase the depth of the combined catalogue by three magnitudes and improve the source density by a factor of ten. This first SIF CF data publication already adds great value to the {\it Gaia} catalogue. It demonstrates what to expect for the fourth {\it Gaia} catalogue, which will contain additional sources for all nine SIF CF regions.
Submitted 8 November, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Gaia Focused Product Release: A catalogue of sources around quasars to search for strongly lensed quasars
Authors:
Gaia Collaboration,
A. Krone-Martins,
C. Ducourant,
L. Galluccio,
L. Delchambre,
I. Oreshina-Slezak,
R. Teixeira,
J. Braine,
J. -F. Le Campion,
F. Mignard,
W. Roux,
A. Blazere,
L. Pegoraro,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
A. Barbier,
M. Biermann,
O. L. Creevey,
D. W. Evans,
L. Eyer,
R. Guerra
, et al. (376 additional authors not shown)
Abstract:
Context. Strongly lensed quasars are fundamental sources for cosmology. The Gaia space mission covers the entire sky with the unprecedented resolution of $0.18$" in the optical, making it an ideal instrument to search for gravitational lenses down to the limiting magnitude of 21. Nevertheless, the previous Gaia Data Releases are known to be incomplete for small angular separations such as those expected for most lenses. Aims. We present the Data Processing and Analysis Consortium GravLens pipeline, which was built to analyse all Gaia detections around quasars and to cluster them into sources, thus producing a catalogue of secondary sources around each quasar. We analysed the resulting catalogue to produce scores that indicate source configurations that are compatible with strongly lensed quasars. Methods. GravLens uses the DBSCAN unsupervised clustering algorithm to detect sources around quasars. The resulting catalogue of multiplets is then analysed with several methods to identify potential gravitational lenses. We developed and applied an outlier scoring method, a comparison between the average BP and RP spectra of the components, and we also used an extremely randomised tree algorithm. These methods produce scores to identify the most probable configurations and to establish a list of lens candidates. Results. We analysed the environment of 3 760 032 quasars. A total of 4 760 920 sources, including the quasars, were found within 6" of the quasar positions. This list is given in the Gaia archive. In 87\% of cases, the quasar remains a single source, and in 501 385 cases neighbouring sources were detected. We propose a list of 381 lensed candidates, of which we identified 49 as the most promising. Beyond these candidates, the associated tables in this Focused Product Release allow the entire community to explore the unique Gaia data for strong lensing studies further.
Submitted 10 October, 2023;
originally announced October 2023.
-
Gaia Focused Product Release: Radial velocity time series of long-period variables
Authors:
Gaia Collaboration,
M. Trabucchi,
N. Mowlavi,
T. Lebzelter,
I. Lecoeur-Taibi,
M. Audard,
L. Eyer,
P. García-Lario,
P. Gavras,
B. Holl,
G. Jevardat de Fombelle,
K. Nienartowicz,
L. Rimoldini,
P. Sartoretti,
R. Blomme,
Y. Frémat,
O. Marchal,
Y. Damerdji,
A. G. A. Brown,
A. Guerrier,
P. Panuzzo,
D. Katz,
G. M. Seabroke,
K. Benson
, et al. (382 additional authors not shown)
Abstract:
The third Gaia Data Release (DR3) provided photometric time series of more than 2 million long-period variable (LPV) candidates. Anticipating the publication of full radial-velocity (RV) in DR4, this Focused Product Release (FPR) provides RV time series for a selection of LPVs with high-quality observations. We describe the production and content of the Gaia catalog of LPV RV time series, and the methods used to compute variability parameters published in the Gaia FPR. Starting from the DR3 LPVs catalog, we applied filters to construct a sample of sources with high-quality RV measurements. We modeled their RV and photometric time series to derive their periods and amplitudes, and further refined the sample by requiring compatibility between the RV period and at least one of the $G$, $G_{\rm BP}$, or $G_{\rm RP}$ photometric periods. The catalog includes RV time series and variability parameters for 9\,614 sources in the magnitude range $6\lesssim G/{\rm mag}\lesssim 14$, including a flagged top-quality subsample of 6\,093 stars whose RV periods are fully compatible with the values derived from the $G$, $G_{\rm BP}$, and $G_{\rm RP}$ photometric time series. The RV time series contain a mean of 24 measurements per source taken unevenly over a duration of about three years. We identify the great majority of sources (88%) as genuine LPVs, with about half of them showing a pulsation period and the other half displaying a long secondary period. The remaining 12% consists of candidate ellipsoidal binaries. Quality checks against RVs available in the literature show excellent agreement. We provide illustrative examples and cautionary remarks. The publication of RV time series for almost 10\,000 LPVs constitutes, by far, the largest such database available to date in the literature. The availability of simultaneous photometric measurements gives a unique added value to the Gaia catalog (abridged)
Submitted 9 October, 2023;
originally announced October 2023.
-
Preparing for Gaia Searches for Optical Counterparts of Gravitational Wave Events during O4
Authors:
Sumedha Biswas,
Zuzanna Kostrzewa-Rutkowska,
Peter G. Jonker,
Paul Vreeswijk,
Deepak Eappachen,
Paul J. Groot,
Simon Hodgkin,
Abdullah Yoldas,
Guy Rixon,
Diana Harrison,
M. van Leeuwen,
Dafydd Evans
Abstract:
The discovery of gravitational wave (GW) events and the detection of electromagnetic counterparts from GW170817 have started the era of multimessenger GW astronomy. The field has been developing rapidly and in this paper, we discuss the preparation for detecting these events with the ESA Gaia satellite, during the 4th observing run of the LIGO-Virgo-KAGRA (LVK) collaboration that started on May 24, 2023. Gaia is contributing to the search for GW counterparts by a new transient detection pipeline called GaiaX. In GaiaX, a new source appearing in the field of view of only one of the two telescopes on-board Gaia is sufficient to send out an alert on the possible detection of a new transient. Ahead of O4, an experiment was conducted over a period of about two months. During the two weeks around New Moon in this period of time, the MeerLICHT (ML) telescope located in South Africa tried (weather permitting) to observe the same region of the sky as Gaia within 10 minutes. Any GaiaX detected transient was published publicly. ML and Gaia have similar limiting magnitudes for typical seeing conditions at ML. At the end of the experiment, we had 11861 GaiaX candidate transients and 15806 ML candidate transients, which we further analysed and the results of which are presented in this paper. Finally, we discuss the possibility and capabilities of Gaia contributing to the search for electromagnetic counterparts of gravitational wave events during O4 through the GaiaX detection and alert procedure.
Submitted 21 August, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Graph Neural Networks based Log Anomaly Detection and Explanation
Authors:
Zhong Li,
Jiayang Shi,
Matthijs van Leeuwen
Abstract:
Event logs are widely used to record the status of high-tech systems, making log anomaly detection important for monitoring those systems. Most existing log anomaly detection methods take a log event count matrix or log event sequences as input, exploiting quantitative and/or sequential relationships between log events to detect anomalies. Unfortunately, only considering quantitative or sequential relationships may result in low detection accuracy. To alleviate this problem, we propose a graph-based method for unsupervised log anomaly detection, dubbed Logs2Graphs, which first converts event logs into attributed, directed, and weighted graphs, and then leverages graph neural networks to perform graph-level anomaly detection. Specifically, we introduce One-Class Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph neural network model for detecting graph-level anomalies in a collection of attributed, directed, and weighted graphs. By coupling the graph representation and anomaly detection steps, OCDiGCN can learn a representation that is especially suited for anomaly detection, resulting in a high detection accuracy. Importantly, for each identified anomaly, we additionally provide a small subset of nodes that play a crucial role in OCDiGCN's prediction as explanations, which can offer valuable cues for subsequent root cause diagnosis. Experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art log anomaly detection methods on simple datasets while largely outperforming state-of-the-art log anomaly detection methods on complicated datasets.
Submitted 24 January, 2024; v1 submitted 2 July, 2023;
originally announced July 2023.
-
A Large-Scale Pad-Sensor Based Prototype of the Silicon Tungsten Electromagnetic Calorimeter for the Forward Direction in ALICE at LHC
Authors:
R. G. E. Barthel,
T. Chujo,
T. Hachiya,
M. Hatakeyama,
Y. Hoshi,
M. Inaba,
Y. Kawamura,
D. Kawana,
C. Loizides,
Y. Miake,
Y. Minato,
K. Nakagawa,
N. Novitzky,
T. Peitzmann,
M. Rossewij,
M. Shimomura,
T. Sugitate,
T. Suzuki,
K. Tadokoro,
M. Takamura,
S. Takasu,
A. van den Brink,
M. van Leeuwen
Abstract:
We constructed a large-scale electromagnetic calorimeter prototype as a part of the Forward Calorimeter upgrade project (FoCal) for the ALICE experiment at the Large Hadron Collider (LHC). The prototype, also known as ``Mini FoCal'', consists of 20 layers of silicon pad sensors and tungsten alloy plates with printed circuit boards and readout electronics. The constructed detector was tested at the test beam facility of the Super Proton Synchrotron (SPS) at CERN. We obtain an energy resolution of about 4.3% for electron beams at both 150 and 250 GeV/$c$, which is consistent with realistic detector response simulations. Longitudinal profiles of electromagnetic showers were also measured and found to agree with the simulations. The same prototype detector was installed in the ALICE experimental area about 7.5 m away from the interaction point. It was used to measure inclusive electromagnetic cluster energy distributions and neutral-pion candidate invariant mass distributions for pseudo-rapidity of $η$=3.7-4.5 in proton-proton collisions at $\sqrt{s}$ = 13 TeV at LHC. The measured distributions in different $η$ regions are similar to those obtained from PYTHIA simulations.
Submitted 18 March, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Jet substructure observables for jet quenching in Quark Gluon Plasma: a Machine Learning driven analysis
Authors:
Miguel Crispim Romão,
José Guilherme Milhano,
Marco van Leeuwen
Abstract:
We present a survey of a comprehensive set of jet substructure observables commonly used to study the modifications of jets resulting from interactions with the Quark Gluon Plasma in Heavy Ion Collisions. The JEWEL event generator is used to produce simulated samples of quenched and unquenched jets. Three distinct analyses using Machine Learning techniques on the jet substructure observables have been performed to identify both linear and non-linear relations between the observables, and to distinguish the Quenched and Unquenched jet samples. We find that most of the observables are highly correlated, and that their information content can be captured by a small set of observables. We also find that the correlations between observables are resilient to quenching effects and that specific pairs of observables exhaust the full sensitivity to quenching effects. The code, the datasets, and instructions on how to reproduce this work are also provided.
△ Less
Submitted 12 December, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring
Authors:
Richard M. K. van Dijk,
Daniela Gawehns,
Matthijs van Leeuwen
Abstract:
We present WEARDA, the open source WEARable sensor Data Acquisition software package. WEARDA facilitates the acquisition of human activity data with smartwatches and is primarily aimed at researchers who require transparency, full control, and access to raw sensor data. It provides functionality to simultaneously record raw data from four sensors -- tri-axis accelerometer, tri-axis gyroscope, barometer, and GPS -- which should enable researchers to, for example, estimate energy expenditure and mine movement trajectories. A Samsung smartwatch running the Tizen OS was chosen because of 1) the required functionalities of the smartwatch software API, 2) the availability of software development tools and accessible documentation, 3) having the required sensors, and 4) the requirements on case design for acceptance by the target user group. WEARDA addresses five practical challenges concerning preparation, measurement, logistics, privacy preservation, and reproducibility to ensure efficient and errorless data collection. The software package was initially created for the project "Dementia back at the heart of the community", and has been successfully used in that context.
Submitted 30 October, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
-
Explainable Contextual Anomaly Detection using Quantile Regression Forests
Authors:
Zhong Li,
Matthijs van Leeuwen
Abstract:
Traditional anomaly detection methods aim to identify objects that deviate from most other objects by treating all features equally. In contrast, contextual anomaly detection methods aim to detect objects that deviate from other objects within a context of similar objects by dividing the features into contextual features and behavioral features. In this paper, we develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods. Based on resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection that uses Quantile Regression Forests to model dependencies between features. Extensive experiments on various synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art anomaly detection methods in identifying contextual anomalies in terms of accuracy and interpretability.
Submitted 4 August, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
-
A Survey on Explainable Anomaly Detection
Authors:
Zhong Li,
Yuxuan Zhu,
Matthijs van Leeuwen
Abstract:
In the past two decades, most research on anomaly detection has focused on improving the accuracy of the detection, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. Therefore, this work provides a comprehensive and structured survey on state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the explainable anomaly detection method that best suits their needs.
Submitted 11 July, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Snowmass 2021/22 Letter of Interest: A Forward Calorimeter at the LHC
Authors:
I. G. Bearden,
R. Bellwied,
V. Borshchov,
J. Faivre,
C. Furget,
E. Garcia-Solis,
M. B. Gay Ducati,
G. Conesa-Balbastre,
R. Guernane,
C. Loizides,
J. Rojo,
M. Płoskoń,
S. R. Klein,
Y. Kovchegov,
V. A. Okorokov,
T. Peitzmann,
M. Protsenko,
J. Putschke,
D. Röhrich,
J. D. Tapia Takaki,
I. Tymchuk,
M. van Leeuwen,
R. Venugopalan
Abstract:
A forward electromagnetic and hadronic calorimeter (FoCal) was proposed as an upgrade to the ALICE experiment, to be installed during LS3 for data-taking in 2027--2029 at the LHC. The FoCal extends the scope of ALICE, which was designed for the comprehensive study of hot and dense partonic matter, by adding new capabilities to explore the small-$x$ parton structure of nucleons and nuclei. The primary objective of the FoCal is high-precision inclusive measurement of direct photons and jets, as well as coincident gamma-jet and jet-jet measurements, in pp and p--Pb collisions. These measurements by FoCal constitute an essential part of a comprehensive small-$x$ program at the LHC down to $x\sim10^{-6}$ and over a large range of $Q^2$ with a broad array of complementary probes, comprising -- in addition to the photon measurements by FoCal and LHCb -- Drell-Yan and open charm measurements planned by LHCb, as well as photon-induced reactions performed by all LHC experiments.
Submitted 11 August, 2022;
originally announced August 2022.
-
Feature Selection for Fault Detection and Prediction based on Event Log Analysis
Authors:
Zhong Li,
Matthijs van Leeuwen
Abstract:
Event logs are widely used for anomaly detection and prediction in complex systems. Existing log-based anomaly detection methods usually consist of four main steps: log collection, log parsing, feature extraction, and anomaly detection, wherein the feature extraction step extracts useful features for anomaly detection by counting log events. For a complex system, such as a lithography machine consisting of a large number of subsystems, the log may contain thousands of different events, resulting in a very large number of extracted features. However, when anomaly detection is performed at the subsystem level, analyzing all features becomes expensive and unnecessary. To mitigate this problem, we develop a feature selection method for log-based anomaly detection and prediction, which substantially improves both effectiveness and efficiency.
Submitted 19 August, 2022;
originally announced August 2022.
-
Gaia Data Release 3: Summary of the content and survey properties
Authors:
Gaia Collaboration,
A. Vallenari,
A. G. A. Brown,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
L. Eyer,
R. Guerra,
A. Hutton,
C. Jordi,
S. A. Klioner,
U. L. Lammers,
L. Lindegren,
X. Luri,
F. Mignard,
C. Panem,
D. Pourbaix,
S. Randich,
P. Sartoretti,
C. Soubiran
, et al. (431 additional authors not shown)
Abstract:
We present the third data release of the European Space Agency's Gaia mission, GDR3. The GDR3 catalogue is the outcome of the processing of raw data collected with the Gaia instruments during the first 34 months of the mission by the Gaia Data Processing and Analysis Consortium. The GDR3 catalogue contains the same source list, celestial positions, proper motions, parallaxes, and broad band photometry in the G, G$_{BP}$, and G$_{RP}$ pass-bands already present in the Early Third Data Release. GDR3 introduces an impressive wealth of new data products. More than 33 million objects in the ranges $G_{rvs} < 14$ and $3100 < T_{eff} < 14500$ have new determinations of their mean radial velocities based on data collected by Gaia. We provide G$_{rvs}$ magnitudes for most sources with radial velocities, and a line broadening parameter is listed for a subset of these. Mean Gaia spectra are made available to the community. The GDR3 catalogue includes about 1 million mean spectra from the radial velocity spectrometer, and about 220 million low-resolution blue and red prism photometer BPRP mean spectra. The results of the analysis of epoch photometry are provided for some 10 million sources across 24 variability types. GDR3 includes astrophysical parameters and source class probabilities for about 470 million and 1500 million sources, respectively, including stars, galaxies, and quasars. Orbital elements and trend parameters are provided for some $800\,000$ astrometric, spectroscopic and eclipsing binaries. More than $150\,000$ Solar System objects, including new discoveries, with preliminary orbital solutions and individual epoch observations are part of this release. Reflectance spectra derived from the epoch BPRP spectral data are published for about $60\,000$ asteroids. Finally, an additional data set is provided, namely the Gaia Andromeda Photometric Survey (abridged)
Submitted 30 July, 2022;
originally announced August 2022.
-
Extension of the Uhlenbeck-Ford Model with an Attraction
Authors:
J. M. J. van Leeuwen
Abstract:
The Uhlenbeck-Ford model for soft repulsion, which has only a repulsive interaction, is extended by inclusion of an attraction. This extension still allows an analytical evaluation of the virial coefficients. The integrals over the graph contributions are reduced to a combinatorial problem. We have calculated the virial coefficients to order 6 in the density. A link is made between this model and more common interactions, like the 12-6 Lennard-Jones potential.
Submitted 21 July, 2022;
originally announced July 2022.
-
Gaia Data Release 3: Reflectance spectra of Solar System small bodies
Authors:
Gaia Collaboration,
L. Galluccio,
M. Delbo,
F. De Angeli,
T. Pauwels,
P. Tanga,
F. Mignard,
A. Cellino,
A. G. A. Brown,
K. Muinonen,
A. Penttila,
S. Jordan,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
L. Eyer,
R. Guerra,
A. Hutton,
C. Jordi
, et al. (422 additional authors not shown)
Abstract:
The Gaia mission of the European Space Agency (ESA) has been routinely observing Solar System objects (SSOs) since the beginning of its operations in August 2014. The Gaia data release three (DR3) includes, for the first time, the mean reflectance spectra of a selected sample of 60 518 SSOs, primarily asteroids, observed between August 5, 2014, and May 28, 2017. Each reflectance spectrum was derived from measurements obtained by means of the Blue and Red photometers (BP/RP), which were binned in 16 discrete wavelength bands. We describe the processing of the Gaia spectral data of SSOs, explaining both the criteria used to select the subset of asteroid spectra published in Gaia DR3, and the different steps of our internal validation procedures. In order to further assess the quality of Gaia SSO reflectance spectra, we carried out external validation against SSO reflectance spectra obtained from ground-based and space-borne telescopes and available in the literature. For each selected SSO, an epoch reflectance was computed by dividing the calibrated spectrum observed by the BP/RP at each transit on the focal plane by the mean spectrum of a solar analogue. The latter was obtained by averaging the Gaia spectral measurements of a selected sample of stars known to have very similar spectra to that of the Sun. Finally, a mean of the epoch reflectance spectra was calculated in 16 spectral bands for each SSO. The agreement between Gaia mean reflectance spectra and those available in the literature is good for bright SSOs, regardless of their taxonomic spectral class. We identify an increase in the spectral slope of S-type SSOs with increasing phase angle. Moreover, we show that the spectral slope increases and the depth of the 1 um absorption band decreases for increasing ages of S-type asteroid families.
Submitted 24 June, 2022;
originally announced June 2022.
-
Truly Unordered Probabilistic Rule Sets for Multi-class Classification
Authors:
Lincen Yang,
Matthijs van Leeuwen
Abstract:
Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only one for probabilistic rule lists).
We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule.
Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability but also better predictive performance.
Submitted 18 July, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Mapping the asymmetric disc of the Milky Way
Authors:
Gaia Collaboration,
R. Drimmel,
M. Romero-Gomez,
L. Chemin,
P. Ramos,
E. Poggio,
V. Ripepi,
R. Andrae,
R. Blomme,
T. Cantat-Gaudin,
A. Castro-Ginard,
G. Clementini,
F. Figueras,
M. Fouesneau,
Y. Fremat,
K. Jardine,
S. Khanna,
A. Lobel,
D. J. Marshall,
T. Muraveva,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou
, et al. (431 additional authors not shown)
Abstract:
With the most recent Gaia data release the number of sources with complete 6D phase space information (position and velocity) has increased to well over 33 million stars, while stellar astrophysical parameters are provided for more than 470 million sources, in addition to the identification of over 11 million variable stars. Using the astrophysical parameters and variability classifications provided in Gaia DR3, we select various stellar populations to explore and identify non-axisymmetric features in the disc of the Milky Way in both configuration and velocity space. Using about 580 thousand sources identified as hot OB stars, together with 988 known open clusters younger than 100 million years, we map the spiral structure associated with star formation 4-5 kpc from the Sun. We select over 2800 Classical Cepheids younger than 200 million years, which show spiral features extending as far as 10 kpc from the Sun in the outer disc. We also identify more than 8.7 million sources on the red giant branch (RGB), of which 5.7 million have line-of-sight velocities, allowing the velocity field of the Milky Way to be mapped as far as 8 kpc from the Sun, including the inner disc. The spiral structure revealed by the young populations is consistent with recent results using Gaia EDR3 astrometry and source lists based on near infrared photometry, showing the Local (Orion) arm to be at least 8 kpc long, and an outer arm consistent with what is seen in HI surveys, which seems to be a continuation of the Perseus arm into the third quadrant. Meanwhile, the subset of RGB stars with velocities clearly reveals the large scale kinematic signature of the bar in the inner disc, as well as evidence of streaming motions in the outer disc that might be associated with spiral arms or bar resonances. (abridged)
Submitted 5 August, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Pulsations in main sequence OBAF-type stars
Authors:
Gaia Collaboration,
J. De Ridder,
V. Ripepi,
C. Aerts,
L. Palaversa,
L. Eyer,
B. Holl,
M. Audard,
L. Rimoldini,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
R. Guerra,
A. Hutton,
C. Jordi,
S. A. Klioner,
U. L. Lammers,
L. Lindegren
, et al. (423 additional authors not shown)
Abstract:
The third Gaia data release provides photometric time series covering 34 months for about 10 million stars. For many of those stars, a characterisation in Fourier space and their variability classification are also provided. This paper focuses on intermediate- to high-mass (IHM) main sequence pulsators (M >= 1.3 Msun) of spectral types O, B, A, or F, known as beta Cep, slowly pulsating B (SPB), delta Sct, and gamma Dor stars. These stars are often multi-periodic and display low amplitudes, making them challenging targets to analyse with sparse time series. All datasets used in this analysis are part of the Gaia DR3 data release. The photometric time series were used to perform a Fourier analysis, while the global astrophysical parameters necessary for the empirical instability strips were taken from the Gaia DR3 gspphot tables, and the vsini data were taken from the Gaia DR3 esphs tables. We show that for nearby OBAF-type pulsators, the Gaia DR3 data are precise and accurate enough to pinpoint them in the Hertzsprung-Russell diagram. We find empirical instability strips covering broader regions than theoretically predicted. In particular, our study reveals the presence of fast rotating gravity-mode pulsators outside the strips, as well as the co-existence of rotationally modulated variables inside the strips as reported before in the literature. We derive an extensive period-luminosity relation for delta Sct stars and provide evidence that the relation features different regimes depending on the oscillation period. Finally, we demonstrate how stellar rotation attenuates the amplitude of the dominant oscillation mode of delta Sct stars.
Submitted 16 August, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: A Golden Sample of Astrophysical Parameters
Authors:
Gaia Collaboration,
O. L. Creevey,
L. M. Sarro,
A. Lobel,
E. Pancino,
R. Andrae,
R. L. Smart,
G. Clementini,
U. Heiter,
A. J. Korn,
M. Fouesneau,
Y. Frémat,
F. De Angeli,
A. Vallenari,
D. L. Harrison,
F. Thévenin,
C. Reylé,
R. Sordo,
A. Garofalo,
A. G. A. Brown,
L. Eyer,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux
, et al. (423 additional authors not shown)
Abstract:
Gaia Data Release 3 (DR3) provides a wealth of new data products for the astronomical community to exploit, including astrophysical parameters for a half billion stars. In this work we demonstrate the high quality of these data products and illustrate their use in different astrophysical contexts. We query the astrophysical parameter tables along with other tables in Gaia DR3 to derive the samples of the stars of interest. We validate our results by using the Gaia catalogue itself and by comparison with external data. We have produced six homogeneous samples of stars with high quality astrophysical parameters across the HR diagram for the community to exploit. We first focus on three samples that span a large parameter space: young massive disk stars (~3M), FGKM spectral type stars (~3M), and UCDs (~20K). We provide these sources along with additional information (either a flag or complementary parameters) as tables that are made available in the Gaia archive. We furthermore identify 15740 bona fide carbon stars, 5863 solar-analogues, and provide the first homogeneous set of stellar parameters of the Spectro Photometric Standard Stars. We use a subset of the OBA sample to illustrate its usefulness to analyse the Milky Way rotation curve. We then use the properties of the FGKM stars to analyse known exoplanet systems. We also analyse the ages of some unseen UCD-companions to the FGKM stars. We additionally predict the colours of the Sun in various passbands (Gaia, 2MASS, WISE) using the solar-analogue sample.
Submitted 12 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: The extragalactic content
Authors:
Gaia Collaboration,
C. A. L. Bailer-Jones,
D. Teyssier,
L. Delchambre,
C. Ducourant,
D. Garabato,
D. Hatzidimitriou,
S. A. Klioner,
L. Rimoldini,
I. Bellas-Velidis,
R. Carballo,
M. I. Carnerero,
C. Diener,
M. Fouesneau,
L. Galluccio,
P. Gavras,
A. Krone-Martins,
C. M. Raiteri,
R. Teixeira,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux
, et al. (422 additional authors not shown)
Abstract:
The Gaia Galactic survey mission is designed and optimized to obtain astrometry, photometry, and spectroscopy of nearly two billion stars in our Galaxy. Yet as an all-sky multi-epoch survey, Gaia also observes several million extragalactic objects down to a magnitude of G~21 mag. Due to the nature of the Gaia onboard selection algorithms, these are mostly point-source-like objects. Using data provided by the satellite, we have identified quasar and galaxy candidates via supervised machine learning methods, and estimate their redshifts using the low resolution BP/RP spectra. We further characterise the surface brightness profiles of host galaxies of quasars and of galaxies from pre-defined input lists. Here we give an overview of the processing of extragalactic objects, describe the data products in Gaia DR3, and analyse their properties. Two integrated tables contain the main results for a high completeness, but low purity (50-70%), set of 6.6 million candidate quasars and 4.8 million candidate galaxies. We provide queries that select purer sub-samples of these containing 1.9 million probable quasars and 2.9 million probable galaxies (both 95% purity). We also use high quality BP/RP spectra of 43 thousand high probability quasars over the redshift range 0.05-4.36 to construct a composite quasar spectrum spanning restframe wavelengths from 72-100 nm.
Submitted 12 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Stellar multiplicity, a teaser for the hidden treasure
Authors:
Gaia Collaboration,
F. Arenou,
C. Babusiaux,
M. A. Barstow,
S. Faigler,
A. Jorissen,
P. Kervella,
T. Mazeh,
N. Mowlavi,
P. Panuzzo,
J. Sahlmann,
S. Shahaf,
A. Sozzetti,
N. Bauchet,
Y. Damerdji,
P. Gavras,
P. Giacobbe,
E. Gosset,
J. -L. Halbwachs,
B. Holl,
M. G. Lattanzi,
N. Leclerc,
T. Morel,
D. Pourbaix,
P. Re Fiorentin
, et al. (425 additional authors not shown)
Abstract:
The Gaia DR3 Catalogue contains for the first time about eight hundred thousand solutions with either orbital elements or trend parameters for astrometric, spectroscopic and eclipsing binaries, and combinations of them. This paper aims to illustrate the huge potential of this large non-single star catalogue. Using the orbital solutions together with models of the binaries, a catalogue of tens of thousands of stellar masses, or lower limits, partly together with consistent flux ratios, has been built. Properties concerning the completeness of the binary catalogues are discussed, statistical features of the orbital elements are explained and a comparison with other catalogues is performed. Illustrative applications are proposed for binaries across the H-R diagram. The binarity is studied in the RGB/AGB and a search for genuine SB1 among long-period variables is performed. The discovery of new EL CVn systems illustrates the potential of combining variability and binarity catalogues. Potential compact object companions are presented, mainly white dwarf companions or double degenerates, but one candidate neutron star is also presented. Towards the bottom of the main sequence, the orbits of previously-suspected binary ultracool dwarfs are determined and new candidate binaries are discovered. The long awaited contribution of Gaia to the analysis of the substellar regime shows the brown dwarf desert around solar-type stars using true, rather than minimum, masses, and provides new important constraints on the occurrence rates of substellar companions to M dwarfs. Several dozen new exoplanets are proposed, including two with validated orbital solutions and one super-Jupiter orbiting a white dwarf, all being candidates requiring confirmation. Besides binarity, higher order multiple systems are also found.
Submitted 11 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Chemical cartography of the Milky Way
Authors:
Gaia Collaboration,
A. Recio-Blanco,
G. Kordopatis,
P. de Laverny,
P. A. Palicio,
A. Spagna,
L. Spina,
D. Katz,
P. Re Fiorentin,
E. Poggio,
P. J. McMillan,
A. Vallenari,
M. G. Lattanzi,
G. M. Seabroke,
L. Casamiquela,
A. Bragaglia,
T. Antoja,
C. A. L. Bailer-Jones,
R. Andrae,
M. Fouesneau,
M. Cropper,
T. Cantat-Gaudin,
U. Heiter,
A. Bijaoui,
A. G. A. Brown
, et al. (425 additional authors not shown)
Abstract:
Gaia DR3 opens a new era of all-sky spectral analysis of stellar populations thanks to the nearly 5.6 million stars observed by the RVS and parametrised by the GSP-spec module. The all-sky Gaia chemical cartography allows a powerful and precise chemo-dynamical view of the Milky Way with unprecedented spatial coverage and statistical robustness. First, it reveals the strong vertical symmetry of the Galaxy and the flared structure of the disc. Second, the observed kinematic disturbances of the disc -- seen as phase space correlations -- and kinematic or orbital substructures are associated with chemical patterns that favour stars with enhanced metallicities and lower [alpha/Fe] abundance ratios compared to the median values in the radial distributions. This is detected both for young objects that trace the spiral arms and older populations. Several alpha, iron-peak elements and at least one heavy element trace the thin and thick disc properties in the solar cylinder. Third, young disc stars show a recent chemical impoverishment in several elements. Fourth, the largest chemo-dynamical sample of open clusters analysed so far shows a steepening of the radial metallicity gradient with age, which is also observed in the young field population. Finally, the Gaia chemical data have the required coverage and precision to unveil galaxy accretion debris and heated disc stars on halo orbits through their [alpha/Fe] ratio, and to allow the study of the chemo-dynamical properties of globular clusters. Gaia DR3 chemo-dynamical diagnostics open new horizons before the era of ground-based wide-field spectroscopic surveys. They unveil a complex Milky Way that is the outcome of an eventful evolution, shaping it to the present day (abridged).
Submitted 11 June, 2022;
originally announced June 2022.
-
Gaia Early Data Release 3: The celestial reference frame (Gaia-CRF3)
Authors:
Gaia Collaboration,
S. A. Klioner,
L. Lindegren,
F. Mignard,
J. Hernández,
M. Ramos-Lerate,
U. Bastian,
M. Biermann,
A. Bombrun,
A. de Torres,
E. Gerlach,
R. Geyer,
T. Hilger,
D. Hobbs,
U. L. Lammers,
P. J. McMillan,
H. Steidelmüller,
D. Teyssier,
C. M. Raiteri,
S. Bartolomé,
M. Bernet,
J. Castañeda,
M. Clotet,
M. Davidson,
C. Fabricius
, et al. (426 additional authors not shown)
Abstract:
Gaia-CRF3 is the celestial reference frame for positions and proper motions in the third release of data from the Gaia mission, Gaia DR3 (and for the early third release, Gaia EDR3, which contains identical astrometric results). The reference frame is defined by the positions and proper motions at epoch 2016.0 for a specific set of extragalactic sources in the (E)DR3 catalogue.
We describe the construction of Gaia-CRF3, and its properties in terms of the distributions in magnitude, colour, and astrometric quality.
Compact extragalactic sources in Gaia DR3 were identified by positional cross-matching with 17 external catalogues of quasars (QSO) and active galactic nuclei (AGN), followed by astrometric filtering designed to remove stellar contaminants. Selecting a clean sample was favoured over including a higher number of extragalactic sources. For the final sample, the random and systematic errors in the proper motions are analysed, as well as the radio-optical offsets in position for sources in the third realisation of the International Celestial Reference Frame (ICRF3).
The Gaia-CRF3 comprises about 1.6 million QSO-like sources, of which 1.2 million have five-parameter astrometric solutions in Gaia DR3 and 0.4 million have six-parameter solutions. The sources span the magnitude range G = 13 to 21 with a peak density at 20.6 mag, at which the typical positional uncertainty is about 1 mas. The proper motions show systematic errors on the level of 12 $μ$as yr${}^{-1}$ on angular scales greater than 15 deg. For the 3142 optical counterparts of ICRF3 sources in the S/X frequency bands, the median offset from the radio positions is about 0.5 mas, but exceeds 4 mas in either coordinate for 127 sources. We outline the future of the Gaia-CRF in the next Gaia data releases.
Submitted 30 October, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Gaia Photometric Science Alerts
Authors:
S. T. Hodgkin,
D. L. Harrison,
E. Breedt,
T. Wevers,
G. Rixon,
A. Delgado,
A. Yoldas,
Z. Kostrzewa-Rutkowska,
Ł. Wyrzykowski,
M. van Leeuwen,
N. Blagorodnova,
H. Campbell,
D. Eappachen,
M. Fraser,
N. Ihanec,
S. E. Koposov,
K. Kruszyńska,
G. Marton,
K. A. Rybicki,
A. G. A. Brown,
P. W. Burgess,
G. Busso,
S. Cowell,
F. De Angeli,
C. Diener
, et al. (86 additional authors not shown)
Abstract:
Since July 2014, the Gaia mission has been engaged in a high-spatial-resolution, time-resolved, precise, accurate astrometric, and photometric survey of the entire sky.
Aims: We present the Gaia Science Alerts project, which has been in operation since 1 June 2016. We describe the system which has been developed to enable the discovery and publication of transient photometric events as seen by Gaia.
Methods: We outline the data handling, timings, and performances, and we describe the transient detection algorithms and filtering procedures needed to manage the high false alarm rate. We identify two classes of events: (1) sources which are new to Gaia and (2) Gaia sources which have undergone a significant brightening or fading. Validation of the Gaia transit astrometry and photometry was performed, followed by testing of the source environment to minimise contamination from Solar System objects, bright stars, and fainter near-neighbours.
Results: We show that the Gaia Science Alerts project suffers from very low contamination, that is, there are very few false positives. We find that the external completeness for supernovae, $C_E=0.46$, is dominated by the Gaia scanning law and the requirement of detections from both fields-of-view. Where we have two or more scans, the internal completeness is $C_I=0.79$ at 3 arcsec or larger from the centres of galaxies, but it drops closer in, especially within 1 arcsec.
Conclusions: The per-transit photometry for Gaia transients is precise to 1 per cent at $G=13$, and 3 per cent at $G=19$. The per-transit astrometry is accurate to 55 milliarcseconds when compared to Gaia DR2. The Gaia Science Alerts project is one of the most homogeneous and productive transient surveys in operation, and it is the only survey which covers the whole sky at high spatial resolution (subarcsecond), including the Galactic plane and bulge.
Submitted 2 June, 2021;
originally announced June 2021.
-
Robust subgroup discovery
Authors:
Hugo Manuel Proença,
Peter Grünwald,
Thomas Bäck,
Matthijs van Leeuwen
Abstract:
We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, including traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, finding optimal subgroup lists is NP-hard. Therefore, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. In fact, the greedy gain is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. Furthermore, we empirically show on 54 datasets that SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation on unseen data, and subgroup list size.
Submitted 30 June, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Finding Efficient Trade-offs in Multi-Fidelity Response Surface Modeling
Authors:
Sander van Rijn,
Sebastian Schmitt,
Matthijs van Leeuwen,
Thomas Bäck
Abstract:
In the context of optimization approaches to engineering applications, time-consuming simulations are often utilized which can be configured to deliver solutions for various levels of accuracy, commonly referred to as different fidelity levels. It is common practice to train hierarchical surrogate models on the objective functions in order to speed up the optimization process. These operate under the assumption that there is a correlation between the high- and low-fidelity versions of the problem that can be exploited to cheaply gain information. In the practical scenario where the computational budget has to be allocated between multiple fidelities, limited guidelines are available to help make that division. In this paper we evaluate a range of different choices for a two-fidelity setup that provide helpful intuitions about the trade-off between evaluating in high- or low-fidelity. We present a heuristic method based on subsampling from an initial Design of Experiments (DoE) to find a suitable division of the computational budget between the fidelity levels. This enables the setup of multi-fidelity optimizations which utilize the available computational budget efficiently, independent of the multi-fidelity model used.
Submitted 16 May, 2022; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Progenitor, environment, and modelling of the interacting transient, AT 2016jbu (Gaia16cfr)
Authors:
S. J. Brennan,
M. Fraser,
J. Johansson,
A. Pastorello,
R. Kotak,
H. F. Stevance,
T. -W. Chen,
J. J. Eldridge,
S. Bose,
P. J. Brown,
E. Callis,
R. Cartier,
M. Dennefeld,
Subo Dong,
P. Duffy,
N. Elias-Rosa,
G. Hosseinzadeh,
E. Hsiao,
H. Kuncarayakti,
A. Martin-Carrillo,
B. Monard,
G. Pignata,
D. Sand,
B. J. Shappee,
S. J. Smartt
, et al. (45 additional authors not shown)
Abstract:
We present the bolometric lightcurve, identification and analysis of the progenitor candidate, and preliminary modelling of AT2016jbu (Gaia16cfr). We find a progenitor consistent with a $\sim$22--25~$M_{\odot}$ yellow hypergiant surrounded by a dusty circumstellar shell, in agreement with what has been previously reported. We see evidence for significant photometric variability in the progenitor, as well as strong H$α$ emission consistent with pre-existing circumstellar material. The age of the environment and the resolved stellar population surrounding AT2016jbu support a progenitor age of $>$10 Myr, consistent with a progenitor mass of $\sim$22~$M_{\odot}$. A joint analysis of the velocity evolution of AT2016jbu and the photospheric radius inferred from the bolometric lightcurve shows the transient is consistent with two successive outbursts/explosions. The first outburst ejected material with velocity $\sim$650 km s$^{-1}$, while the second, more energetic event ejected material at $\sim$4500 km s$^{-1}$. Whether the latter is the core-collapse of the progenitor remains uncertain. We place a limit on the ejected $^{56}$Ni mass of $<$0.016 $M_{\odot}$. Using the BPASS code, we explore a wide range of possible progenitor systems, and find that the majority of these are in binaries, some of which are undergoing mass transfer or common envelope evolution immediately prior to explosion. Finally, we use the SNEC code to demonstrate that the low-energy explosion within some of these binary systems, together with sufficient CSM, can reproduce the overall morphology of the lightcurve of AT2016jbu.
Submitted 27 April, 2022; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Photometric and spectroscopic evolution of the interacting transient AT 2016jbu (Gaia16cfr)
Authors:
S. J. Brennan,
M. Fraser,
J. Johansson,
A. Pastorello,
R. Kotak,
H. F. Stevance,
T. -W. Chen,
J. J. Eldridge,
S. Bose,
P. J. Brown,
E. Callis,
R. Cartier,
M. Dennefeld,
Subo Dong,
P. Duffy,
N. Elias-Rosa,
G. Hosseinzadeh,
E. Hsiao,
H. Kuncarayakti,
A. Martin-Carrillo,
B. Monard,
A. Nyholm,
G. Pignata,
D. Sand,
B. J. Shappee
, et al. (46 additional authors not shown)
Abstract:
We present the results from a high-cadence, multi-wavelength observation campaign of AT 2016jbu (aka Gaia16cfr), an interacting transient. This dataset complements the current literature by adding higher cadence as well as extended coverage of the lightcurve evolution and late-time spectroscopic evolution. Photometric coverage reveals that AT 2016jbu underwent significant photometric variability followed by two luminous events, the latter of which reached an absolute magnitude of $M_V \sim -18.5$ mag. This is similar to the transient SN 2009ip, whose nature is still debated. Spectra are dominated by narrow emission lines and show a blue continuum during the peak of the second event. AT 2016jbu shows signatures of a complex, non-homogeneous circumstellar material (CSM). We see slowly evolving asymmetric hydrogen line profiles, with velocities of 500 km s$^{-1}$ seen in narrow emission features from a slow-moving CSM, and up to 10,000 km s$^{-1}$ seen in broad absorption from some high-velocity material. Late-time spectra ($\sim$+1 year) show a lack of forbidden emission lines expected from a core-collapse supernova and are dominated by strong emission from H, He I and Ca II. Strong asymmetric emission features, a bumpy lightcurve, and continually evolving spectra suggest an inhibited nebular phase. We compare the evolution of H$α$ among SN 2009ip-like transients and find possible evidence for orientation angle effects. The light-curve evolution of AT 2016jbu suggests similar, but not identical, circumstellar environments to other SN 2009ip-like transients.
Submitted 27 April, 2022; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Gaia Early Data Release 3: The Galactic anticentre
Authors:
Gaia Collaboration,
T. Antoja,
P. McMillan,
G. Kordopatis,
P. Ramos,
A. Helmi,
E. Balbinot,
T. Cantat-Gaudin,
L. Chemin,
F. Figueras,
C. Jordi,
S. Khanna,
M. Romero-Gomez,
G. Seabroke,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
D. W. Evans,
L. Eyer,
A. Hutton,
F. Jansen
, et al. (395 additional authors not shown)
Abstract:
We aim to demonstrate the scientific potential of the Gaia Early Data Release 3 (EDR3) for the study of the Milky Way structure and evolution. We used astrometric positions, proper motions, parallaxes, and photometry from EDR3 to select different populations and components and to calculate the distances and velocities in the direction of the anticentre. We explore the disturbances of the current disc, the spatial and kinematical distributions of early accreted versus in-situ stars, the structures in the outer parts of the disc, and the orbits of open clusters Berkeley 29 and Saurer 1. We find that: i) the dynamics of the Galactic disc are very complex with vertical asymmetries, and new correlations, including a bimodality with disc stars with large angular momentum moving vertically upwards from below the plane, and disc stars with slightly lower angular momentum moving preferentially downwards; ii) we resolve the kinematic substructure (diagonal ridges) in the outer parts of the disc for the first time; iii) the red sequence that has been associated with the proto-Galactic disc that was present at the time of the merger with Gaia-Enceladus-Sausage is currently radially concentrated up to around 14 kpc, while the blue sequence that has been associated with debris of the satellite extends beyond that; iv) there are density structures in the outer disc, both above and below the plane, most probably related to Monoceros, the Anticentre Stream, and TriAnd, for which the Gaia data allow an exhaustive selection of candidate member stars and dynamical study; and v) the open clusters Berkeley 29 and Saurer 1, despite being located at large distances from the Galactic centre, are on nearly circular disc-like orbits. We demonstrate how, once again, the Gaia data are crucial for our understanding of the different pieces of our Galaxy and their connection to its global structure and history.
Submitted 26 April, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms
Authors:
Alexander Marx,
Lincen Yang,
Matthijs van Leeuwen
Abstract:
Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivative, can be written as a sum of entropies, just like CMI for purely discrete or continuous data. Further, we show that CMI can be consistently estimated for discrete-continuous mixture variables by learning an adaptive histogram model. In practice, we estimate such a model by iteratively discretizing the continuous data points in the mixture variables. To evaluate the performance of our estimator, we benchmark it against state-of-the-art CMI estimators as well as evaluate it in a causal discovery setting.
Submitted 13 January, 2021;
originally announced January 2021.
-
Comparison of $pp$ and $p \bar{p}$ differential elastic cross sections and observation of the exchange of a colorless $C$-odd gluonic compound
Authors:
V. M. Abazov,
B. Abbott,
B. S. Acharya,
M. Adams,
T. Adams,
J. P. Agnew,
G. D. Alexeev,
G. Alkhazov,
A. Alton,
G. A. Alves,
G. Antchev,
A. Askew,
P. Aspell,
A. C. S. Assis Jesus,
I. Atanassov,
S. Atkins,
K. Augsten,
V. Aushev,
Y. Aushev,
V. Avati,
C. Avila,
F. Badaud,
J. Baechler,
L. Bagby,
C. Baldenegro Barrera
, et al. (451 additional authors not shown)
Abstract:
We describe an analysis comparing the $p\bar{p}$ elastic cross section as measured by the D0 Collaboration at a center-of-mass energy of 1.96 TeV to that in $pp$ collisions as measured by the TOTEM Collaboration at 2.76, 7, 8, and 13 TeV using a model-independent approach. The TOTEM cross sections extrapolated to a center-of-mass energy of $\sqrt{s} =$ 1.96 TeV are compared with the D0 measurement in the region of the diffractive minimum and the second maximum of the $pp$ cross section. The two data sets disagree at the 3.4$σ$ level and thus provide evidence for the $t$-channel exchange of a colorless, $C$-odd gluonic compound, also known as the odderon. We combine these results with a TOTEM analysis of the same $C$-odd exchange based on the total cross section and the ratio of the real to imaginary parts of the forward elastic scattering amplitude in $pp$ scattering. The combined significance of these results is larger than 5$σ$ and is interpreted as the first observation of the exchange of a colorless, $C$-odd gluonic compound.
Submitted 25 June, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Gaia Early Data Release 3: The Gaia Catalogue of Nearby Stars
Authors:
Gaia Collaboration,
R. L. Smart,
L. M. Sarro,
J. Rybizki,
C. Reylé,
A. C. Robin,
N. C. Hambly,
U. Abbas,
M. A. Barstow,
J. H. J. de Bruijne,
B. Bucciarelli,
J. M. Carrasco,
W. J. Cooper,
S. T. Hodgkin,
E. Masana,
D. Michalik,
J. Sahlmann,
A. Sozzetti,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
D. W. Evans
, et al. (398 additional authors not shown)
Abstract:
We produce a clean and well-characterised catalogue of objects within 100 pc of the Sun from the Gaia Early Data Release 3. We characterise the catalogue through comparisons to the full data release, external catalogues, and simulations. We carry out a first analysis of the science that is possible with this sample to demonstrate its potential and best practices for its use.
The selection of objects within 100 pc from the full catalogue used selected training sets, machine-learning procedures, astrometric quantities, and solution quality indicators to determine a probability that the astrometric solution is reliable. The training set construction exploited the astrometric data, quality flags, and external photometry. For all candidates we calculated distance posterior probability densities using Bayesian procedures and mock catalogues to define priors. Any object with reliable astrometry and a non-zero probability of being within 100 pc is included in the catalogue.
We have produced a catalogue of \NFINAL\ objects that we estimate contains at least 92% of stars of stellar type M9 within 100 pc of the Sun. We estimate that 9% of the stars in this catalogue probably lie outside 100 pc, but when the distance probability function is used, a correct treatment of this contamination is possible. We produced luminosity functions with a high signal-to-noise ratio for the main-sequence stars, giants, and white dwarfs. We examined in detail the Hyades cluster, the white dwarf population, and wide-binary systems and produced candidate lists for all three samples. We detected local manifestations of several streams, superclusters, and halo objects, in which we identified 12 members of Gaia Enceladus. We present the first direct parallaxes of five objects in multiple systems within 10 pc of the Sun.
Submitted 3 December, 2020;
originally announced December 2020.
-
Gaia Early Data Release 3: Acceleration of the solar system from Gaia astrometry
Authors:
Gaia Collaboration,
S. A. Klioner,
F. Mignard,
L. Lindegren,
U. Bastian,
P. J. McMillan,
J. Hernández,
D. Hobbs,
M. Ramos-Lerate,
M. Biermann,
A. Bombrun,
A. de Torres,
E. Gerlach,
R. Geyer,
T. Hilger,
U. Lammers,
H. Steidelmüller,
C. A. Stephenson,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
C. Babusiaux,
O. L. Creevey,
D. W. Evans
, et al. (392 additional authors not shown)
Abstract:
Context. Gaia Early Data Release 3 (Gaia EDR3) provides accurate astrometry for about 1.6 million compact (QSO-like) extragalactic sources, 1.2 million of which have the best-quality five-parameter astrometric solutions.
Aims. The proper motions of QSO-like sources are used to reveal a systematic pattern due to the acceleration of the solar system barycentre with respect to the rest frame of the Universe. Apart from being an important scientific result by itself, the acceleration measured in this way is a good quality indicator of the Gaia astrometric solution.
Methods. The effect of the acceleration is obtained as a part of the general expansion of the vector field of proper motions in Vector Spherical Harmonics (VSH). Various versions of the VSH fit and various subsets of the sources are tried and compared to get the most consistent result and a realistic estimate of its uncertainty. Additional tests with the Gaia astrometric solution are used to get a better idea of possible systematic errors in the estimate.
Results. Our best estimate of the acceleration based on Gaia EDR3 is $(2.32 \pm 0.16) \times 10^{-10}$ m s${}^{-2}$ (or $7.33 \pm 0.51$ km s$^{-1}$ Myr${}^{-1}$) towards $α= 269.1^\circ \pm 5.4^\circ$, $δ= -31.6^\circ \pm 4.1^\circ$, corresponding to a proper motion amplitude of $5.05 \pm 0.35$ $μ$as yr${}^{-1}$. This is in good agreement with the acceleration expected from current models of the Galactic gravitational potential. We expect that future Gaia data releases will provide estimates of the acceleration with uncertainties substantially below 0.1 $μ$as yr${}^{-1}$.
Submitted 3 December, 2020;
originally announced December 2020.
-
Gaia Early Data Release 3: Structure and properties of the Magellanic Clouds
Authors:
Gaia Collaboration,
X. Luri,
L. Chemin,
G. Clementini,
H. E. Delgado,
P. J. McMillan,
M. Romero-Gómez,
E. Balbinot,
A. Castro-Ginard,
R. Mor,
V. Ripepi,
L. M. Sarro,
M. -R. L. Cioni,
C. Fabricius,
A. Garofalo,
A. Helmi,
T. Muraveva,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
D. W. Evans
, et al. (395 additional authors not shown)
Abstract:
We compare the Gaia DR2 and Gaia EDR3 performances in the study of the Magellanic Clouds and show the clear improvements in precision and accuracy in the new release. We also show that the systematics still present in the data make the determination of the 3D geometry of the LMC a difficult endeavour; this is at the very limit of the usefulness of the Gaia EDR3 astrometry, but it may become feasible with the use of additional external data.
We derive radial and tangential velocity maps and global profiles for the LMC for the several subsamples we defined. To our knowledge, this is the first time that the two planar components of the ordered and random motions are derived for multiple stellar evolutionary phases in a galactic disc outside the Milky Way, showing the differences between younger and older phases. We also analyse the spatial structure and motions in the central region, the bar, and the disc, providing new insights into features and kinematics.
Finally, we show that the Gaia EDR3 data allow the Magellanic Bridge to be clearly resolved, and we trace the density and velocity flow of the stars from the SMC towards the LMC not only globally, but also separately for young and evolved populations. This allows us to confirm an evolved population in the Bridge that is slightly shifted from the younger population. Additionally, we were able to study the outskirts of both Magellanic Clouds, in which we detected some well-known features and indications of new ones.
Submitted 4 January, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Gaia Early Data Release 3: Summary of the contents and survey properties
Authors:
Gaia Collaboration,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
D. W. Evans,
L. Eyer,
A. Hutton,
F. Jansen,
C. Jordi,
S. A. Klioner,
U. Lammers,
L. Lindegren,
X. Luri,
F. Mignard,
C. Panem,
D. Pourbaix,
S. Randich,
P. Sartoretti,
C. Soubiran,
N. A. Walton,
F. Arenou
, et al. (401 additional authors not shown)
Abstract:
We present the early installment of the third Gaia data release, Gaia EDR3, consisting of astrometry and photometry for 1.8 billion sources brighter than magnitude 21, complemented with the list of radial velocities from Gaia DR2. Gaia EDR3 contains celestial positions and the apparent brightness in G for approximately 1.8 billion sources. For 1.5 billion of those sources, parallaxes, proper motions, and the (G_BP-G_RP) colour are also available. The passbands for G, G_BP, and G_RP are provided as part of the release. For ease of use, the 7 million radial velocities from Gaia DR2 are included in this release, after the removal of a small number of spurious values. New radial velocities will appear as part of Gaia DR3. Finally, Gaia EDR3 represents an updated materialisation of the celestial reference frame (CRF) in the optical, the Gaia-CRF3, which is based solely on extragalactic sources. The creation of the source list for Gaia EDR3 includes enhancements that make it more robust with respect to high proper motion stars, and the disturbing effects of spurious and partially resolved sources. The source list is largely the same as that for Gaia DR2, but it does feature new sources and there are some notable changes. The source list will not change for Gaia DR3. Gaia EDR3 represents a significant advance over Gaia DR2, with parallax precisions increased by 30 percent, proper motion precisions increased by a factor of 2, and the systematic errors in the astrometry suppressed by 30--40 percent for the parallaxes and by a factor ~2.5 for the proper motions. The photometry also features increased precision, but above all much better homogeneity across colour, magnitude, and celestial position. A single passband for G, G_BP, and G_RP is valid over the entire magnitude and colour range, with no systematics above the 1 percent level.
Submitted 9 June, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Evaporating droplets on inclined plant leaves and synthetic surfaces: experiments and mathematical models
Authors:
Eloise C. Tredenick,
W. Alison Forster,
Ravindra Pethiyagoda,
Rebecca M. van Leeuwen,
Scott W. McCue
Abstract:
Hypothesis: Evaporation of surfactant droplets on leaves is complicated due to the complex physical and chemical properties of the leaf surfaces. However, for certain leaf surfaces for which the evaporation process appears to follow the standard constant-contact-radius or constant-contact-angle modes, it should be possible to mimic the droplet evaporation with both a well-chosen synthetic surface and a relatively simple mathematical model.
Experiments: Surfactant droplet evaporation experiments were performed on two commercial crop species, wheat and capsicum, along with two synthetic surfaces, up to a $90\,^{\circ}$ incline. The time-dependence of the droplets' contact angles, height, volume and contact radius was measured throughout the evaporation experiments. Mathematical models were developed to simulate the experiments.
Findings: With one clear exception, for all combinations of surfaces, surfactant concentrations and angles, the experiments appear to follow the standard evaporation modes and are well described by the mathematical models (modified Popov and Young-Laplace-Popov). The exception is wheat with a high surfactant concentration, for which droplet evaporation appears nonstandard and deviates from the diffusion limited models, perhaps due to additional mechanisms such as the adsorption of surfactant, stomatal density or an elongated shape in the direction of the grooves in the wheat surface.
Submitted 27 January, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Discovering outstanding subgroup lists for numeric targets using MDL
Authors:
Hugo M. Proença,
Peter Grünwald,
Thomas Bäck,
Matthijs van Leeuwen
Abstract:
The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have their limitations though, as they typically heavily rely on heuristics and/or user-chosen hyperparameters.
We propose a dispersion-aware problem formulation for subgroup set discovery that is based on the minimum description length (MDL) principle and subgroup lists. We argue that the best subgroup list is the one that best summarizes the data given the overall distribution of the target. We restrict our focus to a single numeric target variable and show that our formalization coincides with an existing quality measure when finding a single subgroup, but that, in addition, it allows subgroup quality to be traded off against the complexity of the subgroup. We next propose SSD++, a heuristic algorithm for which we empirically demonstrate that it returns outstanding subgroup lists: non-redundant sets of compact subgroups that stand out by having strongly deviating means and small spread.
Submitted 16 June, 2020;
originally announced June 2020.
-
Unsupervised Discretization by Two-dimensional MDL-based Histogram
Authors:
Lincen Yang,
Mitra Baratchi,
Matthijs van Leeuwen
Abstract:
Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable.
To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalized maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which Partitions each dimension ALternately and then Merges neighboring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to the state-of-the-art multivariate discretization method IPD. Finally, we apply our algorithm to three spatial datasets, and we demonstrate that, compared to kernel density estimation (KDE), our algorithm not only reveals more detailed density changes, but also fits unseen data better, as measured by the log-likelihood.
Submitted 9 December, 2022; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Electromagnetic counterparts to gravitational wave events from Gaia
Authors:
Z. Kostrzewa-Rutkowska,
P. G. Jonker,
S. T. Hodgkin,
D. Eappachen,
D. L. Harrison,
S. E. Koposov,
G. Rixon,
L. Wyrzykowski,
A. Yoldas,
E. Breedt,
A. Delgado,
M. van Leeuwen,
T. Wevers,
P. W. Burgess,
F. De Angeli,
D. W. Evans,
P. J. Osborne,
M. Riello
Abstract:
The recent discoveries of gravitational wave events and, in one case, also an electromagnetic (EM) counterpart allow us to study the Universe in a novel way. The increased sensitivity of the LIGO and Virgo detectors has opened the possibility for regular detections of EM transient events from mergers of stellar remnants. Gravitational wave sources are expected to have sky localisation up to a few hundred square degrees, thus Gaia as an all-sky multi-epoch photometric survey has the potential to be a good tool to search for the EM counterparts. In this paper we study the possibility of detecting EM counterparts to gravitational wave sources using the Gaia Science Alerts system. We develop an extension to currently used algorithms to find transients and test its capabilities in discovering candidate transients on a sample of events from the observation periods O1 and O2 of LIGO and Virgo. For the gravitational wave events from the current run O3 we expect that about 16 (25) per cent should fall in sky regions observed by Gaia 7 (10) days after the gravitational wave event. The new algorithm will provide about 10 candidates per day from the whole sky.
Submitted 23 April, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Design and Performance of a Silicon Tungsten Calorimeter Prototype Module and the Associated Readout
Authors:
T. Awes,
C. L. Britton,
T. Chujo,
T. Cormier,
M. N. Ericson,
N. B. Ezell,
D. Fehlker,
S. S. Frank,
Y. Fukuda,
T. Gunji,
T. Hachiya,
H. Hamagaki,
S. Hayashi,
M. Hirano,
R. Hosokawa,
M. Inaba,
K. Ito,
Y. Kawamura,
D. Kawana,
B. Kim,
S. Kudo,
C. Loizides,
Y. Miake,
G. Nooren,
N. Novitzky,
et al. (19 additional authors not shown)
Abstract:
We describe the details of a silicon-tungsten prototype electromagnetic calorimeter module and associated readout electronics. Detector performance for this prototype has been measured in test beam experiments at the CERN PS and SPS accelerator facilities in 2015/16. The results are compared to those in Monte Carlo Geant4 simulations. This is the first real-world demonstration of the performance of a custom ASIC designed for fast, lower-power, high-granularity applications.
Submitted 9 December, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
-
Vouw: Geometric Pattern Mining using the MDL Principle
Authors:
Micky Faas,
Matthijs van Leeuwen
Abstract:
We introduce geometric pattern mining, the problem of finding recurring local structure in discrete, geometric matrices. It differs from existing pattern mining problems by identifying complex spatial relations between elements, resulting in arbitrarily shaped patterns. After we formalise this new type of pattern mining, we propose an approach to selecting a set of patterns using the Minimum Description Length principle. We demonstrate the potential of our approach by introducing Vouw, a heuristic algorithm for mining exact geometric patterns. We show that Vouw delivers high-quality results with a synthetic benchmark.
Submitted 22 November, 2019; v1 submitted 21 November, 2019;
originally announced November 2019.
-
The pressure underneath a skate at rest
Authors:
J. M. J. van Leeuwen
Abstract:
The pressure distribution is calculated underneath a skate which is pushed into the ice by the weight of a skater at rest. Due to the sharp edges of the skate the deformation is partly elastic and partly plastic. The ratio of the plastic and elastic contributions to the reaction force is determined. Using this ratio the deformation in ice with a finite hardness can be mapped onto the problem of the deformation in a purely elastic medium with infinite hardness. Both the upright skate and the tilted position are calculated exactly.
Submitted 13 November, 2019;
originally announced November 2019.
-
Fabrication and beam test of a silicon-tungsten electromagnetic calorimeter
Authors:
Sanjib Muhuri,
Sourav Mukhopadhyay,
Vinay B. Chandratre,
Tapan K. Nayak,
Sumit Kumar Saha,
Sanchari Thakur,
Rama N. Singaraju,
Jogender Saini,
Anthony van den Brink,
Tatsuya Chujo,
Rajendra Nath Patra,
Marco van Leeuwen,
Shuaib Ahmad Khan,
Menka Sukhwani,
Gert-Jan Nooren,
Thomas Peitzmann
Abstract:
A silicon-tungsten (Si-W) sampling calorimeter, consisting of 19 alternating layers of silicon pad detectors (individual pad area of 1~cm$^2$) and tungsten absorbers (each of one radiation length), has been constructed for measurement of electromagnetic showers over a large energy range. The signal from each of the silicon pads is read out using an ASIC with a dynamic range from $-300$~fC to $+500$~fC. Another ASIC with a larger dynamic range, $\pm 600$~fC, has been used as a test study. The calorimeter was exposed to pion and electron beams at the CERN Super Proton Synchrotron (SPS) to characterise the response to minimum ionising particles (MIP) and showers from electromagnetic (EM) interactions. Pion beams of 120 GeV provided baseline measurements towards the understanding of the MIP behaviour in the silicon pad layers, while electron beams of energy from 5 GeV to 60 GeV rendered detailed shower profiles within the calorimeter. The energy deposition in each layer, the longitudinal shower profile, and the total energy deposition have been measured for each incident electron energy. Linear behaviour of the total measured energy ($E$) with that of the incident particle energy ($E_{0}$) ensured satisfactory calorimetric performance. For a subset of the data sample, selected based on the cluster position of the electromagnetic shower of the incident electron, the dependence of the measured energy resolution on $E_{0}$ has been found to be $\sigma/E = (15.36/\sqrt{E_0(\mathrm{GeV})} \oplus 2.0)\%$.
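As a reading aid for the resolution formula quoted in this abstract, the sketch below evaluates it at a few beam energies. It assumes the usual calorimetry convention that $\oplus$ denotes addition in quadrature; the function name and the chosen energies are illustrative and are not taken from the paper.

```python
import math


def relative_resolution_percent(e0_gev, stochastic=15.36, constant=2.0):
    """Return sigma/E in percent for an incident energy e0_gev (in GeV).

    Assumption: the stochastic term (stochastic / sqrt(E0)) and the
    constant term are combined in quadrature, which is the standard
    meaning of the "oplus" symbol in calorimeter resolution formulas.
    """
    if e0_gev <= 0:
        raise ValueError("incident energy must be positive")
    stochastic_term = stochastic / math.sqrt(e0_gev)
    # math.hypot(a, b) computes sqrt(a**2 + b**2), i.e. the quadrature sum.
    return math.hypot(stochastic_term, constant)


# Illustrative energies within the 5 GeV to 60 GeV electron range quoted above.
for e0 in (5, 20, 60):
    print(f"E0 = {e0:2d} GeV -> sigma/E = {relative_resolution_percent(e0):.2f} %")
```

Under this reading the stochastic term dominates at 5 GeV, while at 60 GeV the stochastic and constant terms are of similar size.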
Submitted 13 January, 2020; v1 submitted 2 November, 2019;
originally announced November 2019.
-
Deep learning assessment of breast terminal duct lobular unit involution: towards automated prediction of breast cancer risk
Authors:
Suzanne C Wetstein,
Allison M Onken,
Christina Luffman,
Gabrielle M Baker,
Michael E Pyle,
Kevin H Kensler,
Ying Liu,
Bart Bakker,
Ruud Vlutters,
Marinus B van Leeuwen,
Laura C Collins,
Stuart J Schnitt,
Josien PW Pluim,
Rulla M Tamimi,
Yujing J Heng,
Mitko Veta
Abstract:
Terminal ductal lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses' Health Study (NHS). A first set of 92 WSIs was annotated for TDLUs, acini and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73$\pm$0.09, and segmented TDLUs and adipose tissue with Dice scores of 0.86$\pm$0.11 and 0.86$\pm$0.04, respectively. The inter-observer ICC scores for manual assessments on 40 WSIs of number of TDLUs per tissue area, median TDLU span, and median acini count per TDLU were 0.71, 95% CI [0.51, 0.83], 0.81, 95% CI [0.67, 0.90], and 0.73, 95% CI [0.54, 0.85], respectively. Intra-observer reliability was evaluated on 10/40 WSIs with ICC scores of >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were: 0.80, 95% CI [0.63, 0.90] for number of TDLUs per tissue area, 0.57, 95% CI [0.19, 0.77] for median TDLU span, and 0.80, 95% CI [0.62, 0.89] for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status.
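For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows the standard definitions of the Dice score (for segmentation masks) and the F1 score (for detections). It is not the authors' evaluation code; the function names, the pixel sets, and the counts are hypothetical toy values.

```python
def dice_score(predicted_pixels, truth_pixels):
    """Dice overlap 2*|A & B| / (|A| + |B|) between two sets of pixel coordinates."""
    predicted, truth = set(predicted_pixels), set(truth_pixels)
    total = len(predicted) + len(truth)
    if total == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return 2 * len(predicted & truth) / total


def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # Convention chosen here: nothing to detect and nothing detected.
        return 1.0
    return 2 * true_positives / denominator


# Toy example: two 3-pixel masks that share 2 pixels.
predicted_mask = {(0, 0), (0, 1), (1, 1)}
truth_mask = {(0, 1), (1, 1), (1, 0)}
print(f"Dice = {dice_score(predicted_mask, truth_mask):.3f}")

# Toy example: 8 correct detections, 2 spurious, 4 missed.
print(f"F1 = {f1_score(8, 2, 4):.3f}")
```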
Submitted 31 October, 2019;
originally announced November 2019.