-
TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
Authors:
Hojjat Aghakhani,
Wei Dai,
Andre Manoel,
Xavier Fernandes,
Anant Kharkar,
Christopher Kruegel,
Giovanni Vigna,
David Evans,
Ben Zorn,
Robert Sim
Abstract:
With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks where an adversary manipulates the model's training by injecting malicious data. Poisoning attacks could be designed to influence the model's suggestions at run time for chosen contexts, such as inducing the model into suggesting insecure code payloads. To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set. In this work, we demonstrate two novel attacks, COVERT and TROJANPUZZLE, that can bypass static analysis by planting malicious poison data in out-of-context regions such as docstrings. Our most novel attack, TROJANPUZZLE, goes one step further in generating less suspicious poison data by never explicitly including certain (suspicious) parts of the payload in the poison data, while still inducing a model that suggests the entire payload when completing code (i.e., outside docstrings). This makes TROJANPUZZLE robust against signature-based dataset-cleansing methods that can filter out suspicious sequences from the training data. Our evaluation against models of two sizes demonstrates that both COVERT and TROJANPUZZLE have significant implications for practitioners when selecting code used to train or tune code-suggestion models.
Submitted 24 January, 2024; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Gaia Data Release 3: Gaia scan-angle dependent signals and spurious periods
Authors:
B. Holl,
C. Fabricius,
J. Portell,
L. Lindegren,
P. Panuzzo,
M. Bernet,
J. Castañeda,
G. Jevardat de Fombelle,
M. Audard,
C. Ducourant,
D. L. Harrison,
D. W. Evans,
G. Busso,
A. Sozzetti,
E. Gosset,
F. Arenou,
F. De Angeli,
M. Riello,
L. Eyer,
L. Rimoldini,
P. Gavras,
N. Mowlavi,
K. Nienartowicz,
I. Lecoeur-Taïbi,
P. García-Lario
, et al. (1 additional author not shown)
Abstract:
Context: Gaia DR3 time series data may contain spurious signals related to the time-dependent scan angle. Aims: We aim to explain the origin of scan-angle dependent signals and how they can lead to spurious periods, provide statistics to identify them in the data, and suggest how to deal with them in Gaia DR3 data and in future releases. Methods: Using real Gaia data, alongside numerical and analytical models, we visualise and explain the features observed in the data. Results: We demonstrated with Gaia data that source structure (multiplicity or extendedness) or pollution from close-by bright objects can cause biases in the image parameter determination from which photometric, astrometric and (indirectly) radial velocity time series are derived. These biases are a function of the time-dependent scan direction of the instrument and thus can introduce scan-angle dependent signals, which in turn can result in specific spurious periodic signals. Numerical simulations qualitatively reproduce the general structure observed in the spurious period and spatial distribution of photometry and astrometry. A variety of statistics allows for identification of affected sources. Conclusions: The origin of the scan-angle dependent signals and subsequent spurious periods is well-understood and is in majority caused by fixed-orientation optical pairs with separation <0.5" (amongst which binaries with P>>5y) and (cores of) distant galaxies. Though the majority of sources with affected derived parameters have been filtered out from the Gaia archive, there remain Gaia DR3 data that should be treated with care (e.g. gaia_source was untouched). Finally, the various statistics discussed in the paper can not only be used to identify and filter affected sources, but alternatively reveal new information about them not available through other means, especially in terms of binarity on sub-arcsecond scale.
Submitted 31 May, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning
Authors:
Ahmed Salem,
Giovanni Cherubin,
David Evans,
Boris Köpf,
Andrew Paverd,
Anshuman Suri,
Shruti Tople,
Santiago Zanella-Béguelin
Abstract:
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. We use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) to uncover hitherto unknown relations that would have been difficult to spot otherwise.
Submitted 20 April, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Dissecting Distribution Inference
Authors:
Anshuman Suri,
Yifu Lu,
Yanjin Chen,
David Evans
Abstract:
A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference
Submitted 5 April, 2024; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Gaia Data Release 3: All-sky classification of 12.4 million variable sources into 25 classes
Authors:
Lorenzo Rimoldini,
Berry Holl,
Panagiotis Gavras,
Marc Audard,
Joris De Ridder,
Nami Mowlavi,
Krzysztof Nienartowicz,
Grégory Jevardat de Fombelle,
Isabelle Lecoeur-Taïbi,
Lea Karbevska,
Dafydd W. Evans,
Péter Ábrahám,
Maria I. Carnerero,
Gisella Clementini,
Elisa Distefano,
Alessia Garofalo,
Pedro García-Lario,
Roy Gomel,
Sergei A. Klioner,
Katarzyna Kruszyńska,
Alessandro C. Lanzafame,
Thomas Lebzelter,
Gábor Marton,
Tsevi Mazeh,
Roberto Molinaro
, et al. (9 additional authors not shown)
Abstract:
Gaia DR3 contains 1.8 billion sources with G-band photometry, 1.5 billion of which with BP and RP photometry, complemented by positions on the sky, parallax, and proper motion. The median number of field-of-view transits in the three photometric bands is between 40 and 44 measurements per source and covers 34 months of data collection. We pursue a classification of Galactic and extra-galactic objects that are detected as variable by Gaia across the whole sky. Supervised machine learning (eXtreme Gradient Boosting and Random Forest) was employed to generate multi-class, binary, and meta-classifiers that classified variable objects with photometric time series in the G, BP, and RP bands. Classification results comprise 12.4 million sources (selected from a much larger set of potential variable objects) and include about 9 million variable stars classified into 22 variability types in the Milky Way and nearby galaxies such as the Magellanic Clouds and Andromeda, plus thousands of supernova explosions in distant galaxies, 1 million active galactic nuclei, and almost 2.5 million galaxies. The identification of galaxies was made possible by the artificial variability of extended objects as detected by Gaia, so they were published in the galaxy_candidates table of the Gaia DR3 archive, separate from the classifications of genuine variability (in the vari_classifier_result table). The latter contains 24 variability classes or class groups of periodic and non-periodic variables (pulsating, eclipsing, rotating, eruptive, cataclysmic, stochastic, and microlensing), with amplitudes from a few milli-magnitudes to several magnitudes.
Submitted 7 March, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Structural Origin of Recovered Ferroelectricity in BaTiO$_3$ Nanoparticles
Authors:
H. Zhang,
S. Liu,
S. Ghose,
B. Ravel,
I. U. Idehenre,
Y. A. Barnakov,
S. A. Basun,
D. R. Evans,
T. A. Tyson
Abstract:
Nanoscale BaTiO3 particles (approximately 10 nm) prepared by ball-milling a mixture of oleic acid and heptane have been reported to have an electric polarization several times larger than that for bulk BaTiO3. In this work, detailed local, intermediate, and long-range structural studies are combined with spectroscopic measurements to develop a model structure of these materials. The X-ray spectroscopic measurements reveal large Ti off-centering as the key factor producing the large spontaneous polarization in the nanoparticles. Temperature-dependent lattice parameter changes reveal the sharpening of the structural phase transitions in these BaTiO3 nanoparticles compared to the pure nanoparticle systems. Sharp crystalline-type peaks in the barium oleate Raman spectra suggest that this component in the composite core-shell matrix, a product of mechanochemical synthesis, stabilizes an enhanced polar structural phase of the BaTiO3 core nanoparticles.
Submitted 15 February, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models
Authors:
Hannah Chen,
Yangfeng Ji,
David Evans
Abstract:
Traditional (fickle) adversarial examples involve finding a small perturbation that does not change an input's true label but confuses the classifier into outputting a different prediction. Conversely, obstinate adversarial examples occur when an adversary finds a small perturbation that preserves the classifier's prediction but changes the true label of an input. Adversarial training and certified robust training have shown some effectiveness in improving the robustness of machine learnt models to fickle adversarial examples. We show that standard adversarial training methods focused on reducing vulnerability to fickle adversarial examples may make a model more vulnerable to obstinate adversarial examples, with experiments for both natural language inference and paraphrase identification tasks. To counter this phenomenon, we introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
Submitted 28 October, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
The effect of variable labels on deep learning models trained to predict breast density
Authors:
Steven Squires,
Elaine F. Harkness,
D. Gareth Evans,
Susan M. Astley
Abstract:
Purpose: High breast density is associated with reduced efficacy of mammographic screening and increased risk of developing breast cancer. Accurate and reliable automated density estimates can be used for direct risk prediction and passing density related information to further predictive models. Expert reader assessments of density show a strong relationship to cancer risk but also inter-reader variation. The effect of label variability on model performance is important when considering how to utilise automated methods for both research and clinical purposes. Methods: We utilise subsets of images with density labels to train a deep transfer learning model which is used to assess how label variability affects the mapping from representation to prediction. We then create two end-to-end deep learning models which allow us to investigate the effect of label variability on the model representation formed. Results: We show that the trained mappings from representations to labels are altered considerably by the variability of reader scores. Training on labels with distribution variation removed causes the Spearman rank correlation coefficients to rise from $0.751\pm0.002$ to either $0.815\pm0.006$ when averaging across readers or $0.844\pm0.002$ when averaging across images. However, when we train different models to investigate the representation effect we see little difference, with Spearman rank correlation coefficients of $0.846\pm0.006$ and $0.850\pm0.006$ showing no statistically significant difference in the quality of the model representation with regard to density prediction. Conclusions: We show that the mapping between representation and mammographic density prediction is significantly affected by label variability. However, the effect of the label variability on the model representation is limited.
Submitted 8 October, 2022;
originally announced October 2022.
-
Bypassing the Simulation-to-reality Gap: Online Reinforcement Learning using a Supervisor
Authors:
Benjamin David Evans,
Johannes Betz,
Hongrui Zheng,
Herman A. Engelbrecht,
Rahul Mangharam,
Hendrik W. Jordaan
Abstract:
Deep reinforcement learning (DRL) is a promising method to learn control policies for robots only from demonstration and experience. To cover the whole dynamic behaviour of the robot, DRL training is an active exploration process typically performed in simulation environments. Although this simulation training is cheap and fast, applying DRL algorithms to real-world settings is difficult. If agents are trained until they perform safely in simulation, transferring them to physical systems is difficult due to the sim-to-real gap caused by the difference between the simulation dynamics and the physical robot. In this paper, we present a method of online training a DRL agent to drive autonomously on a physical vehicle by using a model-based safety supervisor. Our solution uses a supervisory system to check if the action selected by the agent is safe or unsafe and ensure that a safe action is always implemented on the vehicle. With this, we can bypass the sim-to-real problem while training the DRL algorithm safely, quickly, and efficiently. We compare our method with conventional learning in simulation and on a physical vehicle. We provide a variety of real-world experiments where we train online a small-scale vehicle to drive autonomously with no prior simulation training. The evaluation results show that our method trains agents with improved sample efficiency while never crashing, and the trained agents demonstrate better driving performance than those trained in simulation.
Submitted 13 July, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Screening-Induced Phase Transitions in Core-Shell Ferroic Nanoparticles
Authors:
Anna N. Morozovska,
Eugene A. Eliseev,
Yulian M. Vysochanskii,
Viktoria V. Khist,
Dean R. Evans
Abstract:
Using the Landau-Ginzburg-Devonshire approach, we study screening-induced phase transitions in core-shell ferroic nanoparticles for three different shapes: an oblate disk, a sphere, and a prolate needle. The nanoparticle is made of a ferroic CuInP2S6 core and covered by a "tunable" screening shell made of a phase-change material with a conductivity that varies as the material changes between semiconductor and metallic phases. We reveal a critical influence of the shell screening length on the phase transitions and spontaneous polarization of the nanoparticle core. Since the tunable screening shell allows the control of the polar state and phase diagrams of core-shell ferroic nanoparticles, the obtained results can be of particular interest for applications in nonvolatile memory cells.
Submitted 21 September, 2022;
originally announced September 2022.
-
Gaia-TESS synergy: Improving the identification of transit candidates
Authors:
Aviad Panahi,
Tsevi Mazeh,
Shay Zucker,
David W. Latham,
Karen A. Collins,
Lorenzo Rimoldini,
Dafydd Wyn Evans,
Laurent Eyer
Abstract:
Context: The TESS team periodically issues a new list of transiting exoplanet candidates based on the analysis of the accumulating light curves obtained by the satellite. The list includes the estimated epochs, periods, and durations of the potential transits. As the point spread function (PSF) of TESS is relatively wide, follow-up photometric observations at higher spatial resolution are required in order to exclude apparent transits that are actually blended background eclipsing binaries (BEBs). Aims: The Gaia space mission, with its growing database of epoch photometry and high angular resolution, enables the production of distinct light curves for all sources included in the TESS PSF, up to the limiting magnitude of Gaia. This paper reports the results of an ongoing Gaia-TESS collaboration that uses the Gaia photometry to facilitate the identification of BEB candidates and even to confirm on-target candidates in some cases. Methods: We inspected the Gaia photometry of the individual sources included in the TESS PSF, searching for periodic dimming events compatible with their ephemerides and uncertainties, as published by TESS. The performance of the search depends mainly on the number of Gaia measurements during transit and their precision. Results: Since February 2021, the collaboration has been able to confirm 126 on-target candidates and exclude 124 as BEBs. Since June 2021, when our search methodology matured, we have been able to identify on the order of 5% as on-target candidates and another 5% as BEBs. Conclusions: This synergistic approach is combining the complementary capabilities of two of the astronomical space missions of NASA and ESA. It serves to optimize the process of detecting new planets by making better use of the resources of the astronomical community.
Submitted 13 September, 2022;
originally announced September 2022.
-
Are Attribute Inference Attacks Just Imputation?
Authors:
Bargav Jayaraman,
David Evans
Abstract:
Models can expose sensitive information about their training data. In an attribute inference attack, an adversary has partial knowledge of some training records and access to a model trained on those records, and infers the unknown values of a sensitive feature of those records. We study a fine-grained variant of attribute inference we call sensitive value inference, where the adversary's goal is to identify with high confidence some records from a candidate set where the unknown attribute has a particular sensitive value. We explicitly compare attribute inference with data imputation that captures the training distribution statistics, under various assumptions about the training data available to the adversary. Our main conclusions are: (1) previous attribute inference methods do not reveal more about the training data from the model than can be inferred by an adversary without access to the trained model, but with the same knowledge of the underlying distribution as needed to train the attribute inference attack; (2) black-box attribute inference attacks rarely learn anything that cannot be learned without the model; but (3) white-box attacks, which we introduce and evaluate in the paper, can reliably identify some records with the sensitive value attribute that would not be predicted without having access to the model. Furthermore, we show that proposed defenses such as differentially private training and removing vulnerable records from training do not mitigate this privacy risk. The code for our experiments is available at https://github.com/bargavj/EvaluatingDPML.
Submitted 2 September, 2022;
originally announced September 2022.
-
Gaia Data Release 3: Summary of the content and survey properties
Authors:
Gaia Collaboration,
A. Vallenari,
A. G. A. Brown,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
L. Eyer,
R. Guerra,
A. Hutton,
C. Jordi,
S. A. Klioner,
U. L. Lammers,
L. Lindegren,
X. Luri,
F. Mignard,
C. Panem,
D. Pourbaix,
S. Randich,
P. Sartoretti,
C. Soubiran
, et al. (431 additional authors not shown)
Abstract:
We present the third data release of the European Space Agency's Gaia mission, GDR3. The GDR3 catalogue is the outcome of the processing of raw data collected with the Gaia instruments during the first 34 months of the mission by the Gaia Data Processing and Analysis Consortium. The GDR3 catalogue contains the same source list, celestial positions, proper motions, parallaxes, and broad band photometry in the G, G$_{BP}$, and G$_{RP}$ pass-bands already present in the Early Third Data Release. GDR3 introduces an impressive wealth of new data products. More than 33 million objects in the ranges $G_{rvs} < 14$ and $3100 < T_{eff} < 14500$ have new determinations of their mean radial velocities based on data collected by Gaia. We provide G$_{rvs}$ magnitudes for most sources with radial velocities, and a line broadening parameter is listed for a subset of these. Mean Gaia spectra are made available to the community. The GDR3 catalogue includes about 1 million mean spectra from the radial velocity spectrometer, and about 220 million low-resolution blue and red prism photometer BPRP mean spectra. The results of the analysis of epoch photometry are provided for some 10 million sources across 24 variability types. GDR3 includes astrophysical parameters and source class probabilities for about 470 million and 1500 million sources, respectively, including stars, galaxies, and quasars. Orbital elements and trend parameters are provided for some $800\,000$ astrometric, spectroscopic and eclipsing binaries. More than $150\,000$ Solar System objects, including new discoveries, with preliminary orbital solutions and individual epoch observations are part of this release. Reflectance spectra derived from the epoch BPRP spectral data are published for about $60\,000$ asteroids. Finally, an additional data set is provided, namely the Gaia Andromeda Photometric Survey (abridged)
Submitted 30 July, 2022;
originally announced August 2022.
-
Measurement of the Parity-Odd Angular Distribution of Gamma Rays From Polarized Neutron Capture on $^{35}$Cl
Authors:
N. Fomin,
R. Alarcon,
L. Alonzi,
E. Askanazi,
S. Baeßler,
S. Balascuta,
L. Barrón-Palos,
A. Barzilov,
D. Blyth,
J. D. Bowman,
N. Birge,
J. R. Calarco,
T. E. Chupp,
V. Cianciolo,
C. E. Coppola,
C. B. Crawford,
K. Craycraft,
D. Evans,
C. Fieseler,
E. Frlež,
J. Fry,
I. Garishvili,
M. T. W. Gericke,
R. C. Gillis,
K. B. Grammer
, et al. (39 additional authors not shown)
Abstract:
We report a measurement of two energy-weighted gamma cascade angular distributions from polarized slow neutron capture on the ${}^{35}$Cl nucleus, one parity-odd correlation proportional to $\vec{s_{n}} \cdot \vec{k_{\gamma}}$ and one parity-even correlation proportional to $\vec{s_{n}} \cdot \vec{k_{n}} \times \vec{k_{\gamma}}$. A parity violating asymmetry can appear in this reaction due to the weak nucleon-nucleon (NN) interaction which mixes opposite parity S and P-wave levels in the excited compound $^{36}$Cl nucleus formed upon slow neutron capture. If parity-violating (PV) and parity-conserving (PC) terms both exist, the measured differential cross section can be related to them via $\frac{d\sigma}{d\Omega} \propto 1 + A_{\gamma,PV}\cos\theta + A_{\gamma,PC}\sin\theta$. The PV and PC asymmetries for energy-weighted gamma cascade angular distributions for polarized slow neutron capture on $^{35}$Cl averaged over the neutron energies from 2.27 meV to 9.53 meV were measured to be $A_{\gamma,PV}=(-23.9\pm0.7)\times 10^{-6}$ and $A_{\gamma,PC}=(0.1\pm0.7)\times 10^{-6}$. These results are consistent with previous experimental results. Systematic errors were quantified and shown to be small compared to the statistical error. These asymmetries in the angular distributions of the gamma rays emitted from the capture of polarized neutrons in $^{35}$Cl were used to verify the operation and data analysis procedures for the NPDGamma experiment which measured the parity-odd asymmetry in the angular distribution of gammas from polarized slow neutron capture on protons.
Submitted 22 July, 2022;
originally announced July 2022.
-
Combing for Credentials: Active Pattern Extraction from Smart Reply
Authors:
Bargav Jayaraman,
Esha Ghosh,
Melissa Chase,
Sambuddha Roy,
Wei Dai,
David Evans
Abstract:
Pre-trained large language models, such as GPT-2 and BERT, are often fine-tuned to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application where a pre-trained model is tuned to provide suggested responses for a given query message. Since the tuning data is often sensitive data such as emails or chat transcripts, it is important to understand and mitigate the risk that the model leaks its tuning data. We investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline. We consider a realistic setting where the adversary can only interact with the underlying model through a front-end interface that constrains what types of queries can be sent to the model. Previous attacks do not work in these settings, but require the ability to send unconstrained queries directly to the model. Even when there are no constraints on the queries, previous attacks typically require thousands, or even millions, of queries to extract useful information, while our attacks can extract sensitive data in just a handful of queries. We introduce a new type of active extraction attack that exploits canonical patterns in text containing sensitive data. We show experimentally that it is possible for an adversary to extract sensitive user information present in the training data, even in realistic settings where all interactions with the model must go through a front-end that limits the types of queries. We explore potential mitigation strategies and demonstrate empirically how differential privacy appears to be a reasonably effective defense mechanism to such pattern extraction attacks.
Submitted 2 September, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Electric field control of labyrinth domain structures in core-shell ferroelectric nanoparticles
Authors:
Anna N. Morozovska,
Eugene A. Eliseev,
Salia Cherifi-Hertel,
Dean R. Evans,
Riccardo Hertel
Abstract:
In the framework of the Landau-Ginzburg-Devonshire (LGD) approach, we studied the possibility of controlling the polarity and chirality of equilibrium domain structures by a homogeneous external electric field in a nanosized ferroelectric core covered with an ultra-thin shell of screening charge. Under certain screening lengths and core sizes, the minimum of the LGD energy, which consists of Landau-Devonshire energy, Ginzburg polarization gradient energy, and electrostatic terms, leads to the spontaneous appearance of stable labyrinth domain structures in the core. The labyrinths evolve from an initial polarization distribution consisting of arbitrarily small randomly oriented nanodomains. The equilibrium labyrinth structure is weakly influenced by details of the initial polarization distribution, such that one can obtain a quasi-continuum of nearly degenerate labyrinth structures, whose number is limited only by the mesh discretization density. Applying a homogeneous electric field to a nanoparticle with labyrinth domains, and subsequently removing it, allows inducing changes to the labyrinth structure, as the maze polarity is controlled by a field projection on the particle polar axis. Under specific conditions of the screening charge relaxation, the quasi-static dielectric susceptibility of the labyrinth structure can be negative, potentially leading to a negative capacitance effect. Considering the general validity of the LGD approach, we expect that an electric field control of labyrinth domains is possible in many spatially-confined nanosized ferroics, which can be potentially interesting for advanced cryptography and modern nanoelectronics.
Submitted 30 June, 2022;
originally announced July 2022.
-
Gaia Data Release 3: Reflectance spectra of Solar System small bodies
Authors:
Gaia Collaboration,
L. Galluccio,
M. Delbo,
F. De Angeli,
T. Pauwels,
P. Tanga,
F. Mignard,
A. Cellino,
A. G. A. Brown,
K. Muinonen,
A. Penttila,
S. Jordan,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
L. Eyer,
R. Guerra,
A. Hutton,
C. Jordi
, et al. (422 additional authors not shown)
Abstract:
The Gaia mission of the European Space Agency (ESA) has been routinely observing Solar System objects (SSOs) since the beginning of its operations in August 2014. The Gaia data release three (DR3) includes, for the first time, the mean reflectance spectra of a selected sample of 60 518 SSOs, primarily asteroids, observed between August 5, 2014, and May 28, 2017. Each reflectance spectrum was derived from measurements obtained by means of the Blue and Red photometers (BP/RP), which were binned in 16 discrete wavelength bands. We describe the processing of the Gaia spectral data of SSOs, explaining both the criteria used to select the subset of asteroid spectra published in Gaia DR3, and the different steps of our internal validation procedures. In order to further assess the quality of Gaia SSO reflectance spectra, we carried out external validation against SSO reflectance spectra obtained from ground-based and space-borne telescopes and available in the literature. For each selected SSO, an epoch reflectance was computed by dividing the calibrated spectrum observed by the BP/RP at each transit on the focal plane by the mean spectrum of a solar analogue. The latter was obtained by averaging the Gaia spectral measurements of a selected sample of stars known to have very similar spectra to that of the Sun. Finally, a mean of the epoch reflectance spectra was calculated in 16 spectral bands for each SSO. The agreement between Gaia mean reflectance spectra and those available in the literature is good for bright SSOs, regardless of their taxonomic spectral class. We identify an increase in the spectral slope of S-type SSOs with increasing phase angle. Moreover, we show that the spectral slope increases and the depth of the 1 um absorption band decreases for increasing ages of S-type asteroid families.
Submitted 24 June, 2022;
originally announced June 2022.
-
A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative
Authors:
Elena Casiraghi,
Rachel Wong,
Margaret Hall,
Ben Coleman,
Marco Notaro,
Michael D. Evans,
Jena S. Tronieri,
Hannah Blau,
Bryan Laraway,
Tiffany J. Callahan,
Lauren E. Chan,
Carolyn T. Bramante,
John B. Buse,
Richard A. Moffitt,
Til Sturmer,
Steven G. Johnson,
Yu Raymond Shao,
Justin Reese,
Peter N. Robinson,
Alberto Paccanaro,
Giorgio Valentini,
Jared D. Huling,
Kenneth Wilkins,
Tell Bennet
, et al. (12 additional authors not shown)
Abstract:
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful to assess associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases and the simple removal of these cases may introduce severe bias. For these reasons, several multiple imputation algorithms have been proposed to attempt to recover the missing information. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modelling choices are also both crucial and challenging. In this paper, we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. The experiments presented here show that our approach could effectively highlight the most valid and performant missing-data handling strategy for our case study. Moreover, our methodology allowed us to gain an understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
Submitted 25 September, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3. Summary of the variability processing and analysis
Authors:
L. Eyer,
M. Audard,
B. Holl,
L. Rimoldini,
M. I. Carnerero,
G. Clementini,
J. De Ridder,
E. Distefano,
D. W. Evans,
P. Gavras,
R. Gomel,
T. Lebzelter,
G. Marton,
N. Mowlavi,
A. Panahi,
V. Ripepi,
L. Wyrzykowski,
K. Nienartowicz,
G. Jevardat de Fombelle,
I. Lecoeur-Taibi,
L. Rohrbasser,
M. Riello,
P. Garcia-Lario,
A. C. Lanzafame,
T. Mazeh
, et al. (33 additional authors not shown)
Abstract:
Context. Gaia has been in operation since 2014. The third Gaia data release expands from the early data release (EDR3) in 2020 by providing 34 months of multi-epoch observations that allowed us to systematically probe, characterise and classify celestial variable phenomena.
Aims. We present a summary of the variability processing and analysis of the photometric and spectroscopic time series of 1.8 billion sources done for Gaia DR3.
Methods. We used statistical and Machine Learning methods to characterise and classify the variable sources. Training sets were built from a global revision of major published variable star catalogues. For a subset of classes, specific detailed studies were conducted to confirm their class membership and to derive parameters that are adapted to the peculiarity of the considered class.
Results. In total, 10.5 million objects are identified as variable in Gaia DR3 and have associated time series in G, GBP, and GRP and, in some cases, radial velocity time series. The DR3 variable sources subdivide into 9.5 million variable stars and 1 million Active Galactic Nuclei/Quasars. In addition, supervised classification identified 2.5 million galaxies thanks to spurious variability induced by the extent of these objects. The variability analysis output in the DR3 archive amounts to 17 tables containing a total of 365 parameters. We publish 35 types and sub-types of variable objects. For 11 variable types, additional specific object parameters are published. An overview of the estimated completeness and contamination of most variability classes is provided.
Conclusions. Thanks to Gaia we present the largest whole-sky variability analysis based on coherent photometric, astrometric, and spectroscopic data. Later Gaia data releases will more than double the span of time series and the number of observations, thus allowing for an even richer catalogue in the future.
Submitted 13 June, 2022;
originally announced June 2022.
-
Gaia DR3: Specific processing and validation of all-sky RR Lyrae and Cepheid stars -- The RR Lyrae sample
Authors:
G. Clementini,
V. Ripepi,
A. Garofalo,
R. Molinaro,
T. Muraveva,
S. Leccia,
L. Rimoldini,
B. Holl,
G. Jevardat de Fombelle,
P. Sartoretti,
O. Marchal,
M. Audard,
K. Nienartowicz,
R. Andrae,
M. Marconi,
L. Szabados,
D. W. Evans,
I. Lecoeur-Taibi,
N. Mowlavi,
I. Musella,
L. Eyer
Abstract:
Gaia DR3 publishes a catalogue of full-sky RR Lyrae stars (RRLs) observed during the initial 34 months of science operations, that were processed through the Specific Object Study (SOS) pipeline for Cepheids and RRLs (SOS Cep&RRL) observed by Gaia. The SOS Cep&RRL validation of DR3 candidate RRLs relies on tools that include the Period (P) G-amplitude diagram and the P-phi21 and -phi31 parameters of the G light curve Fourier decomposition, based on a sample of bona fide known RRLs (Gold Sample). The SOS processing led to a catalogue of 271779 RRLs listed in the vari_rrlyrae table of DR3. By dropping sources that clearly are contaminants, or have an uncertain classification, we produce the final catalogue of SOS-confirmed DR3 RRLs containing 270905 sources (174947 fundamental mode, 93952 first overtone and 2006 double-mode RRLs) confirmed and fully characterised by the SOS Cep&RRL pipeline. They are distributed all over the sky, including 95 globular clusters and 25 Milky Way companions. RVS time series radial velocities are also published for 1096 RRLs and 799 Cepheids. Of the 270905 DR3 RRLs, 200294 are already known in the literature and 70611 are, to the best of our knowledge, new discoveries by Gaia. An estimate of the interstellar absorption is published for 142660 fundamental-mode RRLs from a relation based on the G-band amplitude and the pulsation period. Metallicities derived from the Periods and the phi31 Fourier parameters of the G-light curves are also released for 133559 RRLs. The final Gaia DR3 catalogue of confirmed RRLs almost doubles the DR2 RRLs catalogue. An increase of statistical significance, a better characterization of the RRLs pulsational and astrophysical parameters, and the improved astrometry published with Gaia EDR3, make the SOS Cep&RRL DR3 sample the largest, most homogeneous and parameter-rich catalogue of All-Sky RRLs published so far.
Submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: The Galaxy in your preferred colours. Synthetic photometry from Gaia low-resolution spectra
Authors:
Gaia Collaboration,
P. Montegriffo,
M. Bellazzini,
F. De Angeli,
R. Andrae,
M. A. Barstow,
D. Bossini,
A. Bragaglia,
P. W. Burgess,
C. Cacciari,
J. M. Carrasco,
N. Chornay,
L. Delchambre,
D. W. Evans,
M. Fouesneau,
Y. Fremat,
D. Garabato,
C. Jordi,
M. Manteiga,
D. Massari,
L. Palaversa,
E. Pancino,
M. Riello,
D. Ruz Mieres,
N. Sanna
, et al. (5 additional authors not shown)
Abstract:
Gaia Data Release 3 provides novel flux-calibrated low-resolution spectrophotometry for about 220 million sources in the wavelength range 330nm - 1050nm (XP spectra). Synthetic photometry directly tied to a flux in physical units can be obtained from these spectra for any passband fully enclosed in this wavelength range. We describe how synthetic photometry can be obtained from XP spectra, illustrating the performance that can be achieved under a range of different conditions - for example passband width and wavelength range - as well as the limits and the problems affecting it. Existing top-quality photometry can be reproduced within a few per cent over a wide range of magnitudes and colour, for wide and medium bands, and with up to millimag accuracy when synthetic photometry is standardised with respect to these external sources. Some examples of potential scientific application are presented, including the detection of multiple populations in globular clusters, the estimation of metallicity extended to the very metal-poor regime, and the classification of white dwarfs. A catalogue providing standardised photometry for ~220 million sources in several wide bands of widely used photometric systems is provided (Gaia Synthetic Photometry Catalogue; GSPC) as well as a catalogue of $\simeq 10^5$ white dwarfs with DA/non-DA classification obtained with a Random Forest algorithm (Gaia Synthetic Photometry Catalogue for White Dwarfs; GSPC-WD).
Submitted 10 January, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia DR3: Specific processing and validation of all-sky RR Lyrae and Cepheid stars -- The Cepheid sample
Authors:
V. Ripepi,
G. Clementini,
R. Molinaro,
S. Leccia,
E. Plachy,
L. Molnár,
L. Rimoldini,
I. Musella,
M. Marconi,
A. Garofalo,
M. Audard,
B. Holl,
D. W. Evans,
G. Jevardat de Fombelle,
I. Lecoeur-Taibi,
O. Marchal,
N. Mowlavi,
T. Muraveva,
K. Nienartowicz,
P. Sartoretti,
L. Szabados,
L. Eyer
Abstract:
Context. Cepheids are pulsating stars that play a crucial role in several astrophysical contexts. Among the different types, the Classical Cepheids are fundamental tools for the calibration of the extragalactic distance ladder. They are also powerful stellar population tracers in the context of Galactic studies. The Gaia Third Data Release (DR3) publishes improved data on Cepheids collected during the initial 34 months of operations. Aims. We present the Gaia DR3 catalogue of Cepheids of all types, obtained through the analysis carried out with the Specific Object Study (SOS) Cep&RRL pipeline. Methods. We discuss the procedures adopted to clean the Cepheid sample from spurious objects, to validate the results, and to re-classify sources with a wrong outcome from the SOS Cep&RRL pipeline. Results. The Gaia DR3 includes multi-band time-series photometry and characterisation by the SOS Cep&RRL pipeline for a sample of 15,006 Cepheids of all types. The sample includes 4,663, 4,616, 321 and 185 pulsators, distributed in the LMC, SMC, M31 and M33, respectively, as well as 5 221 objects in the remaining All Sky sub-region which includes stars in the MW field/clusters and in a number of small satellites of our Galaxy. Among this sample, 327 objects were known as variable stars in the literature but with a different classification, while, to the best of our knowledge, 474 stars have not been reported before to be variable stars and therefore they likely are new Cepheids discovered by Gaia.
Submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Mapping the asymmetric disc of the Milky Way
Authors:
Gaia Collaboration,
R. Drimmel,
M. Romero-Gomez,
L. Chemin,
P. Ramos,
E. Poggio,
V. Ripepi,
R. Andrae,
R. Blomme,
T. Cantat-Gaudin,
A. Castro-Ginard,
G. Clementini,
F. Figueras,
M. Fouesneau,
Y. Fremat,
K. Jardine,
S. Khanna,
A. Lobel,
D. J. Marshall,
T. Muraveva,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou
, et al. (431 additional authors not shown)
Abstract:
With the most recent Gaia data release the number of sources with complete 6D phase space information (position and velocity) has increased to well over 33 million stars, while stellar astrophysical parameters are provided for more than 470 million sources, in addition to the identification of over 11 million variable stars. Using the astrophysical parameters and variability classifications provided in Gaia DR3, we select various stellar populations to explore and identify non-axisymmetric features in the disc of the Milky Way in both configuration and velocity space. Using more than 580 thousand sources identified as hot OB stars, together with 988 known open clusters younger than 100 million years, we map the spiral structure associated with star formation 4-5 kpc from the Sun. We select over 2800 Classical Cepheids younger than 200 million years, which show spiral features extending as far as 10 kpc from the Sun in the outer disc. We also identify more than 8.7 million sources on the red giant branch (RGB), of which 5.7 million have line-of-sight velocities, allowing the velocity field of the Milky Way to be mapped as far as 8 kpc from the Sun, including the inner disc. The spiral structure revealed by the young populations is consistent with recent results using Gaia EDR3 astrometry and source lists based on near infrared photometry, showing the Local (Orion) arm to be at least 8 kpc long, and an outer arm consistent with what is seen in HI surveys, which seems to be a continuation of the Perseus arm into the third quadrant. Meanwhile, the subset of RGB stars with velocities clearly reveals the large scale kinematic signature of the bar in the inner disc, as well as evidence of streaming motions in the outer disc that might be associated with spiral arms or bar resonances. (abridged)
Submitted 5 August, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data
Authors:
P. Montegriffo,
F. De Angeli,
R. Andrae,
M. Riello,
E. Pancino,
N. Sanna,
M. Bellazzini,
D. W. Evans,
J. M. Carrasco,
R. Sordo,
G. Busso,
C. Cacciari,
C. Jordi,
F. van Leeuwen,
A. Vallenari,
G. Altavilla,
M. A. Barstow,
A. G. A. Brown,
P. W. Burgess,
M. Castellani,
S. Cowell,
M. Davidson,
F. De Luise,
L. Delchambre,
C. Diener
, et al. (24 additional authors not shown)
Abstract:
Context. Gaia Data Release 3 contains astrometry and photometry results for about 1.8 billion sources based on observations collected by the European Space Agency (ESA) Gaia satellite during the first 34 months of its operational phase (the same period covered by Gaia early Data Release 3; Gaia EDR3). Low-resolution spectra for 220 million sources are one of the important new data products included in this release.
Aims. In this paper, we focus on the external calibration of low-resolution spectroscopic content, describing the input data, algorithms, data processing, and the validation of the results. Particular attention is given to the quality of the data and to a number of features that users may need to take into account to make the best use of the catalogue.
Methods. We calibrated an instrument model to relate mean Gaia spectra to the corresponding spectral energy distributions using an extended set of calibrators: this includes modelling of the instrument dispersion relation, transmission, and line spread functions. Optimisation of the model is achieved through total least-squares regression, accounting for errors in Gaia and external spectra.
Results. The resulting instrument model can be used for forward modelling of Gaia spectra or for inverse modelling of externally calibrated spectra in absolute flux units.
Conclusions. The absolute calibration derived in this paper provides an essential ingredient for users of BP/RP spectra. It allows users to connect BP/RP spectra to absolute fluxes and physical wavelengths.
Submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Processing and validation of BP/RP low-resolution spectral data
Authors:
F. De Angeli,
M. Weiler,
P. Montegriffo,
D. W. Evans,
M. Riello,
R. Andrae,
J. M. Carrasco,
G. Busso,
P. W. Burgess,
C. Cacciari,
M. Davidson,
D. L. Harrison,
S. T. Hodgkin,
C. Jordi,
P. J. Osborne,
E. Pancino,
G. Altavilla,
M. A. Barstow,
C. A. L. Bailer-Jones,
M. Bellazzini,
A. G. A. Brown,
M. Castellani,
S. Cowell,
L. Delchambre,
F. De Luise
, et al. (29 additional authors not shown)
Abstract:
(Abridged) Blue (BP) and Red (RP) Photometer low-resolution spectral data is one of the exciting new products in Gaia Data Release 3 (Gaia DR3). We calibrate about 65 billion individual transit spectra onto the same mean BP/RP instrument through a series of calibration steps, including background subtraction, calibration of the CCD geometry and an iterative procedure for the calibration of CCD efficiency as well as variations of the line-spread function and dispersion across the focal plane and in time. The calibrated transit spectra are then combined for each source in terms of an expansion into continuous basis functions. Time-averaged mean spectra covering the optical to near-infrared wavelength range [330, 1050] nm are published for approximately 220 million objects. Most of these are brighter than G = 17.65 but some BP/RP spectra are published for sources down to G = 21.43. Their signal-to-noise ratio varies significantly over the wavelength range covered and with magnitude and colour of the observed objects, with sources around G = 15 having S/N above 100 in some wavelength ranges. The top-quality BP/RP spectra are achieved for sources with magnitudes 9 < G < 12, having S/N reaching 1000 in the central part of the RP wavelength range. Scientific validation suggests that the internal calibration was generally successful. However, there is some evidence for imperfect calibrations at the bright end G < 11, where calibrated BP/RP spectra can exhibit systematic flux variations that exceed their estimated flux uncertainties. We also report that due to long-range noise correlations, BP/RP spectra can exhibit wiggles when sampled in pseudo-wavelength.
Submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Microlensing Events from All Over the Sky
Authors:
Łukasz Wyrzykowski,
K. Kruszyńska,
K. A. Rybicki,
B. Holl,
I. Lecœur-Taïbi,
N. Mowlavi,
K. Nienartowicz,
G. Jevardat de Fombelle,
L. Rimoldini,
M. Audard,
P. Garcia-Lario,
P. Gavras,
D. W. Evans,
S. T. Hodgkin,
L. Eyer
Abstract:
Context: One of the rarest types of variability is the phenomenon of gravitational microlensing, a transient brightening of a background star due to an intervening lensing object. Microlensing is a powerful tool in studying the invisible or otherwise undetectable populations in the Milky Way, including planets and black holes. Aims: We describe the first Gaia catalogue of microlensing event candidates, give an overview of its content and discuss its validation. Methods: The catalogue of Gaia microlensing events is compiled by analysing the light curves of around 2 billion sources of Gaia Data Release 3 from all over the sky covering 34 months between 2014 and 2017. Results: We present 363 Gaia microlensing events and discuss their properties. Ninety events were never reported before and were not discovered by other surveys. The contamination of the catalogue is assessed to be 0.6-1.7%.
Submitted 26 October, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Pulsations in main sequence OBAF-type stars
Authors:
Gaia Collaboration,
J. De Ridder,
V. Ripepi,
C. Aerts,
L. Palaversa,
L. Eyer,
B. Holl,
M. Audard,
L. Rimoldini,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux,
M. Biermann,
O. L. Creevey,
C. Ducourant,
D. W. Evans,
R. Guerra,
A. Hutton,
C. Jordi,
S. A. Klioner,
U. L. Lammers,
L. Lindegren
, et al. (423 additional authors not shown)
Abstract:
The third Gaia data release provides photometric time series covering 34 months for about 10 million stars. For many of those stars, a characterisation in Fourier space and their variability classification are also provided. This paper focuses on intermediate- to high-mass (IHM) main sequence pulsators (M >= 1.3 Msun) of spectral types O, B, A, or F, known as beta Cep, slowly pulsating B (SPB), delta Sct, and gamma Dor stars. These stars are often multi-periodic and display low amplitudes, making them challenging targets to analyse with sparse time series. All datasets used in this analysis are part of the Gaia DR3 data release. The photometric time series were used to perform a Fourier analysis, while the global astrophysical parameters necessary for the empirical instability strips were taken from the Gaia DR3 gspphot tables, and the vsini data were taken from the Gaia DR3 esphs tables. We show that for nearby OBAF-type pulsators, the Gaia DR3 data are precise and accurate enough to pinpoint them in the Hertzsprung-Russell diagram. We find empirical instability strips covering broader regions than theoretically predicted. In particular, our study reveals the presence of fast rotating gravity-mode pulsators outside the strips, as well as the co-existence of rotationally modulated variables inside the strips as reported before in the literature. We derive an extensive period-luminosity relation for delta Sct stars and provide evidence that the relation features different regimes depending on the oscillation period. Finally, we demonstrate how stellar rotation attenuates the amplitude of the dominant oscillation mode of delta Sct stars.
Submitted 16 August, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: A Golden Sample of Astrophysical Parameters
Authors:
Gaia Collaboration,
O. L. Creevey,
L. M. Sarro,
A. Lobel,
E. Pancino,
R. Andrae,
R. L. Smart,
G. Clementini,
U. Heiter,
A. J. Korn,
M. Fouesneau,
Y. Frémat,
F. De Angeli,
A. Vallenari,
D. L. Harrison,
F. Thévenin,
C. Reylé,
R. Sordo,
A. Garofalo,
A. G. A. Brown,
L. Eyer,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux
, et al. (423 additional authors not shown)
Abstract:
Gaia Data Release 3 (DR3) provides a wealth of new data products for the astronomical community to exploit, including astrophysical parameters for a half billion stars. In this work we demonstrate the high quality of these data products and illustrate their use in different astrophysical contexts. We query the astrophysical parameter tables along with other tables in Gaia DR3 to derive the samples of the stars of interest. We validate our results by using the Gaia catalogue itself and by comparison with external data. We have produced six homogeneous samples of stars with high quality astrophysical parameters across the HR diagram for the community to exploit. We first focus on three samples that span a large parameter space: young massive disk stars (~3M), FGKM spectral type stars (~3M), and UCDs (~20K). We provide these sources along with additional information (either a flag or complementary parameters) as tables that are made available in the Gaia archive. We furthermore identify 15740 bona fide carbon stars, 5863 solar-analogues, and provide the first homogeneous set of stellar parameters of the Spectro Photometric Standard Stars. We use a subset of the OBA sample to illustrate its usefulness to analyse the Milky Way rotation curve. We then use the properties of the FGKM stars to analyse known exoplanet systems. We also analyse the ages of some unseen UCD-companions to the FGKM stars. We additionally predict the colours of the Sun in various passbands (Gaia, 2MASS, WISE) using the solar-analogue sample.
Submitted 12 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: The extragalactic content
Authors:
Gaia Collaboration,
C. A. L. Bailer-Jones,
D. Teyssier,
L. Delchambre,
C. Ducourant,
D. Garabato,
D. Hatzidimitriou,
S. A. Klioner,
L. Rimoldini,
I. Bellas-Velidis,
R. Carballo,
M. I. Carnerero,
C. Diener,
M. Fouesneau,
L. Galluccio,
P. Gavras,
A. Krone-Martins,
C. M. Raiteri,
R. Teixeira,
A. G. A. Brown,
A. Vallenari,
T. Prusti,
J. H. J. de Bruijne,
F. Arenou,
C. Babusiaux
, et al. (422 additional authors not shown)
Abstract:
The Gaia Galactic survey mission is designed and optimized to obtain astrometry, photometry, and spectroscopy of nearly two billion stars in our Galaxy. Yet as an all-sky multi-epoch survey, Gaia also observes several million extragalactic objects down to a magnitude of G~21 mag. Due to the nature of the Gaia onboard selection algorithms, these are mostly point-source-like objects. Using data provided by the satellite, we have identified quasar and galaxy candidates via supervised machine learning methods, and estimate their redshifts using the low resolution BP/RP spectra. We further characterise the surface brightness profiles of host galaxies of quasars and of galaxies from pre-defined input lists. Here we give an overview of the processing of extragalactic objects, describe the data products in Gaia DR3, and analyse their properties. Two integrated tables contain the main results for a high completeness, but low purity (50-70%), set of 6.6 million candidate quasars and 4.8 million candidate galaxies. We provide queries that select purer sub-samples of these containing 1.9 million probable quasars and 2.9 million probable galaxies (both 95% purity). We also use high quality BP/RP spectra of 43 thousand high probability quasars over the redshift range 0.05-4.36 to construct a composite quasar spectrum spanning restframe wavelengths from 72 to 1000 nm.
Submitted 12 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Stellar multiplicity, a teaser for the hidden treasure
Authors:
Gaia Collaboration,
F. Arenou,
C. Babusiaux,
M. A. Barstow,
S. Faigler,
A. Jorissen,
P. Kervella,
T. Mazeh,
N. Mowlavi,
P. Panuzzo,
J. Sahlmann,
S. Shahaf,
A. Sozzetti,
N. Bauchet,
Y. Damerdji,
P. Gavras,
P. Giacobbe,
E. Gosset,
J. -L. Halbwachs,
B. Holl,
M. G. Lattanzi,
N. Leclerc,
T. Morel,
D. Pourbaix,
P. Re Fiorentin
, et al. (425 additional authors not shown)
Abstract:
The Gaia DR3 Catalogue contains for the first time about eight hundred thousand solutions with either orbital elements or trend parameters for astrometric, spectroscopic and eclipsing binaries, and combinations of them. This paper aims to illustrate the huge potential of this large non-single star catalogue. Using the orbital solutions together with models of the binaries, a catalogue of tens of thousands of stellar masses, or lower limits, partly together with consistent flux ratios, has been built. Properties concerning the completeness of the binary catalogues are discussed, statistical features of the orbital elements are explained and a comparison with other catalogues is performed. Illustrative applications are proposed for binaries across the H-R diagram. The binarity is studied in the RGB/AGB and a search for genuine SB1 among long-period variables is performed. The discovery of new EL CVn systems illustrates the potential of combining variability and binarity catalogues. Potential compact object companions are presented, mainly white dwarf companions or double degenerates, but one candidate neutron star is also presented. Towards the bottom of the main sequence, the orbits of previously-suspected binary ultracool dwarfs are determined and new candidate binaries are discovered. The long awaited contribution of Gaia to the analysis of the substellar regime shows the brown dwarf desert around solar-type stars using true, rather than minimum, masses, and provides new important constraints on the occurrence rates of substellar companions to M dwarfs. Several dozen new exoplanets are proposed, including two with validated orbital solutions and one super-Jupiter orbiting a white dwarf, all being candidates requiring confirmation. Besides binarity, higher order multiple systems are also found.
Submitted 11 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: The Gaia Andromeda Photometric Survey
Authors:
D. W. Evans,
L. Eyer,
G. Busso,
M. Riello,
F. De Angeli,
P. W. Burgess,
M. Audard,
G. Clementini,
A. Garofalo,
B. Holl,
G. Jevardat de Fombelle,
A. C. Lanzafame,
I. Lecoeur-Taibi,
N. Mowlavi,
K. Nienartowicz,
L. Palaversa,
L. Rimoldini
Abstract:
Context. As part of Gaia Data Release 3 (Gaia DR3), epoch photometry has been released for 1.2 million sources centred on M31. This is a taster for Gaia Data Release 4 where all the epoch photometry will be released. Aims. In this paper the content of the Gaia Andromeda Photometric Survey is described, including statistics to assess the quality of the data. Known issues with the photometry are also outlined. Methods. Methods are given to improve interpretation of the photometry, in particular, a method for error renormalization. Also, use of correlations between the three photometric passbands allows clearer identification of variables that is not affected by false detections caused by systematic effects. Results. The Gaia Andromeda Photometric Survey presents a unique opportunity to look at Gaia epoch photometry that has not been preselected due to variability. This allows investigations to be carried out that can be applied to the rest of the sky using the mean source results. Additionally, scientific studies of variability can be carried out on M31 and the Milky Way in general.
Submitted 11 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3: Chemical cartography of the Milky Way
Authors:
Gaia Collaboration,
A. Recio-Blanco,
G. Kordopatis,
P. de Laverny,
P. A. Palicio,
A. Spagna,
L. Spina,
D. Katz,
P. Re Fiorentin,
E. Poggio,
P. J. McMillan,
A. Vallenari,
M. G. Lattanzi,
G. M. Seabroke,
L. Casamiquela,
A. Bragaglia,
T. Antoja,
C. A. L. Bailer-Jones,
R. Andrae,
M. Fouesneau,
M. Cropper,
T. Cantat-Gaudin,
U. Heiter,
A. Bijaoui,
A. G. A. Brown
, et al. (425 additional authors not shown)
Abstract:
Gaia DR3 opens a new era of all-sky spectral analysis of stellar populations thanks to the nearly 5.6 million stars observed by the RVS and parametrised by the GSP-spec module. The all-sky Gaia chemical cartography allows a powerful and precise chemo-dynamical view of the Milky Way with unprecedented spatial coverage and statistical robustness. First, it reveals the strong vertical symmetry of the Galaxy and the flared structure of the disc. Second, the observed kinematic disturbances of the disc -- seen as phase space correlations -- and kinematic or orbital substructures are associated with chemical patterns that favour stars with enhanced metallicities and lower [alpha/Fe] abundance ratios compared to the median values in the radial distributions. This is detected both for young objects that trace the spiral arms and older populations. Several alpha, iron-peak elements and at least one heavy element trace the thin and thick disc properties in the solar cylinder. Third, young disc stars show a recent chemical impoverishment in several elements. Fourth, the largest chemo-dynamical sample of open clusters analysed so far shows a steepening of the radial metallicity gradient with age, which is also observed in the young field population. Finally, the Gaia chemical data have the required coverage and precision to unveil galaxy accretion debris and heated disc stars on halo orbits through their [alpha/Fe] ratio, and to allow the study of the chemo-dynamical properties of globular clusters. Gaia DR3 chemo-dynamical diagnostics open new horizons before the era of ground-based wide-field spectroscopic surveys. They unveil a complex Milky Way that is the outcome of an eventful evolution, shaping it to the present day (abridged).
Submitted 11 June, 2022;
originally announced June 2022.
-
Gaia Data Release 3. Rotational modulation and patterns of colour variations in solar-like variables
Authors:
E. Distefano,
A. C. Lanzafame,
E. Brugaletta,
B. Holl,
A. F. Lanza,
S. Messina,
I. Pagano,
M. Audard,
G. Jevardat De Fombelle,
I. Lecoeur-Taibi,
N. Mowlavi,
K. Nienartowicz,
L. Rimoldini,
D. W. Evans,
M. Riello,
P. Garcia-Lario,
P. Gavras,
L. Eyer
Abstract:
The Gaia third Data Release (DR3) presents a catalogue of 474,026 stars with variability induced by magnetic activity. For each star, the catalogue provides a list of about 70 parameters among which the most important are the stellar rotation period $P$, the photometric amplitude $A$ of the rotational signal and the Pearson Correlation Coefficient $r_0$ between brightness and magnitude variations. The Specific Objects Study (SOS) pipeline, developed to characterise magnetically active stars with Gaia Data, has been described in the paper accompanying the Gaia second Data Release. Here we describe the changes made to the pipeline and a new method developed to analyze Gaia time-series and to reveal spurious signals induced by instrumental effects or by the peculiar nature of the investigated stellar source. The period-amplitude diagram obtained with the DR3 data confirms the bimodal distribution of fast rotating stars seen in the DR2 release. The DR3 data made it possible, for the first time, to analyze the patterns of magnitude-color variations for thousands of magnetically active stars. The measured $r_0$ values are tightly correlated with the stars' positions in the period-amplitude diagram. The relationship between the $P$, $A$ and $r_0$ parameters inferred for thousands of stars could be very useful to improve the understanding of stellar magnetic fields and to improve theoretical models. The method developed to reveal the spurious signals can be applied to each of the released Gaia photometric time-series and can be exploited by anyone interested in working directly with Gaia time-series.
Submitted 8 November, 2022; v1 submitted 11 June, 2022;
originally announced June 2022.
-
Memorization in NLP Fine-tuning Methods
Authors:
Fatemehsadat Mireshghallah,
Archit Uniyal,
Tianhao Wang,
David Evans,
Taylor Berg-Kirkpatrick
Abstract:
Large language models are shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, and adapter) compare in terms of memorization risk. This presents increasing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization of fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks is very different. We observe that fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
Submitted 3 November, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
The GAPS Programme at TNG: XXXVI. Measurement of the Rossiter-McLaughlin effect and revising the physical and orbital parameters of the HAT-P-15, HAT-P-17, HAT-P-21, HAT-P-26, HAT-P-29 eccentric planetary systems
Authors:
L. Mancini,
M. Esposito,
E. Covino,
J. Southworth,
E. Poretti,
G. Andreuzzi,
D. Barbato,
K. Biazzo,
L. Borsato,
I. Bruni,
M. Damasso,
L. Di Fabrizio,
D. F. Evans,
V. Granata,
A. F. Lanza,
L. Naponiello,
V. Nascimbeni,
M. Pinamonti,
A. Sozzetti,
J. Tregloan-Reed,
M. Basilicata,
A. Bignamini,
A. S. Bonomo,
R. Claudi,
R. Cosentino
, et al. (12 additional authors not shown)
Abstract:
Aim: We aim to refine the orbital and physical parameters and determine the sky-projected planet orbital obliquity of five eccentric transiting planetary systems: HAT-P-15, HAT-P-17, HAT-P-21, HAT-P-26, and HAT-P-29. Each of the systems hosts a hot Jupiter, except for HAT-P-26 which hosts a Neptune-mass planet. Methods: We observed transit events of these planets with the HARPS-N spectrograph, obtaining high-precision radial velocity measurements that allow us to measure the Rossiter-McLaughlin effect for each of the target systems. We used these new HARPS-N spectra and archival data, including those from Gaia, to better characterise the stellar atmospheric parameters. The photometric parameters for four of the hot Jupiters were recalculated using 17 new transit light curves, obtained with an array of medium-class telescopes, and data from the TESS space telescope. HATNet time-series photometric data were checked for the signatures of rotation periods of the target stars and their spin axis inclination. Results: From the analysis of the Rossiter-McLaughlin effect, we derived sky-projected obliquities of 13, -26.3, -0.7, and -26 degrees for HAT-P-15b, HAT-P-17b, HAT-P-21b and HAT-P-29b, respectively. Due to the quality of the data, we were not able to constrain the sky-projected obliquity for HAT-P-26b well, although a prograde orbit is favoured. The stellar activity of HAT-P-21 indicates a rotation period of 15.88 days, which allowed us to determine its true misalignment angle (25 degrees). Our new analysis of the physical parameters of the five exoplanetary systems returned values compatible with those existing in the literature. Using TESS and the available transit light curves, we reviewed the orbital ephemeris for the five systems and confirmed that the HAT-P-26 system shows transit timing variations, which may tentatively be attributed to the presence of a third body.
Submitted 13 September, 2022; v1 submitted 21 May, 2022;
originally announced May 2022.
-
The Detection of Transiting Exoplanets by Gaia
Authors:
Aviad Panahi,
Shay Zucker,
Gisella Clementini,
Marc Audard,
Avraham Binnenfeld,
Felice Cusano,
Dafydd Wyn Evans,
Roy Gomel,
Berry Holl,
Ilya Ilyin,
Grégory Jevardat de Fombelle,
Tsevi Mazeh,
Nami Mowlavi,
Krzysztof Nienartowicz,
Lorenzo Rimoldini,
Sahar Shahaf,
Laurent Eyer
Abstract:
Context: The space telescope Gaia is dedicated mainly to performing high-precision astrometry, but also spectroscopy and epoch photometry which can be used to study various types of photometric variability. One such variability type is exoplanetary transits. The photometric data accumulated so far have finally matured enough to allow the detection of some exoplanets.
Aims: In order to fully exploit the scientific potential of Gaia, we search its photometric data for the signatures of exoplanetary transits.
Methods: The search relies on a version of the Box-Least-Square (BLS) method, applied to a set of stars prioritized by machine-learning classification methods. An independent photometric validation was obtained using the public full-frame images of TESS. In order to validate the first two candidates, radial-velocity follow-up observations were performed using the spectrograph PEPSI of the Large Binocular Telescope (LBT).
Results: The radial-velocity measurements confirm that two of the candidates are indeed hot Jupiters. Thus, they are the first exoplanets detected by Gaia: Gaia-1b and Gaia-2b.
Conclusions: Gaia-1b and Gaia-2b demonstrate that the approach presented in this paper is indeed effective. This approach will be used to assemble a set of additional exoplanet candidates, to be released in the third Gaia data release, ensuring better fulfillment of the exoplanet detection potential of Gaia.
Submitted 20 May, 2022;
originally announced May 2022.
-
Synchronization and clustering in complex quadratic networks
Authors:
Anca Radulescu,
Danae Evans,
Amani-Daisa Augustin,
Anthony Cooper,
Johan Nakuci,
Sarah Muldoon
Abstract:
In continuation of prior work, we investigate ties between a network's connectivity and ensemble dynamics. This relationship is notoriously difficult to approach mathematically in natural, complex networks. In our work, we aim to understand it in a canonical framework, using complex quadratic node dynamics, coupled in networks which we call complex quadratic networks (CQNs).
After previously defining extensions of the Mandelbrot and Julia sets for networks, we currently focus on the behavior of the node-wise projections of these sets, and on defining and analyzing the phenomena of node clustering and synchronization. We investigate the mechanisms that lead to nodes exhibiting identical or different Mandelbrot sets. We propose that clustering is strongly determined by the network connectivity patterns, with the geometry of these clusters further controlled by the connection weights. We then illustrate the concept of synchronization in an existing set of whole brain, tractography-based networks obtained from 197 human subjects using diffusion tensor imaging.
Synchronization and clustering are well-studied in the context of networks of oscillators, such as neural networks. Understanding the similarities to how these concepts apply to CQNs contributes to our understanding of universal principles in dynamic networks, and may help extend theoretical results to natural, complex systems.
Submitted 12 September, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Strain driven conducting domain walls in a Mott insulator
Authors:
L. Puntigam,
M. Altthaler,
S. Ghara,
L. Prodan,
V. Tsurkan,
S. Krohns,
I. Kézsmárki,
D. M. Evans
Abstract:
Rewritable nanoelectronics offers new perspectives and potential to both fundamental research and technological applications. Such interest has driven the research focus into conducting domain walls: pseudo 2D conducting channels that can be created, positioned, and deleted in situ. However, the study of conductive domain walls is largely limited to wide-gap ferroelectrics, where the conductivity typically arises from changes in charge carrier density, due to screening charge accumulation at polar discontinuities. This work shows that, in narrow-gap correlated insulators with strong charge lattice coupling, local strain gradients can drive enhanced conductivity at the domain walls, removing polar discontinuities as a criterion for conductivity. By combining different scanning probe microscopy techniques, we demonstrate that the domain wall conductivity in GaV4S8 does not follow the established screening charge model but rather arises from the large surface reconstruction across the Jahn-Teller transition and the associated strain gradients across the domain walls. This mechanism can turn any structural, or even magnetic, domain wall conducting, if the electronic structure of the host is susceptible to local strain gradients, drastically expanding the range of materials and phenomena that may be applicable to domain wall based nanoelectronics.
Submitted 27 April, 2022;
originally announced April 2022.
-
Gaia Early Data Release 3: The celestial reference frame (Gaia-CRF3)
Authors:
Gaia Collaboration,
S. A. Klioner,
L. Lindegren,
F. Mignard,
J. Hernández,
M. Ramos-Lerate,
U. Bastian,
M. Biermann,
A. Bombrun,
A. de Torres,
E. Gerlach,
R. Geyer,
T. Hilger,
D. Hobbs,
U. L. Lammers,
P. J. McMillan,
H. Steidelmüller,
D. Teyssier,
C. M. Raiteri,
S. Bartolomé,
M. Bernet,
J. Castañeda,
M. Clotet,
M. Davidson,
C. Fabricius
, et al. (426 additional authors not shown)
Abstract:
Gaia-CRF3 is the celestial reference frame for positions and proper motions in the third release of data from the Gaia mission, Gaia DR3 (and for the early third release, Gaia EDR3, which contains identical astrometric results). The reference frame is defined by the positions and proper motions at epoch 2016.0 for a specific set of extragalactic sources in the (E)DR3 catalogue.
We describe the construction of Gaia-CRF3, and its properties in terms of the distributions in magnitude, colour, and astrometric quality.
Compact extragalactic sources in Gaia DR3 were identified by positional cross-matching with 17 external catalogues of quasars (QSO) and active galactic nuclei (AGN), followed by astrometric filtering designed to remove stellar contaminants. Selecting a clean sample was favoured over including a higher number of extragalactic sources. For the final sample, the random and systematic errors in the proper motions are analysed, as well as the radio-optical offsets in position for sources in the third realisation of the International Celestial Reference Frame (ICRF3).
The Gaia-CRF3 comprises about 1.6 million QSO-like sources, of which 1.2 million have five-parameter astrometric solutions in Gaia DR3 and 0.4 million have six-parameter solutions. The sources span the magnitude range G = 13 to 21 with a peak density at 20.6 mag, at which the typical positional uncertainty is about 1 mas. The proper motions show systematic errors on the level of 12 $μ$as yr${}^{-1}$ on angular scales greater than 15 deg. For the 3142 optical counterparts of ICRF3 sources in the S/X frequency bands, the median offset from the radio positions is about 0.5 mas, but exceeds 4 mas in either coordinate for 127 sources. We outline the future of the Gaia-CRF in the next Gaia data releases.
Submitted 30 October, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Resolving structural changes and symmetry lowering in spinel FeCr2S4
Authors:
Donald M. Evans,
Ola G. Grendal,
Lilian Prodan,
Maximilian Winkler,
Noah Winterhalter-Stocker,
Philipp Gegenwart,
Somnath Ghara,
Joachim Deisenhofer,
István Kézsmárki,
Vladimir Tsurkan
Abstract:
The cubic spinel FeCr2S4 has been receiving immense research interest because of its emergent phases and the interplay of spin, orbital and lattice degrees of freedom. Despite the intense research, several fundamental questions are yet to be answered, such as the refinement of the crystal structure in the different magnetic and orbital ordered phases. Here, using high-resolution synchrotron powder diffraction on stoichiometric crystals of FeCr2S4, we resolved the long sought-after cubic to tetragonal transition at ~65 K, reducing the lattice symmetry to I41/amd. On further lowering the temperature, at ~9 K, the crystal structure becomes polar, hence the compound becomes multiferroic. The elucidation of the lattice symmetry throughout different phases of FeCr2S4 provides a basis for understanding this enigmatic system and also highlights the importance of structural deformation in correlated materials.
Submitted 1 March, 2022;
originally announced March 2022.
-
Homogeneous Transit Timing Analyses of Ten Exoplanet Systems
Authors:
Ö. Baştürk,
E. M. Esmer,
S. Yalçınkaya,
Ş. Torun,
L. Mancini,
F. Helweh,
E. Karamanlı,
J. Southworth,
S. Aliş,
A. Wünsche,
F. Tezcan,
Y. Aladağ,
N. Aksaker,
E. Tunç,
F. Davoudi,
S. Fişek,
M. Bretton,
D. F. Evans,
C. Yeşilyaprak,
M. Yılmaz,
C. T. Tezcan,
K. Yelkenci
Abstract:
We study the transit timings of 10 exoplanets in order to investigate potential Transit Timing Variations (TTVs) in them. We model their available ground-based light curves, some presented here and others taken from the literature, and homogeneously measure the mid-transit times. We statistically compare our results with published values and find that the measurement errors agree. However, in terms of recovering the possible frequencies, homogeneous sets can be found to be more useful, of which no statistically relevant example has been found for the planets in our study. We corrected the ephemeris information of all ten planets we studied and provide these most precise light elements as references for future transit observations with space-borne and ground-based instruments. We found no evidence for secular or periodic changes in the orbital periods of the planets in our sample, including the ultra-short period WASP-103 b, whose orbit is expected to decay on an observable timescale. Therefore, we derive the lower limits for the reduced tidal quality factors (Q$^{\prime}_{\star}$) for the host stars based on best fitting quadratic functions to their timing data. We also present a global model of all available data for WASP-74 b, which has a Gaia parallax-based distance value ~25% larger than the published value.
Submitted 1 March, 2022;
originally announced March 2022.
-
Dynamic control of ferroionic states in ferroelectric nanoparticles
Authors:
Anna N. Morozovska,
Sergei V. Kalinin,
Mykola E. Yelisieiev,
Jonghee Yang,
Mahshid Ahmadi,
Eugene A. Eliseev,
Dean R. Evans
Abstract:
The polar states of uniaxial ferroelectric nanoparticles interacting with a surface system of electronic and ionic charges with a broad distribution of mobilities are explored, which corresponds to the experimental case of nanoparticles in solution or ambient conditions. The nonlinear interactions between the ferroelectric dipoles and surface charges with slow relaxation dynamics in an external field lead to the emergence of a broad range of paraelectric-like, antiferroelectric-like ionic, and ferroelectric-like ferroionic states. The crossover between these states can be controlled not only by the static characteristics of the surface charges, but also by their relaxation dynamics in the applied field. The obtained results are not only promising for advanced applications of ferroelectric nanoparticles in nanoelectronics and optoelectronics, they also offer strategies for experimental verification.
Submitted 24 February, 2022;
originally announced February 2022.
-
Higher amalgamation properties in measured structures
Authors:
David M. Evans
Abstract:
Using an infinitary version of the Hypergraph Removal Lemma due to Towsner, we prove a model-theoretic higher amalgamation result. In particular, we obtain an independent amalgamation property which holds in structures which are measurable in the sense of Macpherson and Steinhorn, but which is not generally true in structures which are supersimple of finite SU-rank. We use this to show that some of Hrushovski's non-locally-modular, supersimple $ω$-categorical structures are not MS-measurable.
Submitted 14 July, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
The Diamond (111) Surface Reconstruction and Epitaxial Graphene Interface
Authors:
B. P. Reed,
M. E. Bathen,
J. W. R. Ash,
C. J. Meara,
A. A. Zakharov,
J. P. Goss,
J. W. Wells,
D. A. Evans,
S. P. Cooil
Abstract:
The evolution of the diamond (111) surface as it undergoes reconstruction and subsequent graphene formation is investigated with angle-resolved photoemission spectroscopy, low energy electron diffraction, and complementary density functional theory calculations. The process is examined starting at the C(111)-(2x1) surface reconstruction that occurs following detachment of the surface adatoms at 920 °C, and continues through to the liberation of the reconstructed surface atoms into a free-standing monolayer of epitaxial graphene at temperatures above 1000 °C. Our results show that the C(111)-(2x1) surface is metallic as it has electronic states that intersect the Fermi-level. This is in strong agreement with a symmetrically π-bonded chain model and should contribute to resolving the controversies that exist in the literature surrounding the electronic nature of this surface. The graphene formed at higher temperatures exists above a newly formed C(111)-(2x1) surface and appears to have little substrate interaction as the Dirac-point is observed at the Fermi-level. Finally, we demonstrate that it is possible to hydrogen terminate the underlying diamond surface by means of plasma processing without removing the graphene layer, forming a graphene-semiconductor interface. This could have particular relevance for doping the graphene formed on the diamond (111) surface via tuneable substrate interactions as a result of changing the terminating species at the diamond-graphene interface by plasma processing.
Submitted 21 February, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Equivariant higher Dixmier-Douady Theory for circle actions on UHF-algebras
Authors:
David E. Evans,
Ulrich Pennig
Abstract:
We develop an equivariant Dixmier-Douady theory for locally trivial bundles of $C^*$-algebras with fibre $D \otimes \mathbb{K}$ equipped with a fibrewise $\mathbb{T}$-action, where $\mathbb{T}$ denotes the circle group and $D = \operatorname{End}\left(V\right)^{\otimes \infty}$ for a $\mathbb{T}$-representation $V$. In particular, we show that the group of $\mathbb{T}$-equivariant $*$-automorphisms $\operatorname{Aut}_{\mathbb{T}}(D \otimes \mathbb{K})$ is an infinite loop space giving rise to a cohomology theory $E^*_{D,\mathbb{T}}(X)$. Isomorphism classes of equivariant bundles then form a group with respect to the fibrewise tensor product that is isomorphic to $E^1_{D,\mathbb{T}}(X) \cong [X, B\operatorname{Aut}_{\mathbb{T}}(D \otimes \mathbb{K})]$. We compute this group for tori and compare the case $D = \mathbb{C}$ to the equivariant Brauer group for trivial actions on the base space.
Submitted 23 November, 2023; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Cold Atoms in Space: Community Workshop Summary and Proposed Road-Map
Authors:
Ivan Alonso,
Cristiano Alpigiani,
Brett Altschul,
Henrique Araujo,
Gianluigi Arduini,
Jan Arlt,
Leonardo Badurina,
Antun Balaz,
Satvika Bandarupally,
Barry C Barish,
Michele Barone,
Michele Barsanti,
Steven Bass,
Angelo Bassi,
Baptiste Battelier,
Charles F. A. Baynham,
Quentin Beaufils,
Aleksandar Belic,
Joel Berge,
Jose Bernabeu,
Andrea Bertoldi,
Robert Bingham,
Sebastien Bize,
Diego Blas,
Kai Bongs,
Philippe Bouyer
, et al. (224 additional authors not shown)
Abstract:
We summarize the discussions at a virtual Community Workshop on Cold Atoms in Space concerning the status of cold atom technologies, the prospective scientific and societal opportunities offered by their deployment in space, and the developments needed before cold atoms could be operated in space. The cold atom technologies discussed include atomic clocks, quantum gravimeters and accelerometers, and atom interferometers. Prospective applications include metrology, geodesy and measurement of terrestrial mass change due to, e.g., climate change, and fundamental science experiments such as tests of the equivalence principle, searches for dark matter, measurements of gravitational waves and tests of quantum mechanics. We review the current status of cold atom technologies and outline the requirements for their space qualification, including the development paths and the corresponding technical milestones, and identifying possible pathfinder missions to pave the way for missions to exploit the full potential of cold atoms in space. Finally, we present a first draft of a possible road-map for achieving these goals, that we propose for discussion by the interested cold atom, Earth Observation, fundamental physics and other prospective scientific user communities, together with ESA and national space and research funding agencies.
Submitted 19 January, 2022;
originally announced January 2022.
-
Optimal discharge of patients from intensive care via a data-driven policy learning framework
Authors:
Fernando Lejarza,
Jacob Calvert,
Misty M Attwood,
Daniel Evans,
Qingqing Mao
Abstract:
Clinical decision support tools rooted in machine learning and optimization can provide significant value to healthcare providers, including through better management of intensive care units. In particular, it is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay (and associated hospitalization costs) and the risk of readmission or even death following the discharge decision. This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions given a patient's electronic health records. A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition. Based on this model and a given cost function, an infinite-horizon discounted Markov decision process is formulated and solved numerically to compute an optimal discharge policy, whose value is assessed using off-policy evaluation strategies. Extensive numerical experiments are performed to validate the proposed framework using real-life intensive care unit patient data.
Submitted 16 December, 2021;
originally announced December 2021.
-
Lectures on Lagrangian torus fibrations
Authors:
Jonathan David Evans
Abstract:
This is a book aimed at graduate students and researchers in symplectic geometry, based on a course I taught in 2019. The primary message is that the base of a Lagrangian torus fibration inherits an integral affine structure, which you can use to "read off" a lot of interesting geometry of the total space. Topics covered include: action-angle coordinates, symplectic reduction, toric manifolds, visible and tropical Lagrangians, almost toric systems, Milnor fibres of cyclic quotient singularities, mutation of polygons, non-toric blow-up, an almost toric view on Lisca's classification of fillings of lens spaces, resolutions of cusp singularities, Markov triples and Vianna tori. The book ends with a short list of open problems. Throughout there is an emphasis on examples and there are some exercises with solutions.
Submitted 28 October, 2022; v1 submitted 16 October, 2021;
originally announced October 2021.
-
First Leptophobic Dark Matter Search from Coherent CAPTAIN-Mills
Authors:
A. A. Aguilar-Arevalo,
D. S. M. Alves,
S. Biedron,
J. Boissevain,
M. Borrego,
M. Chavez-Estrada,
A. Chavez,
J. M. Conrad,
R. L. Cooper,
A. Diaz,
J. R. Distel,
J. C. D'Olivo,
E. Dunton,
B. Dutta,
A. Elliott,
D. Evans,
D. Fields,
J. Greenwood,
M. Gold,
J. Gordon,
E. Guarincerri,
E. C. Huang,
N. Kamp,
C. Kelsey,
K. Knickerbocker
, et al. (26 additional authors not shown)
Abstract:
We report the first results of a search for leptophobic dark matter (DM) from the Coherent CAPTAIN-Mills (CCM) liquid argon (LAr) detector. An engineering run with 120 photomultiplier tubes (PMTs) and $17.9 \times 10^{20}$ protons-on-target (POT) was performed in Fall 2019 to study the characteristics of the CCM detector. The operation of this 10-ton detector was strictly light-based with a threshold of 50 keV and used coherent elastic scattering off argon nuclei to detect DM. Despite only 1.5 months of accumulated luminosity, contaminated LAr, and non-optimized shielding, CCM's first engineering run already achieved sensitivity to previously unexplored parameter space of light dark matter (LDM) models with a baryonic vector portal. With an expected background of 115,005 events, we observe 115,005+16.5 events which is compatible with background expectations. For a benchmark mediator-to-dark matter mass ratio of $m_{_{V_B}}/m_χ=2.1$, DM masses within the range $9\,\text{MeV} \lesssim m_χ\lesssim 50\,\text{MeV}$ have been excluded at 90% C.L. in the leptophobic model after applying the Feldman-Cousins test statistic. CCM's upgraded run with 200 PMTs, filtered LAr, improved shielding, and ten times more POT will be able to exclude the remaining thermal relic density parameter space of this model, as well as probe new parameter space of other leptophobic DM models.
Submitted 19 May, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Formalizing and Estimating Distribution Inference Risks
Authors:
Anshuman Suri,
David Evans
Abstract:
Distribution inference, sometimes called property inference, infers statistical properties about a training set from access to a model trained on that data. Distribution inference attacks can pose serious risks when models are trained on private data, but are difficult to distinguish from the intrinsic purpose of statistical machine learning -- namely, to produce models that capture statistical properties about a distribution. Motivated by Yeom et al.'s membership inference framework, we propose a formal definition of distribution inference attacks that is general enough to describe a broad class of attacks distinguishing between possible training distributions. We show how our definition captures previous ratio-based property inference attacks as well as new kinds of attack including revealing the average node degree or clustering coefficient of a training graph. To understand distribution inference risks, we introduce a metric that quantifies observed leakage by relating it to the leakage that would occur if samples from the training distribution were provided directly to the adversary. We report on a series of experiments across a range of different distributions using both novel black-box attacks and improved versions of the state-of-the-art white-box attacks. Our results show that inexpensive attacks are often as effective as expensive meta-classifier attacks, and that there are surprising asymmetries in the effectiveness of attacks. Code is available at https://github.com/iamgroot42/FormEstDistRisks
Submitted 5 July, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.