-
Are Pretrained Multilingual Models Equally Fair Across Languages?
Authors:
Laura Cabello Piqueras,
Anders Søgaard
Abstract:
Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., exhibiting near-equal risk for Spanish, but high levels of disparity for German.
Submitted 11 October, 2022;
originally announced October 2022.
-
Challenges and Strategies in Cross-Cultural NLP
Authors:
Daniel Hershcovich,
Stella Frank,
Heather Lent,
Miryam de Lhoneux,
Mostafa Abdou,
Stephanie Brandl,
Emanuele Bugliarello,
Laura Cabello Piqueras,
Ilias Chalkidis,
Ruixiang Cui,
Constanza Fierro,
Katerina Margatina,
Phillip Rust,
Anders Søgaard
Abstract:
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers, and the content they produce and require, vary not just by language but also by culture. Although language and culture are tightly linked, there are important differences. Analogous to cross-lingual and multilingual NLP, cross-cultural and multicultural NLP considers these differences in order to better serve users of NLP systems. We propose a principled framework to frame these efforts, and survey existing and potential strategies.
Submitted 18 March, 2022;
originally announced March 2022.
-
ORIGIN: Blind detection of faint emission line galaxies in MUSE datacubes
Authors:
David Mary,
Roland Bacon,
Simon Conseil,
Laure Piqueras,
Antony Schutz
Abstract:
One of the major science cases of the MUSE integral field spectrograph is the detection of Lyman-alpha emitters at high redshifts. The ongoing and planned deep-field observations will allow for a large sample of these sources. An efficient tool to perform blind detection of faint emitters in MUSE datacubes is a prerequisite of such an endeavor.
Several line detection algorithms exist but their performance during the deepest MUSE exposures is hard to quantify, in particular with respect to their actual false detection rate, or purity. The aim of this work is to design and validate an algorithm that efficiently detects faint spatial-spectral emission signatures, while allowing for a stable false detection rate over the datacube and providing at the same time an automated and reliable estimation of the purity.
Results on simulated datacubes providing ground truth show that the method reaches its aims in terms of purity and completeness. When applied to the deep 30-hour exposure MUSE datacube in the Hubble Ultra Deep Field, the algorithm allows for the confirmed detection of 133 intermediate-redshift galaxies and 248 Lyman-alpha emitters, including 86 sources with no HST counterpart.
The algorithm fulfills its aims in terms of detection power and reliability. It is consequently implemented as a Python package whose code and documentation are available on GitHub and readthedocs.
Submitted 1 February, 2020;
originally announced February 2020.
-
MPDAF - A Python package for the analysis of VLT/MUSE data
Authors:
Laure Piqueras,
Simon Conseil,
Martin Shepherd,
Roland Bacon,
Floriane Leclercq,
Johan Richard
Abstract:
MUSE (Multi Unit Spectroscopic Explorer) is an integral-field spectrograph mounted on the Very Large Telescope (VLT) in Chile and available to the European community since October 2014. The Centre de Recherche Astrophysique de Lyon has developed dedicated software to help MUSE users analyze the reduced data. In this paper we introduce MPDAF, the MUSE Python Data Analysis Framework, based on several well-known Python libraries (NumPy, SciPy, Matplotlib, Astropy), which offers new tools to manipulate MUSE-specific data. We present different examples showing how this Python package may be useful for MUSE data analysis.
Submitted 10 October, 2017;
originally announced October 2017.
-
The MUSE Hubble Ultra Deep Field Survey: I. Survey description, data reduction and source detection
Authors:
Roland Bacon,
Simon Conseil,
David Mary,
Jarle Brinchmann,
Martin Shepherd,
Mohammad Akhlaghi,
Peter M. Weilbacher,
Laure Piqueras,
Lutz Wisotzki,
David Lagattuta,
Benoit Epinat,
Adrien Guerou,
Hanae Inami,
Sebastiano Cantalupo,
Jean Baptiste Courbot,
Thierry Contini,
Johan Richard,
Michael Maseda,
Rychard Bouwens,
Nicolas Bouche,
Wolfram Kollatschny,
Joop Schaye,
Raffaella Anna Marino,
Roser Pello,
Christian Herenz,
et al. (2 additional authors not shown)
Abstract:
We present the MUSE Hubble Ultra Deep Survey, a mosaic of nine MUSE fields covering 90% of the entire HUDF region with a 10-hour deep exposure time, plus a deeper 31-hour exposure in a single 1.15 arcmin$^2$ field. The improved observing strategy and advanced data reduction result in datacubes with sub-arcsecond spatial resolution (0.65 arcsec at 7000 Å) and accurate astrometry (0.07 arcsec rms). We compare the broadband photometric properties of the datacubes to HST photometry, finding a good agreement in zeropoint up to $m_{AB}=28$ but with an increasing scatter for faint objects. We have investigated the noise properties and developed an empirical way to account for the impact of the correlation introduced by the 3D drizzle interpolation. The achieved 3 sigma emission line detection limit for a point source is 1.5 and 3.1 $\times 10^{-19}$ erg s$^{-1}$ cm$^{-2}$ for the single ultra-deep datacube and the mosaic, respectively. We extracted 6288 sources using an optimal extraction scheme that takes the published HST source locations as prior. In parallel, we performed a blind search of emission line galaxies using an original method based on advanced test statistics and filter matching. The blind search results in 1251 emission line galaxy candidates in the mosaic and 306 in the ultra-deep datacube, including 72 sources without HST counterparts ($m_{AB}>31$). In addition, 88 sources missed in the HST catalog but with clear HST counterparts were identified. This data set is the deepest spectroscopic survey ever performed. In just over 100 hours of integration time, it provides nearly an order of magnitude more spectroscopic redshifts compared to the data that has been accumulated on the UDF over the past decade. The depth and high quality of these datacubes enable new and detailed studies of the physical properties of the galaxy population and their environments over a large redshift range.
Submitted 9 October, 2017;
originally announced October 2017.
-
Advanced Data Reduction for the MUSE Deep Fields
Authors:
Simon Conseil,
Roland Bacon,
Laure Piqueras,
Martin Shepherd
Abstract:
The Multi Unit Spectroscopic Explorer (MUSE) is an integral-field spectrograph operating in the visible wavelength range and installed at the Very Large Telescope (VLT). The official MUSE pipeline is available from ESO. However, for the data reduction of the Deep Fields program (Bacon et al., in prep.), we have built a more sophisticated reduction pipeline, with additional reduction tasks, to extend the official pipeline and produce cubes with fewer instrumental residuals.
Submitted 15 December, 2016;
originally announced December 2016.