-
Gnuastro: Estimating the Zero Point Magnitude in Astronomical Imaging
Authors:
Sepideh Eskandarlou,
Mohammad Akhlaghi,
Raúl Infante-Sainz,
Elham Saremi,
Samane Raji,
Zahra Sharbaf,
Giulia Golini,
Zohreh Ghaffari,
Johan H. Knapen
Abstract:
Calibration of pixel values is a fundamental step for accurate measurements in astronomical imaging. In astronomical jargon this is known as estimating zero point magnitude. Here, we introduce a newly added script in GNU Astronomy Utilities (Gnuastro) version 0.20 for the zero point magnitude estimation, named: astscript-zeropoint. The script offers numerous features, such as the flexibility to use either image(s) or a catalog as the reference dataset. Additionally, steps are parallelized to enhance efficiency for big data. Thanks to Gnuastro's minimal dependencies, the script is both flexible and portable. The figures of this research note are reproducible with Maneage, on the Git commit c89275e.
Submitted 7 December, 2023;
originally announced December 2023.
-
Zero-shot Cross-Linguistic Learning of Event Semantics
Authors:
Malihe Alikhani,
Thomas Kober,
Bashar Alhafni,
Yue Chen,
Mert Inan,
Elizabeth Nielsen,
Shahab Raji,
Mark Steedman,
Matthew Stone
Abstract:
Typologically diverse languages offer systems of lexical and grammatical aspect that allow speakers to focus on facets of event structure in ways that comport with the specific communicative setting and discourse constraints they face. In this paper, we look specifically at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish and describe a computational model for predicting lexical aspects. Despite the heterogeneity of these languages, and the salient invocation of distinctive linguistic resources across their caption corpora, speakers of these languages show surprising similarities in the ways they frame image content. We leverage this observation for zero-shot cross-lingual learning and show that lexical aspects can be predicted for a given language despite not having observed any annotated data for this language at all.
Submitted 5 July, 2022;
originally announced July 2022.
-
Once in a blue stream: Detection of recent star formation in the NGC 7241 stellar stream with MEGARA
Authors:
David Martinez-Delgado,
Santi Roca-Fabrega,
Armando Gil de Paz,
Denis Erkal,
Juan Miro-Carretero,
Dmitry Makarov,
Karina T. Voggel,
Ryan Leaman,
Walter Boschin,
Sarah Pearson,
Giuseppe Donatiello,
Evgenii Rubtsov,
Mohammad Akhlaghi,
M. Angeles Gomez-Flechoso,
Samane Raji,
Dustin Lang,
Adam Block,
Jesus Gallego,
Esperanza Carrasco,
Maria Luisa Garcia-Vargas,
Jorge Iglesias-Paramo,
Sergio Pascual,
Nicolas Cardiel,
Ana Perez-Calpena,
Africa Castillo-Morales
, et al. (1 additional author not shown)
Abstract:
In this work we study the striking case of a narrow blue stream around the NGC 7241 galaxy and its foreground dwarf companion. We want to figure out if the stream was generated by tidal interaction with NGC 7241 or it first interacted with the foreground dwarf companion and later both fell together towards NGC 7241. We use four sets of observations, including a follow-up spectroscopic study with the MEGARA instrument at the 10.4-m Gran Telescopio Canarias. Our data suggest that the compact object we detected in the stream is a foreground Milky Way halo star. Near this compact object we detect emission lines overlapping a bluer and fainter blob of the stream that is clearly visible in both ultra-violet and optical deep images. From its heliocentric systemic radial velocity (Vsyst= 1548.58+/-1.80 km s^-1) and new UV and optical broad-band photometry, we conclude that this over-density could be the actual core of the stream, with an absolute magnitude of M_g ~ -10 and a (g-r) = 0.08 +/- 0.11, consistent with a remnant of a low-mass dwarf satellite undergoing a current episode of star formation. From the width of the stream and assuming a circular orbit, we calculate that the progenitor mass can be the typical of a dwarf galaxy, but it could also be substantially lower if the stream is on a very radial orbit or it was created by tidal interaction with the companion dwarf instead of with NGC 7241. Finally, we find that blue stellar streams containing star formation regions are commonly predicted by high-resolution cosmological simulations of galaxies lighter than the Milky Way. This scenario is consistent with the processes explaining the bursty star formation history of some dwarf satellites, which are followed by a gas depletion and a fast quenching once they enter within the virial radius of their host galaxies for the first time.
Submitted 14 December, 2023; v1 submitted 13 December, 2021;
originally announced December 2021.
-
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Authors:
Kaustubh D. Dhole,
Varun Gangal,
Sebastian Gehrmann,
Aadesh Gupta,
Zhenhao Li,
Saad Mahamood,
Abinaya Mahendiran,
Simon Mille,
Ashish Shrivastava,
Samson Tan,
Tongshuang Wu,
Jascha Sohl-Dickstein,
Jinho D. Choi,
Eduard Hovy,
Ondrej Dusek,
Sebastian Ruder,
Sajant Anand,
Nagender Aneja,
Rabin Banjade,
Lisa Barthe,
Hanna Behnke,
Ian Berlot-Attwell,
Connor Boyle,
Caroline Brun,
Marco Antonio Sobrevilla Cabezudo
, et al. (101 additional authors not shown)
Abstract:
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
Submitted 11 October, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Guilt by Association: Emotion Intensities in Lexical Representations
Authors:
Shahab Raji,
Gerard de Melo
Abstract:
What do word vector representations reveal about the emotions associated with words? In this study, we consider the task of estimating word-level emotion intensity scores for specific emotions, exploring unsupervised, supervised, and finally a self-supervised method of extracting emotional associations from word vector representations. Overall, we find that word vectors carry substantial potential for inducing fine-grained emotion intensity scores, showing a far higher correlation with human ground truth ratings than achieved by state-of-the-art emotion lexicons.
Submitted 17 April, 2021;
originally announced April 2021.
-
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Authors:
Daniel Khashabi,
Arman Cohan,
Siamak Shakeri,
Pedram Hosseini,
Pouya Pezeshkpour,
Malihe Alikhani,
Moin Aminnaseri,
Marzieh Bitaab,
Faeze Brahman,
Sarik Ghazarian,
Mozhdeh Gheini,
Arman Kabiri,
Rabeeh Karimi Mahabadi,
Omid Memarrast,
Ahmadreza Mosallanezhad,
Erfan Noury,
Shahab Raji,
Mohammad Sadegh Rasooli,
Sepideh Sadeghi,
Erfan Sadeqi Azer,
Niloofar Safi Samghabadi,
Mahsa Shafaei,
Saber Sheybani,
Ali Tazarv,
Yadollah Yaghoobzadeh
Abstract:
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the most widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks -- Reading Comprehension, Textual Entailment, etc. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Besides, we present the first results on state-of-the-art monolingual and multi-lingual pre-trained language-models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
Submitted 13 July, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.