-
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Authors:
Bahare Fatemi,
Mehran Kazemi,
Anton Tsitsulin,
Karishma Malkan,
Jinyeong Yim,
John Palowitch,
Sungyong Seo,
Jonathan Halcrow,
Bryan Perozzi
Abstract:
Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors, particularly in temporal reasoning tasks involving complex temporal logic. Existing research has explored LLM performance on temporal reasoning using diverse datasets and benchmarks. However, these studies often rely on real-world data that LLMs may have encountered during pre-training or employ anonymization techniques that can inadvertently introduce factual inconsistencies. In this work, we address these limitations by introducing novel synthetic datasets specifically designed to assess LLM temporal reasoning abilities in various scenarios. The diversity of question types across these datasets enables systematic investigation into the impact of the problem structure, size, question type, fact order, and other factors on LLM performance. Our findings provide valuable insights into the strengths and weaknesses of current LLMs in temporal reasoning tasks. To foster further research in this area, we are open-sourcing the datasets and evaluation framework used in our experiments: https://huggingface.co/datasets/baharef/ToT.
Submitted 13 June, 2024;
originally announced June 2024.
-
Unravelling the asphericities in the explosion and multi-faceted circumstellar matter of SN 2023ixf
Authors:
Avinash Singh,
R. S. Teja,
T. J. Moriya,
K. Maeda,
K. S. Kawabata,
M. Tanaka,
R. Imazawa,
T. Nakaoka,
A. Gangopadhyay,
M. Yamanaka,
V. Swain,
D. K. Sahu,
G. C. Anupama,
B. Kumar,
R. M. Anche,
Y. Sano,
A. Raj,
V. K. Agnihotri,
V. Bhalerao,
D. Bisht,
M. S. Bisht,
K. Belwal,
S. K. Chakrabarti,
M. Fujii,
T. Nagayama, et al. (11 additional authors not shown)
Abstract:
We present a detailed investigation of photometric, spectroscopic, and polarimetric observations of the Type II SN 2023ixf. The early detection of highly ionized flash features and the rapid ascent in ultraviolet flux, coupled with the blueward shift in near-ultraviolet colors and temperature, provide compelling evidence for a delayed shock breakout from a confined dense circumstellar matter (CSM) enveloping the progenitor star. The temporal evolution of polarization in SN 2023ixf revealed three distinct peaks at 1.4 d, 6.4 d, and 79.2 d, indicating an asymmetric dense CSM, an aspherical shock front and clumpiness in the low-density extended CSM, and an aspherical inner ejecta/He-core. SN 2023ixf displayed two dominant axes, one along the CSM-outer ejecta and the other along the inner ejecta/He-core, showcasing the independent origin of asymmetry in the early and late evolution. The argument for an aspherical shock front is further strengthened by the presence of a high-velocity broad absorption feature in the blue wing of the Balmer features, in addition to the P-Cygni absorption, post 16 d. Hydrodynamical light curve modeling indicated a progenitor mass of 10 solar masses with a radius of 470 solar radii, an explosion energy of 2e51 erg, and 0.06 solar masses of 56Ni. The modeling also indicated a two-zone CSM: a confined dense CSM extending up to 5e14 cm, with a mass-loss rate of 1e-2 solar masses per year, and an extended CSM spanning from 5e14 cm to 1e16 cm, with a mass-loss rate of 1e-4 solar masses per year. The early nebular phase observations display an axisymmetric line profile of [OI] and red-ward attenuation of the Halpha emission post 125 days, marking the onset of dust formation.
Submitted 31 May, 2024;
originally announced May 2024.
-
Do Transformer Modifications Transfer Across Implementations and Applications?
Authors:
Sharan Narang,
Hyung Won Chung,
Yi Tay,
William Fedus,
Thibault Fevry,
Michael Matena,
Karishma Malkan,
Noah Fiedel,
Noam Shazeer,
Zhenzhong Lan,
Yanqi Zhou,
Wei Li,
Nan Ding,
Jake Marcus,
Adam Roberts,
Colin Raffel
Abstract:
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.
Submitted 10 September, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
WT5?! Training Text-to-Text Models to Explain their Predictions
Authors:
Sharan Narang,
Colin Raffel,
Katherine Lee,
Adam Roberts,
Noah Fiedel,
Karishma Malkan
Abstract:
Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction. In this paper, we leverage the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their prediction. Crucially, this requires no modifications to the loss function or training and decoding procedures -- we simply train the model to output the explanation after generating the (natural text) prediction. We show that this approach not only obtains state-of-the-art results on explainability benchmarks, but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets. To facilitate reproducibility and future work, we release the code used to train the models.
Submitted 29 April, 2020;
originally announced April 2020.
-
Lyman Continuum Emission Escaping from Luminous Green Pea Galaxies at z=0.5
Authors:
Matthew A. Malkan,
Brian K. Malkan
Abstract:
Compact starburst galaxies are thought to include many or most of the galaxies from which substantial Lyman continuum emission can escape into the intergalactic medium. Li and Malkan (2018) used SDSS photometry to find a population of such starburst galaxies at z~0.5. They were discovered by their extremely strong [OIII]4959+5007 emission lines, which produce a clearly detectable excess brightness in the i bandpass, compared with surrounding filters. We therefore used the HST/COS spectrograph to observe two of the newly discovered i-band excess galaxies around their Lyman limits. One has strongly detected continuum below its Lyman limit, corresponding to a relative escape fraction of ionizing photons of 20+/-2%. The other, which is less compact in UV imaging, has a 2-sigma upper limit to its Lyman escape fraction of <5%. Before the UV spectroscopy, the existing data could not distinguish these two galaxies. Although a sample of two is hardly sufficient for statistical analysis, it shows the possibility that some fraction of these strong [OIII] emitters as a class have ionizing photons escaping. The differences might be determined by the luck of our particular viewing geometry. The HST spectroscopy revealed that the Lyman-continuum emitting galaxy differs in having no central absorption in its prominent Lyα emission line profile. The other target, with no escaping Lyman continuum, shows the more common double-peaked Lyα emission.
Submitted 6 January, 2021; v1 submitted 23 December, 2019;
originally announced December 2019.