-
Probing of Quantitative Values in Abstractive Summarization Models
Authors:
Nathan M. White
Abstract:
Abstractive text summarization has recently become a popular approach, but data hallucination remains a serious problem, including with quantitative data. We propose a set of probing tests to evaluate the efficacy of abstractive summarization models' modeling of quantitative values found in the input text. Our results show that in most cases, the encoders of recent SOTA-performing models struggle to provide embeddings that adequately represent quantitative values in the input compared to baselines, and in particular, they outperform random representations in some, but surprisingly not all, cases. Under our assumptions, this suggests that the encoder's performance contributes to the quantity hallucination problem. One model type in particular, DistilBART-CDM, was observed to underperform randomly initialized representations for several experiments, and performance versus BERT suggests that standard pretraining and fine-tuning approaches for the summarization task may play a role in underperformance for some encoders.
Submitted 2 October, 2022;
originally announced October 2022.
-
The Harrington Yowlumne Narrative Corpus
Authors:
Nathan M. White,
Timothy Henry-Rodriguez
Abstract:
Minority languages continue to lack adequate resources for their development, especially in the technological domain. Likewise, the J.P. Harrington Papers collection at the Smithsonian Institution is difficult to access in practical terms for community members and researchers due to its handwritten and disorganized format. Our current work seeks to make a portion of this publicly available yet problematic material practically accessible for natural language processing use. Here, we present the Harrington Yowlumne Narrative Corpus, a corpus of 20 narrative texts that derive from the Tejoneño Yowlumne community of the Tinliw rancheria in Kern County, California between 1910 and 1925. We digitally transcribe the texts and, through a Levenshtein distance-based algorithm and manual checking, we provide gold-standard aligned normalized and lemmatized text. We likewise provide POS tags for each lemmatized token via a lexicon-based deterministic approach. Altogether, the corpus contains 57,136 transcribed characters aligned with 10,719 gold-standard text-normalized words.
Submitted 19 May, 2022; v1 submitted 31 January, 2021;
originally announced February 2021.
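The abstract above mentions a Levenshtein distance-based alignment step without giving details. As a rough illustration only, the following is a minimal sketch of plain Levenshtein edit distance and a nearest-candidate lookup. It is not the authors' algorithm; the function names `levenshtein` and `closest_form` are hypothetical, and the example words are English placeholders rather than corpus data.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute; unit costs)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def closest_form(token: str, candidates: Iterable[str]) -> str:
    """Hypothetical helper: pick the candidate with the smallest edit distance."""
    return min(candidates, key=lambda c: levenshtein(token, c))


if __name__ == "__main__":
    # Placeholder English words, not Yowlumne data.
    print(levenshtein("kitten", "sitting"))
    print(closest_form("colour", ["color", "collar", "cooler"]))
```

In a normalization setting, a lookup of this kind could propose a normalized form for each transcribed token, with the manual checking the abstract describes resolving ties and errors.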
-
Constraining Disk Parameters of Be Stars using Narrowband H-alpha Interferometry with the NPOI
Authors:
C. Tycner,
G. C. Gilbreath,
R. T. Zavala,
J. T. Armstrong,
J. A. Benson,
A. R. Hajian,
D. J. Hutter,
C. E. Jones,
T. A. Pauls,
N. M. White
Abstract:
Interferometric observations of two well-known Be stars, gamma Cas and phi Per, were collected and analyzed to determine the spatial characteristics of their circumstellar regions. The observations were obtained using the Navy Prototype Optical Interferometer equipped with custom-made narrowband filters. The filters isolate the H-alpha emission line from the nearby continuum radiation, which results in an increased contrast between the interferometric signature due to the H-alpha-emitting circumstellar region and the central star. Because the narrowband filters do not significantly attenuate the continuum radiation at wavelengths 50 nm or more away from the line, the interferometric signal in the H-alpha channel is calibrated with respect to the continuum channels. The observations used in this study represent the highest spatial resolution measurements of the H-alpha-emitting regions of Be stars obtained to date. These observations allow us to demonstrate for the first time that the intensity distribution in the circumstellar region of a Be star cannot be represented by uniform disk or ring-like structures, whereas a Gaussian intensity distribution appears to be fully consistent with our observations.
Submitted 3 February, 2006;
originally announced February 2006.
-
Properties of the H-alpha-emitting Circumstellar Regions of Be Stars
Authors:
Christopher Tycner,
John B. Lester,
Arsen R. Hajian,
J. T. Armstrong,
J. A. Benson,
G. C. Gilbreath,
D. J. Hutter,
T. A. Pauls,
N. M. White
Abstract:
Long-baseline interferometric observations obtained with the Navy Prototype Optical Interferometer of the H-alpha-emitting envelopes of the Be stars eta Tauri and beta Canis Minoris are presented. For compatibility with the previously published interferometric results in the literature of other Be stars, circularly symmetric and elliptical Gaussian models were fitted to the calibrated H-alpha observations. The models are sufficient in characterizing the angular distribution of the H-alpha-emitting circumstellar material associated with these Be stars. To study the correlations between the various model parameters and the stellar properties, the model parameters for eta Tau and beta CMi were combined with data for other Be stars from the literature. After accounting for the different distances to the sources and stellar continuum flux levels, it was possible to study the relationship between the net H-alpha emission and the physical extent of the H-alpha-emitting circumstellar region. A clear dependence of the net H-alpha emission on the linear size of the emitting region is demonstrated and these results are consistent with an optically thick line emission that is directly proportional to the effective area of the emitting disk. Within the small sample of stars considered in this analysis, no clear dependence on the spectral type or stellar rotation is found, although the results do suggest that hotter stars might have more extended H-alpha-emitting regions.
Submitted 25 January, 2005;
originally announced January 2005.
-
Direct multi-wavelength limb-darkening measurements of three late-type giants with the Navy Prototype Optical Interferometer
Authors:
M. Wittkowski,
C. A. Hummel,
K. J. Johnston,
D. Mozurkewich,
A. R. Hajian,
N. M. White
Abstract:
We present direct measurements of the limb-darkened intensity profiles of the late-type giant stars HR5299, HR7635, and HR8621 obtained with the Navy Prototype Optical Interferometer (NPOI) at the Lowell Observatory. A triangle of baselines with lengths of 18.9 m, 22.2 m, and 37.5 m was used. We utilized squared visibility amplitudes beyond the first minimum, as well as triple amplitudes and phases in up to 10 spectral channels covering a wavelength range of ~650 nm to ~850 nm. We find that our data can best be described by featureless symmetric limb-darkened disk models while uniform disk and fully darkened disk models can be rejected. We derive high-precision angular limb-darkened diameters for the three stars of $7.44 \pm 0.11$ mas, $6.18 \pm 0.07$ mas, and $6.94 \pm 0.12$ mas, respectively. Using the HIPPARCOS parallaxes, we determine linear limb-darkened radii of $114 \pm 13\,R_\odot$, $56 \pm 4\,R_\odot$, and $98 \pm 9\,R_\odot$, respectively. We compare our data to a grid of Kurucz stellar model atmospheres, derive from them the effective temperatures and surface gravities without additional information, and find agreement with independent estimates derived from empirical calibrations and bolometric fluxes. This confirms the consistency of model predictions and direct observations of the limb-darkening effect.
Submitted 10 August, 2001;
originally announced August 2001.
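The abstract above converts angular diameters to linear radii using parallaxes. As a minimal sketch of that standard small-angle conversion, and not the authors' own procedure, the function below computes a radius in solar units from an angular diameter and a parallax, both in milliarcseconds. The function name `linear_radius_rsun` is hypothetical, the parsec and nominal solar radius values are assumed constants, and the inputs in the usage line are placeholders because the listing does not give the parallaxes.

```python
import math

# Assumed conversion constants (not taken from the listing).
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)  # milliarcseconds to radians
PARSEC_M = 3.0857e16                              # metres per parsec
R_SUN_M = 6.957e8                                 # nominal solar radius in metres


def linear_radius_rsun(theta_mas: float, parallax_mas: float) -> float:
    """Linear radius in solar radii from angular diameter and parallax (both in mas).

    Uses the small-angle relation R = (theta / 2) * d, with d = 1 / parallax.
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    distance_m = (1000.0 / parallax_mas) * PARSEC_M  # a parallax in mas gives 1000/p parsecs
    radius_m = 0.5 * theta_mas * MAS_TO_RAD * distance_m
    return radius_m / R_SUN_M


if __name__ == "__main__":
    # Placeholder inputs: a 1 mas diameter seen at 1 mas parallax (1 kpc).
    print(linear_radius_rsun(1.0, 1.0))
```

With these constants the relation reduces to roughly 107.5 times the ratio of angular diameter to parallax, so the uncertainty on a radius derived this way is often dominated by the parallax error rather than by the diameter measurement.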