-
NNE: A Dataset for Nested Named Entity Recognition in English Newswire
Authors:
Nicky Ringland,
Xiang Dai,
Ben Hachey,
Sarvnaz Karimi,
Cecile Paris,
James R. Curran
Abstract:
Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
Submitted 4 June, 2019;
originally announced June 2019.
-
Web-scale Surface and Syntactic n-gram Features for Dependency Parsing
Authors:
Dominick Ng,
Mohit Bansal,
James R. Curran
Abstract:
We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface $n$-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks.
Surface and syntactic $n$-grams both produce substantial and complementary gains in parsing accuracy across domains. Our best system combines the two feature sets, achieving up to 0.8% absolute UAS improvements on newswire and 1.4% on web text.
Submitted 24 February, 2015;
originally announced February 2015.
-
High velocity clouds in the Galactic All Sky Survey I. Catalogue
Authors:
Vanessa A. Moss,
Naomi M. McClure-Griffiths,
Tara Murphy,
D. J. Pisano,
Jonathan K. Kummerfeld,
James R. Curran
Abstract:
We present a catalogue of high-velocity clouds (HVCs) from the Galactic All Sky Survey (GASS) of southern-sky neutral hydrogen, which has 57 mK sensitivity and 1 km/s velocity resolution and was obtained with the Parkes Telescope. Our catalogue has been derived from the stray-radiation corrected second release of GASS. We describe the data and our method of identifying HVCs and analyse the overall properties of the GASS population. We catalogue a total of 1693 HVCs at declinations < 0 deg, including 1111 positive velocity HVCs and 582 negative velocity HVCs. Our catalogue also includes 295 anomalous velocity clouds (AVCs). The cloud line-widths of our HVC population have a median FWHM of ~19 km/s, which is lower than found in previous surveys. The completeness of our catalogue is above 95% based on comparison with the HIPASS catalogue of HVCs, upon which we improve by an order of magnitude in spectral resolution. We find 758 new HVCs and AVCs with no HIPASS counterpart. The GASS catalogue will shed unprecedented light on the distribution and kinematic structure of southern-sky HVCs, as well as delve further into the cloud populations that make up the anomalous velocity gas of the Milky Way.
Submitted 16 September, 2013;
originally announced September 2013.
-
VAST: An ASKAP Survey for Variables and Slow Transients
Authors:
Tara Murphy,
Shami Chatterjee,
David L. Kaplan,
Jay Banyer,
Martin E. Bell,
Hayley E. Bignall,
Geoffrey C. Bower,
Robert Cameron,
David M. Coward,
James M. Cordes,
Steve Croft,
James R. Curran,
S. G. Djorgovski,
Sean A. Farrell,
Dale A. Frail,
B. M. Gaensler,
Duncan K. Galloway,
Bruce Gendre,
Anne J. Green,
Paul J. Hancock,
Simon Johnston,
Atish Kamble,
Casey J. Law,
T. Joseph W. Lazio,
Kitty K. Lo
, et al. (14 additional authors not shown)
Abstract:
The Australian Square Kilometre Array Pathfinder (ASKAP) will give us an unprecedented opportunity to investigate the transient sky at radio wavelengths. In this paper we present VAST, an ASKAP survey for Variables and Slow Transients. VAST will exploit the wide-field survey capabilities of ASKAP to enable the discovery and investigation of variable and transient phenomena from the local to the cosmological, including flare stars, intermittent pulsars, X-ray binaries, magnetars, extreme scattering events, interstellar scintillation, radio supernovae and orphan afterglows of gamma ray bursts. In addition, it will allow us to probe unexplored regions of parameter space where new classes of transient sources may be detected. In this paper we review the known radio transient and variable populations and the current results from blind radio surveys. We outline a comprehensive program based on a multi-tiered survey strategy to characterise the radio transient sky through detection and monitoring of transient and variable sources on the ASKAP imaging timescales of five seconds and greater. We also present an analysis of the expected source populations that we will be able to detect with VAST.
Submitted 6 July, 2012;
originally announced July 2012.
-
BLOBCAT: Software to Catalogue Flood-Filled Blobs in Radio Images of Total Intensity and Linear Polarization
Authors:
Christopher A. Hales,
Tara Murphy,
James R. Curran,
Enno Middelberg,
Bryan M. Gaensler,
Ray P. Norris
Abstract:
We present BLOBCAT, new source extraction software that utilises the flood fill algorithm to detect and catalogue blobs, or islands of pixels representing sources, in two-dimensional astronomical images. The software is designed to process radio-wavelength images of both Stokes I intensity and linear polarization, the latter formed through the quadrature sum of Stokes Q and U intensities or as a byproduct of rotation measure synthesis. We discuss an objective, automated method by which estimates of position-dependent background root-mean-square noise may be obtained and incorporated into BLOBCAT's analysis. We derive and implement within BLOBCAT corrections for two systematic biases to enable the flood fill algorithm to accurately measure flux densities for Gaussian sources. We discuss the treatment of non-Gaussian sources in light of these corrections. We perform simulations to validate the flux density and positional measurement performance of BLOBCAT, and we benchmark the results against those of a standard Gaussian fitting task. We demonstrate that BLOBCAT exhibits accurate measurement performance in total intensity and, in particular, linear polarization. BLOBCAT is particularly suited to the analysis of large survey data. The BLOBCAT software, supplemented with test data to illustrate its use, is available at: http://blobcat.sourceforge.net/.
Submitted 23 May, 2012;
originally announced May 2012.
-
Compact continuum source-finding for next generation radio surveys
Authors:
Paul J Hancock,
Tara Murphy,
Bryan M Gaensler,
Andrew Hopkins,
James R Curran
Abstract:
We present a detailed analysis of four of the most widely used radio source finding packages in radio astronomy, and a program being developed for the Australian Square Kilometer Array Pathfinder (ASKAP) telescope. The four packages (SExtractor, SFind, IMSAD and Selavy) are shown to produce source catalogues with high completeness and reliability. In this paper we analyse the small fraction (~1%) of cases in which these packages do not perform well. This small fraction of sources will be of concern for the next generation of radio surveys, which will produce many thousands of sources on a daily basis, in particular for blind radio transient surveys. From our analysis we identify the ways in which the underlying source finding algorithms fail. We demonstrate a new source finding algorithm, Aegean, based on the application of a Laplacian kernel, which can avoid these problems and can produce complete and reliable source catalogues for the next generation of radio surveys.
Submitted 20 February, 2012;
originally announced February 2012.