-
arXiv:cs/0306050 [pdf, ps, other]
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
Abstract: We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
Submitted 12 June, 2003; originally announced June 2003.
ACM Class: I.2.7
Journal ref: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 142-147
-
arXiv:cs/0209010 [pdf, ps, other]
Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition
Abstract: We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
Submitted 5 September, 2002; originally announced September 2002.
Comments: 4 pages
ACM Class: I.2.7
Journal ref: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 155-158
-
arXiv:cs/0204049 [pdf, ps, other]
Memory-Based Shallow Parsing
Abstract: We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and th…
Submitted 24 April, 2002; originally announced April 2002.
Report number: jmlr-2002-tks
ACM Class: I.2.7
Journal ref: Journal of Machine Learning Research, volume 2 (March), 2002, pp. 559-594
-
arXiv:cs/0107018 [pdf, ps, other]
Combining a self-organising map with memory-based learning
Abstract: Memory-based learning (MBL) has enjoyed considerable success in corpus-based natural language processing (NLP) tasks and is thus a reliable method of getting a high level of performance when building corpus-based NLP systems. However, there is a bottleneck in MBL whereby any novel testing item has to be compared against all the training items in the memory base. For this reason there has been some in…
Submitted 15 July, 2001; originally announced July 2001.
ACM Class: I.2.7
Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 9-14
-
arXiv:cs/0107017 [pdf, ps, other]
Learning Computational Grammars
Abstract: This paper reports on the "Learning Computational Grammars" (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, esp. the availability of annotated data, the kind of dependencies in the da…
Submitted 15 July, 2001; originally announced July 2001.
ACM Class: I.2.7
Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 97-104
-
arXiv:cs/0107016 [pdf, ps, other]
Introduction to the CoNLL-2001 Shared Task: Clause Identification
Abstract: We describe the CoNLL-2001 shared task: dividing text into clauses. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.
Submitted 15 July, 2001; originally announced July 2001.
ACM Class: I.2.7
Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 53-57
-
arXiv:cs/0009008 [pdf, ps, other]
Introduction to the CoNLL-2000 Shared Task: Chunking
Abstract: We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.
Submitted 18 September, 2000; originally announced September 2000.
Comments: 6 pages
ACM Class: I.2.7
Journal ref: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal
-
arXiv:cs/0008012 [pdf, ps, other]
Applying System Combination to Base Noun Phrase Identification
Abstract: We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this…
Submitted 17 August, 2000; originally announced August 2000.
Comments: 7 pages
ACM Class: I.2.7
Journal ref: Proceedings of COLING 2000, Saarbruecken, Germany
-
arXiv:cs/0005015 [pdf, ps, other]
Noun Phrase Recognition by System Combination
Abstract: The performance of machine learning algorithms can be improved by combining the output of different systems. In this paper we apply this idea to the recognition of noun phrases. We generate different classifiers by using different representations of the data. By combining the results with voting techniques described in (Van Halteren et al. 1998) we manage to improve the best reported performances…
Submitted 10 May, 2000; originally announced May 2000.
Comments: 6 pages
ACM Class: I.2.7
Journal ref: Proceedings of NAACL 2000, Seattle, WA, USA
-
arXiv:cs/9907006 [pdf, ps, other]
Representing Text Chunks
Abstract: Dividing sentences in chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. (Ramshaw and Marcus, 1995) have introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we will examine seven different data representations for the problem of recognizing noun phrase chunks. We will show that the…
Submitted 6 July, 1999; originally announced July 1999.
Comments: 7 pages
ACM Class: I.2.7
Journal ref: EACL'99, Bergen