Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition

Authors: Erik F. Tjong Kim Sang, Fien De Meulder

Abstract: We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

Submitted 12 June, 2003; originally announced June 2003.

ACM Class: I.2.7

Journal ref: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 142-147

arXiv:cs/0209010

Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition

Authors: Erik F. Tjong Kim Sang

Abstract: We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

Submitted 5 September, 2002; originally announced September 2002.

Comments: 4 pages

ACM Class: I.2.7

Journal ref: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 155-158

arXiv:cs/0204049

Memory-Based Shallow Parsing

Authors: Erik F. Tjong Kim Sang

Abstract: We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification while its application towards recognizing embedded structures leaves some room for improvement.

Submitted 24 April, 2002; originally announced April 2002.

Report number: jmlr-2002-tks

ACM Class: I.2.7

Journal ref: Journal of Machine Learning Research, volume 2 (March), 2002, pp. 559-594

arXiv:cs/0107018

Combining a self-organising map with memory-based learning

Authors: James Hammerton, Erik F. Tjong Kim Sang

Abstract: Memory-based learning (MBL) has enjoyed considerable success in corpus-based natural language processing (NLP) tasks and is thus a reliable method of getting a high level of performance when building corpus-based NLP systems. However, there is a bottleneck in MBL whereby any novel testing item has to be compared against all the training items in the memory base. For this reason there has been some interest in various forms of memory editing whereby some method of selecting a subset of the memory base is employed to reduce the number of comparisons. This paper investigates the use of a modified self-organising map (SOM) to select a subset of the memory items for comparison. This method involves reducing the number of comparisons to a value proportional to the square root of the number of training items. The method is tested on the identification of base noun-phrases in the Wall Street Journal corpus, using sections 15 to 18 for training and section 20 for testing.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 9-14

arXiv:cs/0107017

Learning Computational Grammars

Authors: John Nerbonne, Anja Belz, Nicola Cancedda, Herve Dejean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard, Erik F. Tjong Kim Sang

Abstract: This paper reports on the "Learning Computational Grammars" (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, esp. the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). We focused on syntax, esp. noun phrase (NP) syntax.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 97-104

arXiv:cs/0107016

Introduction to the CoNLL-2001 Shared Task: Clause Identification

Authors: Erik F. Tjong Kim Sang, Herve Dejean

Abstract: We describe the CoNLL-2001 shared task: dividing text into clauses. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 53-57

arXiv:cs/0009008

Introduction to the CoNLL-2000 Shared Task: Chunking

Authors: Erik F. Tjong Kim Sang, Sabine Buchholz

Abstract: We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Submitted 18 September, 2000; originally announced September 2000.

Comments: 6 pages

ACM Class: I.2.7

Journal ref: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal

arXiv:cs/0008012

Applying System Combination to Base Noun Phrase Identification

Authors: Erik F. Tjong Kim Sang, Walter Daelemans, Herve Dejean, Rob Koeling, Yuval Krymolowski, Vasin Punyakanok, Dan Roth

Abstract: We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.

Submitted 17 August, 2000; originally announced August 2000.

Comments: 7 pages

ACM Class: I.2.7

Journal ref: Proceedings of COLING 2000, Saarbruecken, Germany

arXiv:cs/0005015

Noun Phrase Recognition by System Combination

Authors: Erik F. Tjong Kim Sang

Abstract: The performance of machine learning algorithms can be improved by combining the output of different systems. In this paper we apply this idea to the recognition of noun phrases. We generate different classifiers by using different representations of the data. By combining the results with voting techniques described in (Van Halteren et al. 1998) we manage to improve the best reported performances on standard data sets for base noun phrases and arbitrary noun phrases.

Submitted 10 May, 2000; originally announced May 2000.

Comments: 6 pages

ACM Class: I.2.7

Journal ref: Proceedings of NAACL 2000, Seattle, WA, USA

arXiv:cs/9907006

Representing Text Chunks

Authors: Erik F. Tjong Kim Sang, Jorn Veenstra

Abstract: Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. Ramshaw and Marcus (1995) have introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we will examine seven different data representations for the problem of recognizing noun phrase chunks. We will show that the data representation choice has a minor influence on chunking performance. However, equipped with the most suitable data representation, our memory-based learning chunker was able to improve the best published chunking results for a standard data set.

Submitted 6 July, 1999; originally announced July 1999.

Comments: 7 pages

ACM Class: I.2.7

Journal ref: EACL'99, Bergen

Showing 1–10 of 10 results for author: Sang, E F T K