Showing 1–2 of 2 results for author: Kessler, B
-
Automatic Detection of Text Genre
Authors:
Brett Kessler,
Geoffrey Nunberg,
Hinrich Schuetze
Abstract:
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection ba…
▽ More
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural properties.
△ Less
Submitted 8 July, 1997;
originally announced July 1997.
-
Computational dialectology in Irish Gaelic
Authors:
Brett Kessler
Abstract:
Dialect grou**s can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of linguistic distance between each pair of sites, can be computed as Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determining a…
▽ More
Dialect grou**s can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of linguistic distance between each pair of sites, can be computed as Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determining and counting isoglosses, and is more accurate than the more familiar metric of computing Hamming distance based on whether vocabulary entries match. In the actual clustering step, traditional agglomerative clustering works better than the top-down technique of partitioning around medoids. When agglomerative clustering of phonetic string comparison distances is applied to Gaelic, reasonable dialect boundaries are obtained, corresponding to national and (within Ireland) provincial boundaries.
△ Less
Submitted 1 March, 1995;
originally announced March 1995.