
Hierarchical Non-Emitting Markov Models

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. More importantly, the non-emitting model outperforms the classic interpolated model on natural language texts under a wide range of experimental conditions, with only a modest increase in computational requirements. The non-emitting model is also much less prone to overfitting.

Keywords: Markov model, interpolated Markov model, hidden Markov model, mixture modeling, non-emitting state transitions, state-conditional interpolation, statistical language model, discrete time series, Brown corpus, Wall Street Journal.
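The abstract measures the non-emitting model against the classic interpolated Markov model. As a point of reference only, here is a minimal Python sketch of that baseline, in which the estimate for a context is mixed with the estimate for its next-shorter suffix: p(a | x_1..x_n) = λ δ(a | x_1..x_n) + (1 − λ) p(a | x_2..x_n). The class name, the single fixed weight `lam`, the uniform base case and the skipping of unseen contexts are our assumptions, not details from the abstract. The non-emitting variant itself is defined in the report and is not reproduced here.

```python
from collections import defaultdict


class InterpolatedMarkovModel:
    """Sketch of a classic interpolated Markov model over characters.

    Assumptions (ours, not the abstract's): one fixed weight `lam` shared by
    all contexts, maximum-likelihood estimates for delta, a uniform base
    distribution, and unseen contexts skipped (full back-off).
    """

    def __init__(self, order, alphabet, lam=0.7):
        self.order = order
        self.alphabet = sorted(set(alphabet))
        self.lam = lam
        # counts[context][symbol] for every context length 0..order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        for t, sym in enumerate(text):
            # record sym after each of its suffix contexts, longest = order
            for k in range(min(t, self.order) + 1):
                self.counts[text[t - k:t]][sym] += 1

    def _delta(self, sym, context):
        """ML estimate delta(sym | context), or None if context was never seen."""
        row = self.counts.get(context)
        if not row:
            return None
        return row.get(sym, 0) / sum(row.values())

    def prob(self, sym, context):
        """p(sym | ctx) = lam * delta(sym | ctx) + (1 - lam) * p(sym | shorter ctx)."""
        # keep only the last `order` symbols of the context
        context = context[len(context) - min(len(context), self.order):]
        p = 1.0 / len(self.alphabet)  # uniform base case
        # apply the recursion from the empty suffix up to the longest suffix
        for k in range(len(context) + 1):
            d = self._delta(sym, context[len(context) - k:])
            if d is not None:
                p = self.lam * d + (1 - self.lam) * p
        return p
```

For example, after `m = InterpolatedMarkovModel(2, "ab")` and `m.train("abab")`, `m.prob("a", "ab")` works out to 0.955 and `m.prob("b", "ab")` to 0.045. Because each step is a convex combination of two distributions, the probabilities over the alphabet always sum to 1.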

Submitted 20 January, 1998; v1 submitted 14 January, 1998; originally announced January 1998.

Comments: http://www.cs.princeton.edu/~ristad/papers/pu-544-97.ps.gz

Report number: CS-TR-544-97

arXiv:cmp-lg/9611004

Nonuniform Markov models

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must assume that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model. This result is somewhat remarkable because both models contain identical numbers of parameters whose values are estimated in a similar manner. The only difference between the two models is how they combine the statistics of longer and shorter strings.

Keywords: nonuniform Markov model, interpolated Markov model, conditional independence, statistical language model, discrete time series.
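The conditional-independence assumption described in this abstract (each symbol depends on at most a small, finite number of other symbols) can be illustrated with a plain order-n Markov model estimated by maximum likelihood. This hypothetical sketch, with function names of our choosing, shows the truncated chain rule and the problem the abstract raises: any event not seen in the finite corpus gets probability zero, which is why practical models combine statistics of longer and shorter strings.

```python
import math
from collections import Counter


def train_counts(corpus, n):
    """Count (context, symbol) pairs, truncating each context to n symbols."""
    joint = Counter()
    ctx = Counter()
    for t, sym in enumerate(corpus):
        context = corpus[max(0, t - n):t]  # shorter near the string start
        joint[(context, sym)] += 1
        ctx[context] += 1
    return joint, ctx


def log2_prob(string, joint, ctx, n):
    """log2 p(string) = sum_t log2 p(x_t | previous n symbols), ML estimates."""
    total = 0.0
    for t, sym in enumerate(string):
        context = string[max(0, t - n):t]
        c = joint[(context, sym)]
        if c == 0:
            return float("-inf")  # unseen event: the ML estimate assigns zero
        total += math.log2(c / ctx[context])
    return total
```

Trained on `"abab"` with `n=1`, the string `"abab"` gets log-probability 0.0 (probability 1), while `"aa"` gets negative infinity because the pair ("a", "a") never occurs in the corpus.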

Submitted 16 November, 1996; originally announced November 1996.

Comments: 17 pages

Report number: CS-TR-536-96

arXiv:cmp-lg/9505002

New Techniques for Context Modeling

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies. For example, our techniques achieve a message entropy of 1.97 bits/char on the Brown corpus using only 89,325 parameters. In contrast, the character 4-gram model requires more than 250 times as many parameters in order to achieve a message entropy of only 2.47 bits/char. The fact that our model performs significantly better while using vastly fewer parameters indicates that it is a better probability model of natural language text.
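Results in this abstract are reported as message entropy in bits per character. Assuming this means the average ideal code length of the test text under the model, -(1/N) Σ log2 p(x_t | x_1..x_{t-1}), the sketch below computes it from a list of per-character probabilities and checks the arithmetic behind the comparison: more than 250 × 89,325 = 22,331,250 parameters for the character 4-gram model, and a saving of 0.50 bits/char, about 20% of the 4-gram model's 2.47 bits/char. The function and variable names are illustrative only.

```python
import math


def message_entropy(char_probs):
    """Average ideal code length in bits/char: -(1/N) * sum_t log2 p_t."""
    if not char_probs:
        raise ValueError("need at least one character probability")
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


# Figures quoted in the abstract (Brown corpus).
NEW_PARAMS = 89_325
NEW_BITS_PER_CHAR = 1.97
FOURGRAM_BITS_PER_CHAR = 2.47

# "more than 250 times as many parameters" => lower bound for the 4-gram model
fourgram_param_lower_bound = 250 * NEW_PARAMS
# saving per character, and as a fraction of the 4-gram code length
bits_saved = FOURGRAM_BITS_PER_CHAR - NEW_BITS_PER_CHAR
relative_saving = bits_saved / FOURGRAM_BITS_PER_CHAR

print(message_entropy([0.5, 0.5, 0.25, 0.25]))  # 1.5
print(fourgram_param_lower_bound)  # 22331250
print(round(bits_saved, 2), round(relative_saving, 3))  # 0.5 0.202
```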

Submitted 1 May, 1995; originally announced May 1995.

Comments: 8 pages, to appear in Proc. ACL 1995

Showing 1–3 of 3 results for author: Thomas, R G