Showing 1–3 of 3 results for author: Thomas, R G

  1. Hierarchical Non-Emitting Markov Models

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. More importantly, the non-emitting model outperforms the classic interpolated model on the natural language texts under a wide range of experimental conditions, with only a modest increase in computational requirements. The non-emitt…

    Submitted 20 January, 1998; v1 submitted 14 January, 1998; originally announced January 1998.

    Comments: http://www.cs.princeton.edu/~ristad/papers/pu-544-97.ps.gz

    Report number: CS-TR-544-97

  2. Nonuniform Markov models

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model…

    Submitted 16 November, 1996; originally announced November 1996.

    Comments: 17 pages

    Report number: CS-TR-536-96

  3. arXiv:cmp-lg/9505002

    cs.CL

    New Techniques for Context Modeling

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies. For example, our techniques achieve a message entropy of 1.97 bits/char on the Brown corpus using only 89,325 parameters. In contrast,…

    Submitted 1 May, 1995; originally announced May 1995.

    Comments: 8 pages, to appear in Proc. ACL 1995