
Showing 1–7 of 7 results for author: Ristad, E S

Searching in archive cs.
  1. Hierarchical Non-Emitting Markov Models

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. More importantly, the non-emitting model outperforms the classic interpolated model on natural language texts under a wide range of experimental conditions, with only a modest increase in computational requirements. The non-emitt…

    Submitted 20 January, 1998; v1 submitted 14 January, 1998; originally announced January 1998.

    Comments: http://www.cs.princeton.edu/~ristad/papers/pu-544-97.ps.gz

    Report number: CS-TR-544-97
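For reference, the classic interpolated Markov model that this abstract uses as its baseline mixes a maximum-likelihood estimate for the full context with the model for the next-shorter context. The sketch below is a minimal illustration under stated assumptions: the class name `InterpolatedMarkov` and the single fixed mixing weight `lam` are hypothetical (practical models tie the weight to the context), and it is not the non-emitting model the report proposes.

```python
from collections import Counter, defaultdict


class InterpolatedMarkov:
    """Minimal classic interpolated Markov model (hypothetical sketch).

    p(y | ctx) = lam * ml(y | ctx) + (1 - lam) * p(y | ctx[1:]),
    where ml is the maximum-likelihood estimate from counts and the recursion
    bottoms out at a uniform distribution over the observed alphabet.
    """

    def __init__(self, text: str, order: int, lam: float = 0.7):
        self.order = order
        self.lam = lam  # one fixed weight for every context: an assumption
        self.alphabet = sorted(set(text))
        # counts[ctx][y] = how often symbol y followed context ctx, for len(ctx) <= order
        self.counts = defaultdict(Counter)
        for i, y in enumerate(text):
            for k in range(min(order, i) + 1):
                self.counts[text[i - k:i]][y] += 1

    def prob(self, y: str, history: str) -> float:
        # Keep only the last `order` symbols of the history.
        ctx = history[-self.order:] if self.order > 0 else ""
        return self._prob(y, ctx)

    def _prob(self, y: str, ctx: str) -> float:
        # Lower-order estimate: drop the oldest symbol, or use uniform below order 0.
        lower = 1.0 / len(self.alphabet) if ctx == "" else self._prob(y, ctx[1:])
        seen = self.counts.get(ctx)  # .get does not insert into the defaultdict
        if not seen:  # unseen context: defer entirely to the lower order
            return lower
        ml = seen[y] / sum(seen.values())
        return self.lam * ml + (1 - self.lam) * lower


model = InterpolatedMarkov("abracadabra", order=2)
p_b_after_a = model.prob("b", "a")
```

Because every level mixes a proper distribution with another proper distribution, the conditional probabilities for any history, seen or unseen, sum to one over the alphabet.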

  2. Library of Practical Abstractions, Release 1.2

    Authors: Eric Sven Ristad, Peter N. Yianilos

    Abstract: The library of practical abstractions (LIBPA) provides efficient implementations of conceptually simple abstractions, in the C programming language. We believe that the best library code is conceptually simple so that it will be easily understood by the application programmer; parameterized by type so that it enjoys wide applicability; and at least as efficient as a straightforward special-purpo…

    Submitted 9 June, 1997; originally announced June 1997.

    Comments: 19 pages, texinfo format

  3. Maximum Entropy Modeling Toolkit

    Authors: Eric Sven Ristad

    Abstract: The Maximum Entropy Modeling Toolkit supports parameter estimation and prediction for statistical language models in the maximum entropy framework. The maximum entropy framework provides a constructive method for obtaining the unique conditional distribution p*(y|x) that satisfies a set of linear constraints and maximizes the conditional entropy H(p|f) with respect to the empirical distribution…

    Submitted 31 December, 1996; originally announced December 1996.

    Comments: 32 pages, texinfo format
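The unique maximum entropy solution under linear feature constraints is known to take a log-linear (exponential) form, p*(y|x) = exp(sum_i w_i f_i(x,y)) / Z(x). The sketch below only evaluates that form for given weights; the function name `maxent_conditional` and the toy feature `q_then_u` are hypothetical, it performs no parameter estimation, and it does not reproduce the toolkit's own interface.

```python
import math
from typing import Callable, Dict, Sequence

Feature = Callable[[str, str], float]


def maxent_conditional(x: str, labels: Sequence[str],
                       features: Sequence[Feature],
                       weights: Sequence[float]) -> Dict[str, float]:
    """Evaluate p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x) for every label y."""
    scores = {y: sum(w * f(x, y) for f, w in zip(features, weights)) for y in labels}
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())  # normalizer Z(x), up to the shared exp(top) factor
    return {y: u / z for y, u in unnorm.items()}


# Hypothetical toy feature: fires when the history ends in "q" and the next letter is "u".
def q_then_u(x: str, y: str) -> float:
    return 1.0 if x.endswith("q") and y == "u" else 0.0


dist = maxent_conditional("the q", ["a", "e", "u"], [q_then_u], [2.0])
```

With a positive weight on `q_then_u`, the label "u" receives probability e^2 / (e^2 + 2) after a history ending in "q", while labels the feature never touches share the remaining mass equally.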

  4. Nonuniform Markov models

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model…

    Submitted 16 November, 1996; originally announced November 1996.

    Comments: 17 pages

    Report number: CS-TR-536-96
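A model that conditions each symbol on at most k predecessors (an order-k Markov model) assigns probability to a whole string by the chain rule with a truncated history. The sketch below is a minimal illustration, not the model this report proposes: the function name `order_k_log2prob` and the use of add-one smoothing are assumptions, and because it has no end-of-string symbol it defines a distribution over strings of each fixed length rather than over strings of arbitrary length.

```python
import math
from collections import Counter


def order_k_log2prob(text: str, train: str, k: int, alphabet: str) -> float:
    """log2 P(text) under an order-k Markov model estimated from `train`.

    Each symbol is conditioned on at most its k predecessors. Conditional
    estimates use add-one (Laplace) smoothing over `alphabet`, an assumption
    made here so that unseen (context, symbol) pairs keep nonzero probability.
    """
    ctx_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for i, y in enumerate(train):
        ctx = train[max(0, i - k):i]  # shorter contexts at the start of the string
        ctx_counts[ctx] += 1
        pair_counts[ctx, y] += 1

    total = 0.0
    for i, y in enumerate(text):
        ctx = text[max(0, i - k):i]
        p = (pair_counts[ctx, y] + 1) / (ctx_counts[ctx] + len(alphabet))
        total += math.log2(p)
    return total
```

For example, trained on "abbab" with k = 1, the string "ab" gets probability p(a) * p(b | a) = (2/3) * (3/4) = 1/2.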

  5. Learning string edit distance

    Authors: Eric Sven Ristad, Peter N. Yianilos

    Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance funct…

    Submitted 2 November, 1997; v1 submitted 29 October, 1996; originally announced October 1996.

    Comments: http://www.cs.princeton.edu/~ristad/papers/pu-532-96.ps.gz

    Report number: CS-TR-532-96
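The unit-cost edit distance defined in this abstract (minimum number of insertions, deletions, and substitutions) can be computed with the standard dynamic program, sketched here keeping one row of the table at a time. The function name `edit_distance` is illustrative; this is the classic deterministic distance the report starts from, not its learned stochastic model.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] = distance between the prefix of s processed so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]  # transforming s[:i] into "" takes i deletions
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute a -> b (free if equal)
            ))
        prev = cur
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3: two substitutions and one insertion.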

  6. arXiv:cmp-lg/9508012

    cs.CL

    A Natural Law of Succession

    Authors: Eric Sven Ristad

    Abstract: Consider the problem of multinomial estimation. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that o…

    Submitted 30 August, 1995; originally announced August 1995.

    Comments: 23 pages

    Report number: pu-495-95
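For comparison, the classical answer to this estimation problem is Laplace's law of succession, which adds one pseudo-count to every symbol: p(i) = (n_i + 1) / (n + k), where n is the sum of all n_i. The sketch below implements that baseline with exact fractions; it is not the new estimator this report proposes, and the function name `laplace_succession` is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def laplace_succession(counts: Sequence[int]) -> List[Fraction]:
    """Laplace's law of succession: p(i) = (n_i + 1) / (n + k).

    counts[i] is n_i, the number of past occurrences of symbol i,
    and k = len(counts) is the alphabet size.
    """
    n = sum(counts)
    k = len(counts)
    return [Fraction(c + 1, n + k) for c in counts]
```

With counts [3, 0, 1] (n = 4, k = 3), the estimates are 4/7, 1/7, and 2/7; a symbol never seen still receives nonzero probability.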

  7. arXiv:cmp-lg/9505002

    cs.CL

    New Techniques for Context Modeling

    Authors: Eric Sven Ristad, Robert G. Thomas

    Abstract: We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies. For example, our techniques achieve a message entropy of 1.97 bits/char on the Brown corpus using only 89,325 parameters. In contrast,…

    Submitted 1 May, 1995; originally announced May 1995.

    Comments: 8 pages, to appear in Proc. ACL 1995
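Message entropy figures such as 1.97 bits/char can be read as an average code length. Assuming the measure is the mean of -log2 p over the characters of a test message, where p is the probability the model assigned to each character actually observed, it can be computed as below. The function name `bits_per_char` is hypothetical, and the sketch does not reproduce the paper's models or its Brown corpus result.

```python
import math
from typing import Sequence


def bits_per_char(probs: Sequence[float]) -> float:
    """Average code length in bits per character.

    probs[i] is the probability the model assigned to the i-th character
    of the message, given everything before it.
    """
    if not probs:
        raise ValueError("need at least one character")
    return -sum(math.log2(p) for p in probs) / len(probs)
```

For example, a model that assigns probability 1/27 to every character scores log2 27, about 4.75 bits/char; lower values mean the model predicts the text better.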