-
Efficient Induction of Language Models Via Probabilistic Concept Formation
Authors:
Christopher J. MacLellan,
Peter Matsakis,
Pat Langley
Abstract:
This paper presents a novel approach to the acquisition of language models from corpora. The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of probabilistic concepts that used a tabular, attribute-value encoding of training cases and concepts, making it unsuitable for sequential input like language. In response, we explore three new extensions to Cobweb -- the W…
▽ More
This paper presents a novel approach to the acquisition of language models from corpora. The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of probabilistic concepts that used a tabular, attribute-value encoding of training cases and concepts, making it unsuitable for sequential input like language. In response, we explore three new extensions to Cobweb -- the Word, Leaf, and Path variants. These systems encode each training case as an anchor word and surrounding context words, and they store probabilistic descriptions of concepts as distributions over anchor and context information. As in the original Cobweb, a performance element sorts a new instance downward through the hierarchy and uses the final node to predict missing features. Learning is interleaved with performance, updating concept probabilities and hierarchy structure as classification occurs. Thus, the new approaches process training cases in an incremental, online manner that it very different from most methods for statistical language learning. We examine how well the three variants place synonyms together and keep homonyms apart, their ability to recall synonyms as a function of training set size, and their training efficiency. Finally, we discuss related work on incremental learning and directions for further research.
△ Less
Submitted 22 December, 2022;
originally announced December 2022.
-
Induction of Selective Bayesian Classifiers
Authors:
Pat Langley,
Stephanie Sage
Abstract:
In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that c arries out a greedy search through the space of features. We hypothesize that this approach will improve asymptotic accuracy in domains that i…
▽ More
In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that c arries out a greedy search through the space of features. We hypothesize that this approach will improve asymptotic accuracy in domains that involve correlated features without reducing the rate of learning in ones that do not. We report experimental results on six natural domains, including comparisons with decision-tree induction, that support these hypotheses. In closing, we discuss other approaches to extending naive Bayesian classifiers and outline some directions for future research.
△ Less
Submitted 27 February, 2013;
originally announced February 2013.
-
Estimating Continuous Distributions in Bayesian Classifiers
Authors:
George H. John,
Pat Langley
Abstract:
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive…
▽ More
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
△ Less
Submitted 20 February, 2013;
originally announced February 2013.
-
A Personalized System for Conversational Recommendations
Authors:
M. H. Goker,
P. Langley,
C. A. Thompson
Abstract:
Searching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases. Recommendation systems help users find items of interest of a particular type, such as movies or restaurants, but are still somewhat awkward to use. Our solution is to take advantage of the complementary strengths of personalized recommendation sys…
▽ More
Searching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases. Recommendation systems help users find items of interest of a particular type, such as movies or restaurants, but are still somewhat awkward to use. Our solution is to take advantage of the complementary strengths of personalized recommendation systems and dialogue systems, creating personalized aides. We present a system -- the Adaptive Place Advisor -- that treats item selection as an interactive, conversational process, with the program inquiring about item attributes and the user responding. Individual, long-term user preferences are unobtrusively obtained in the course of normal recommendation dialogues and used to direct future conversations with the same user. We present a novel user model that influences both item search and the questions asked during a conversation. We demonstrate the effectiveness of our system in significantly reducing the time and number of interactions required to find a satisfactory item, as compared to a control group of users interacting with a non-adaptive version of the system.
△ Less
Submitted 30 June, 2011;
originally announced July 2011.