-
Exploring Category Structure with Contextual Language Models and Lexical Semantic Networks
Authors:
Joseph Renner,
Pascal Denis,
Rémi Gilleron,
Angèle Brunellière
Abstract:
Recent work on predicting category structure with distributional models, using either static word embeddings (Heyman and Heyman, 2019) or contextualized language models (CLMs) (Misra et al., 2021), report low correlations with human ratings, thus calling into question their plausibility as models of human semantic memory. In this work, we revisit this question testing a wider array of methods for…
▽ More
Recent work on predicting category structure with distributional models, using either static word embeddings (Heyman and Heyman, 2019) or contextualized language models (CLMs) (Misra et al., 2021), report low correlations with human ratings, thus calling into question their plausibility as models of human semantic memory. In this work, we revisit this question testing a wider array of methods for probing CLMs for predicting typicality scores. Our experiments, using BERT (Devlin et al., 2018), show the importance of using the right type of CLM probes, as our best BERT-based typicality prediction methods substantially improve over previous works. Second, our results highlight the importance of polysemy in this task: our best results are obtained when using a disambiguation mechanism. Finally, additional experiments reveal that Information Contentbased WordNet (Miller, 1995), also endowed with disambiguation, match the performance of the best BERT-based method, and in fact capture complementary information, which can be combined with BERT to achieve enhanced typicality predictions.
△ Less
Submitted 14 February, 2023;
originally announced February 2023.
-
On Probability Distributions for Trees: Representations, Inference and Learning
Authors:
François Denis,
Amaury Habrard,
Rémi Gilleron,
Marc Tommasi,
Édouard Gilbert
Abstract:
We study probability distributions over free algebras of trees. Probability distributions can be seen as particular (formal power) tree series [Berstel et al 82, Esik et al 03], i.e. map**s from trees to a semiring K . A widely studied class of tree series is the class of rational (or recognizable) tree series which can be defined either in an algebraic way or by means of multiplicity tree aut…
▽ More
We study probability distributions over free algebras of trees. Probability distributions can be seen as particular (formal power) tree series [Berstel et al 82, Esik et al 03], i.e. map**s from trees to a semiring K . A widely studied class of tree series is the class of rational (or recognizable) tree series which can be defined either in an algebraic way or by means of multiplicity tree automata. We argue that the algebraic representation is very convenient to model probability distributions over a free algebra of trees. First, as in the string case, the algebraic representation allows to design learning algorithms for the whole class of probability distributions defined by rational tree series. Note that learning algorithms for rational tree series correspond to learning algorithms for weighted tree automata where both the structure and the weights are learned. Second, the algebraic representation can be easily extended to deal with unranked trees (like XML trees where a symbol may have an unbounded number of children). Both properties are particularly relevant for applications: nondeterministic automata are required for the inference problem to be relevant (recall that Hidden Markov Models are equivalent to nondeterministic string automata); nowadays applications for Web Information Extraction, Web Services and document processing consider unranked trees.
△ Less
Submitted 18 July, 2008;
originally announced July 2008.
-
Residual Finite Tree Automata
Authors:
J. Carme,
R. Gilleron,
A. Lemay,
A. Terlutte,
M. Tommasi
Abstract:
Tree automata based algorithms are essential in many fields in computer science such as verification, specification, program analysis. They become also essential for databases with the development of standards such as XML. In this paper, we define new classes of non deterministic tree automata, namely residual finite tree automata (RFTA). In the bottom-up case, we obtain a new characterization o…
▽ More
Tree automata based algorithms are essential in many fields in computer science such as verification, specification, program analysis. They become also essential for databases with the development of standards such as XML. In this paper, we define new classes of non deterministic tree automata, namely residual finite tree automata (RFTA). In the bottom-up case, we obtain a new characterization of regular tree languages. In the top-down case, we obtain a subclass of regular tree languages which contains the class of languages recognized by deterministic top-down tree automata. RFTA also come with the property of existence of canonical non deterministic tree automata.
△ Less
Submitted 5 September, 2006;
originally announced September 2006.