Showing 1–3 of 3 results for author: Shmidman, C S

  1. Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

    Authors: Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty

    Abstract: Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and different morphosyntactic properties. This ambiguity goes beyond word-sense disambiguation (WSD), and may include token segmentation into multiple word…

    Submitted 11 May, 2024; originally announced May 2024.

    Journal ref: In Proceedings of EACL 2023, 849-864 (2023)

  2. arXiv:2211.15199

    cs.CL

    Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All

    Authors: Eylon Gueta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty

    Abstract: We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than standard Hebrew PLMs before. We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance. Our experiments show that larger vocabularies…

    Submitted 15 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

  3. arXiv:2208.01875

    cs.CL

    Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

    Authors: Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel

    Abstract: We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBert), they are all trained on modern Hebrew texts, which diverges substantially from Rabbinic Hebrew in terms of its lexicographical, morphological, syntactic and orthographic norms. We demonstr…

    Submitted 3 August, 2022; originally announced August 2022.