Showing 1–7 of 7 results for author: Sierra, G

Searching in archive cs.
  1. arXiv:2101.10877  [pdf, other]

    eess.AS cs.SD

    Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

    Authors: Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe

    Abstract: "Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, es…

    Submitted 5 March, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: Accepted by EACL 2021

  2. arXiv:1806.04291  [pdf, ps, other]

    cs.CL

    Challenges of language technologies for the indigenous languages of the Americas

    Authors: Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, Ivan Meza

    Abstract: Indigenous languages of the American continent are highly diverse. However, they have received little attention from the technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when distant languages and low-resource scenarios are faced. We w…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)

  3. arXiv:1710.06524  [pdf, ps, other]

    cs.CL

    Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF

    Authors: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov

    Abstract: Sentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an open question due to complexities of semantic interactions among words. In this paper, we present an embedding method, which is aimed at learning unsupervised sen…

    Submitted 19 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.

  4. arXiv:1703.03923  [pdf, other]

    cs.IR cs.CL

    A German Corpus for Text Similarity Detection Tasks

    Authors: Juan-Manuel Torres-Moreno, Gerardo Sierra, Peter Peinl

    Abstract: Text similarity detection aims at measuring the degree of similarity between a pair of texts. Corpora available for text similarity detection are designed to evaluate the algorithms to assess the paraphrase level among documents. In this paper we present a textual German corpus for similarity detection. The purpose of this corpus is to automatically assess the similarity between a pair of texts an…

    Submitted 11 March, 2017; originally announced March 2017.

    Comments: 1 figure; 13 pages

    Journal ref: Preprint of International Journal of Computational Linguistics and Applications, vol. 5, no. 2, 2014, pp. 9-24

  5. arXiv:1702.06467  [pdf, other]

    cs.IR cs.CL cs.SI

    Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

    Authors: Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra

    Abstract: In this paper we describe a dynamic normalization process applied to social network multilingual documents (Facebook and Twitter) to improve the performance of the author profiling task for short texts. After the normalization process, n-grams of characters and n-grams of POS tags are obtained to extract all the possible stylistic information encoded in the documents (emoticons, character floodi…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: 8 pages, 6 figures, Conference paper

    Journal ref: Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol 1: KDIR, 307-314, 2016, Porto, Portugal

  6. arXiv:1501.04920  [pdf]

    cs.IR cs.CL

    Regroupement sémantique de définitions en espagnol [Semantic clustering of definitions in Spanish]

    Authors: Gerardo Sierra, Juan-Manuel Torres-Moreno, Alejandro Molina

    Abstract: This article describes and evaluates a new unsupervised learning method for clustering definitions in Spanish according to their semantics. Textual Energy was used as a clustering measure, and we study an adaptation of Precision and Recall to evaluate our method.

    Submitted 20 January, 2015; originally announced January 2015.

    Comments: 11 pages, in French, 5 figures. Workshop Evaluation des méthodes d'Extraction de Connaissances dans les Données EvalECD EGC'10, 2010 Tunis

  7. arXiv:1212.3493  [pdf, ps, other]

    cs.CL cs.IR

    Sentence Compression in Spanish driven by Discourse Segmentation and Language Models

    Authors: Alejandro Molina, Juan-Manuel Torres-Moreno, Iria da Cunha, Eric SanJuan, Gerardo Sierra

    Abstract: Previous works demonstrated that Automatic Text Summarization (ATS) by sentence extraction may be improved using sentence compression. In this work we present a sentence compression approach guided by sentence-level discourse segmentation and probabilistic language models (LM). The results presented here show that the proposed solution is able to generate coherent summaries with grammatical comp…

    Submitted 17 December, 2012; v1 submitted 14 December, 2012; originally announced December 2012.

    Comments: 7 pages, 3 tables