Search | arXiv e-print repository

Nested Nonparametric Instrumental Variable Regression: Long Term, Mediated, and Time Varying Treatment Effects

Abstract: Several causal parameters in short panel data models are scalar summaries of a function called a nested nonparametric instrumental variable regression (nested NPIV). Examples include long term, mediated, and time varying treatment effects identified using proxy variables. However, it appears that no prior estimators or guarantees for nested NPIV exist, preventing flexible estimation and inference for these causal parameters. A major challenge is compounding ill posedness due to the nested inverse problems. We analyze adversarial estimators of nested NPIV, and provide sufficient conditions for efficient inference on the causal parameter. Our nonasymptotic analysis has three salient features: (i) introducing techniques that limit how ill posedness compounds; (ii) accommodating neural networks, random forests, and reproducing kernel Hilbert spaces; and (iii) extending to causal functions, e.g. long term heterogeneous treatment effects. We measure long term heterogeneous treatment effects of Project STAR and mediated proximal treatment effects of the Job Corps.

Submitted 10 March, 2024; v1 submitted 28 December, 2021; originally announced December 2021.
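The nested structure described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' adversarial estimator: it assumes linear function classes (an intercept plus one regressor) and solves each conditional moment restriction with ordinary two-stage least squares. The outer problem solves E[Y - h(W1) | Z1] = 0; the inner problem treats the fitted h(W1) as a pseudo-outcome and solves E[h(W1) - g(W2) | Z2] = 0. All names (`two_stage_least_squares`, `nested_linear_iv`, `simulate`), the simulated data, and its true slopes (2.0 for h, 3.0 for g) are hypothetical.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Linear IV: solve E[Z^T (y - X beta)] = 0 by projecting X onto span(Z)."""
    # First stage: fitted values of the regressors given the instruments.
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Second stage: least squares of the outcome on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


def nested_linear_iv(y, W1, Z1, W2, Z2):
    """Hypothetical nested estimator with linear h and g (not the paper's method)."""
    # Outer problem: E[Y - h(W1) | Z1] = 0 with h(W1) = W1 @ b1.
    b1 = two_stage_least_squares(y, W1, Z1)
    # Inner problem: E[h(W1) - g(W2) | Z2] = 0, using the fitted h as the outcome.
    pseudo_outcome = W1 @ b1
    b2 = two_stage_least_squares(pseudo_outcome, W2, Z2)
    return b1, b2


def simulate(n=20_000, seed=0):
    """Hypothetical data where h(w) = 2w and g(w) = 3w solve the two restrictions."""
    rng = np.random.default_rng(seed)
    z1, z2, u, v = rng.standard_normal((4, n))
    w2 = z2 + v + rng.standard_normal(n)  # v makes W2 endogenous in the inner problem
    w1 = 1.5 * w2 + z1 + u - v            # u makes W1 endogenous in the outer problem
    y = 2.0 * w1 + u + rng.standard_normal(n)

    def with_intercept(x):
        return np.column_stack([np.ones(n), x])

    return (y, with_intercept(w1), with_intercept(z1),
            with_intercept(w2), with_intercept(z2))


y, W1, Z1, W2, Z2 = simulate()
b1, b2 = nested_linear_iv(y, W1, Z1, W2, Z2)
print("h slope:", b1[1], "g slope:", b2[1])  # expected to be near 2.0 and 3.0
```

Because the inner problem consumes the outer problem's estimate, any error in h carries over into g; this dependence hints at why ill posedness can compound across nested inverse problems, which the paper's analysis addresses and this linear sketch does not.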

arXiv:2102.12564

DOI: 10.1007/s00521-021-06408-6

Triplet loss based embeddings for forensic speaker identification in Spanish

Authors: Emmanuel Maqueda, Javier Alvarez-Jimenez, Carlos Mena, Ivan Meza

Abstract: With the advent of digital technology, it has become increasingly common for crimes or legal disputes to involve some form of speech recording in which the identity of a speaker is questioned [1]. Faced with this situation, the field of forensic speaker identification has sought to shed light on the problem by quantifying the degree to which a speech recording belongs to a particular person relative to a population. In this work, we explore the use of speech embeddings obtained by training a CNN with the triplet loss. In particular, we focus on Spanish, a language that has not been extensively studied. We propose extracting the embeddings from speech spectrogram samples, explore several configurations of such spectrograms, and then quantify the quality of the embeddings. We also show some limitations of our data setting, which is predominantly composed of male speakers. Finally, we propose two approaches to calculate the Likelihood Ratio given our speech embeddings, and we show that the triplet loss is a good alternative for creating speech embeddings for forensic speaker identification.

Submitted 13 September, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: Long Paper: Neural Computing and Applications, Special Issue on LatinX in AI Research (2021). 11 pages, 5 figures
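To make the two ingredients named in the abstract concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The abstract does not give the distance, margin, scoring function, or how its two Likelihood Ratio approaches work, so the sketch assumes a squared Euclidean hinge triplet loss with a hypothetical margin of 0.2, cosine similarity as the comparison score, and a simple score-based Likelihood Ratio that fits one Gaussian to same-speaker scores and one to different-speaker scores. All function names and numbers are illustrative.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge triplet loss; each row is one embedding (margin is an assumption)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor vs same speaker
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor vs different speaker
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def cosine_score(a, b):
    """Hypothetical similarity score between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def likelihood_ratio(score, same_scores, diff_scores):
    """Assumed score-based LR = p(score | same speaker) / p(score | different speakers)."""
    num = gaussian_pdf(score, np.mean(same_scores), np.std(same_scores))
    den = gaussian_pdf(score, np.mean(diff_scores), np.std(diff_scores))
    return float(num / den)


anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [0.1, 0.0]])
negative = np.array([[1.0, 0.0], [0.2, 0.0]])
# Only the second triplet violates the margin, so the mean loss is about 0.085.
print(triplet_loss(anchor, positive, negative))

same, diff = [0.8, 0.85, 0.9], [0.1, 0.2, 0.3]
print(likelihood_ratio(0.85, same, diff) > 1.0)  # True: favors the same-speaker hypothesis
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis; in practice the score distributions would be estimated from a reference population rather than the three toy values used here.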

arXiv:2002.05673

Hacia los Comités de Ética en Inteligencia Artificial (Towards Ethics Committees in Artificial Intelligence)

Authors: Sofía Trejo, Ivan Meza, Fernanda López-Escobedo

Abstract: The goal of Artificial Intelligence based systems is to make decisions that affect their environment and have an impact on society. This points to the need for mechanisms that regulate the impact of this type of system on society. For this reason, it is a priority to create rules, as well as specialized organizations that can oversee compliance with those rules, in particular with human rights precepts at the local and international levels. This work proposes the creation, within universities, of Ethics Committees or Commissions specialized in Artificial Intelligence, which would be in charge of defining the principles and guaranteeing adherence to good practices in the field of Artificial Intelligence.

Submitted 11 February, 2020; originally announced February 2020.

Comments: 14 pages, in Spanish

ACM Class: K.m

arXiv:1807.00286

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

Authors: Manuel Mager, Elisabeth Mager, Alfonso Medina-Urrea, Ivan Meza, Katharina Kann

Abstract: Machine translation from polysynthetic to fusional languages is a challenging task, which is further complicated by the limited amount of parallel text available. Thus, translation performance is far from the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena that hamper automatic translation to and from polysynthetic languages, we study translations from three low-resource, polysynthetic languages (Nahuatl, Wixarika and Yorem Nokki) into Spanish and vice versa. Doing so, we find that in a morpheme-to-morpheme alignment a significant amount of information contained in polysynthetic morphemes has no Spanish counterpart, and its translation is often omitted. We further conduct a qualitative analysis and, thus, identify morpheme types that are commonly hard to align or ignored in the translation process.

Submitted 1 July, 2018; originally announced July 2018.

Comments: To appear in "All Together Now? Computational Modeling of Polysynthetic Languages" Workshop, at COLING 2018

arXiv:1806.04291

Challenges of language technologies for the indigenous languages of the Americas

Authors: Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, Ivan Meza

Abstract: Indigenous languages of the American continent are highly diverse. However, they have received little attention from a technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when dealing with distant languages and low-resource scenarios. We would like to encourage NLP research in linguistically rich and diverse areas like the Americas.

Submitted 11 June, 2018; originally announced June 2018.

Comments: In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)

Showing 1–5 of 5 results for author: Meza, I