Search | arXiv e-print repository

End to end developments for the Multipurpose Interferometer Array Pathfinder from the IAR Electronics Laboratory

Authors: J. M. Gonzalez, H. Command, G. Valdez

Abstract: The Multipurpose Interferometer Array Pathfinder (MIA), developed by the Argentine Institute of Radio Astronomy (IAR), is a radio astronomical instrument based on interferometry techniques, designed for the detection of radio emission from astronomical sources. Phase one consists of 16 antennas of 5 meters in diameter, with the possibility of increasing their number. In addition, it is equipped with a dual-polarization receiver with a bandwidth of 250 MHz, centered at 1325 MHz, and a digitizer and processor for the correlation functions. For the development of this instrument, a three-antenna pathfinder is currently being built with its positioning control, radio frequency systems, acquisition and processing stages. This paper describes the conceptual design and current progress of each stage.
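A quick worked check of the array numbers quoted in the abstract, as a sketch: an N-antenna interferometer correlates N(N-1)/2 distinct antenna pairs (baselines), and a 250 MHz band centred at 1325 MHz spans 1200 to 1450 MHz. The function names and the speed-of-light constant are our own; the sketch ignores autocorrelations and the two polarizations.

```python
# Back-of-the-envelope numbers for the array described in the abstract.
# Only the antenna counts (16 and 3), the 250 MHz bandwidth and the
# 1325 MHz centre frequency come from the abstract; the formulas are the
# standard ones for an N-element interferometer.

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def baseline_count(n_antennas: int) -> int:
    """Number of distinct antenna pairs (baselines), N(N-1)/2."""
    return n_antennas * (n_antennas - 1) // 2


def band_edges_mhz(center_mhz: float, bandwidth_mhz: float) -> tuple[float, float]:
    """Lower and upper edge of a band given its centre and width."""
    half = bandwidth_mhz / 2.0
    return center_mhz - half, center_mhz + half


def wavelength_m(freq_mhz: float) -> float:
    """Free-space wavelength in metres for a frequency given in MHz."""
    return SPEED_OF_LIGHT / (freq_mhz * 1e6)


print(baseline_count(16))             # phase one: 120 baselines
print(baseline_count(3))              # three-antenna pathfinder: 3 baselines
print(band_edges_mhz(1325, 250))      # (1200.0, 1450.0)
print(round(wavelength_m(1325), 3))   # about 0.226 m at band centre
```

Going from the 3-antenna pathfinder to the 16-antenna phase one multiplies the number of baselines the correlator must handle by 40.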

Submitted 28 December, 2023; originally announced December 2023.

Comments: 5 pages, 7 figures

Journal ref: Volume #56 of the Revista Mexicana de Astronomía y Astrofísica Serie de Conferencias, 2023

arXiv:2303.04963 [pdf, other]

doi 10.1515/jqas-2022-0039

Predicting Elite NBA Lineups Using Individual Player Order Statistics

Authors: Susan E. Martonosi, Martin Gonzalez, Nicolas Oshiro

Abstract: NBA team managers and owners try to acquire high-performing players. An important consideration in these decisions is how well the new players will perform in combination with their teammates. Our objective is to identify elite five-person lineups, which we define as those having a positive plus-minus per minute (PMM). Using individual player order statistics, our model can identify an elite lineup even if the five players in the lineup have never played together, which can inform player acquisition decisions, salary negotiations, and real-time coaching decisions. We combine seven classification tools into a unanimous consent classifier (all-or-nothing classifier, or ANC) in which a lineup is predicted to be elite only if all seven classifiers predict it to be elite. In this way, we achieve high positive predictive value (i.e., precision), the likelihood that a lineup classified as elite will indeed have a positive PMM. We train and test the model on individual player and lineup data from the 2017-18 season and use the model to predict the performance of lineups drawn from all 30 NBA teams' 2018-19 regular season rosters. Although the ANC is conservative and misses some high-performing lineups, it achieves high precision and recommends positionally balanced lineups.
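The unanimous-consent rule and the precision it targets can be sketched in a few lines. This is a minimal illustration, not the authors' model: the seven classifiers and the order-statistic features are not reproduced, so the "classifiers" below are hypothetical threshold rules on a single made-up feature.

```python
from typing import Callable, Sequence

# Any function mapping a lineup's feature vector to True (elite) or False.
Classifier = Callable[[Sequence[float]], bool]


def anc_predict(classifiers: Sequence[Classifier], features: Sequence[float]) -> bool:
    """All-or-nothing vote: elite only if every classifier says elite.

    all() stops at the first dissenting classifier.
    """
    return all(clf(features) for clf in classifiers)


def precision(predicted: Sequence[bool], actual_pmm: Sequence[float]) -> float:
    """Share of lineups predicted elite whose observed plus-minus per minute is positive."""
    hits = [pmm > 0 for pred, pmm in zip(predicted, actual_pmm) if pred]
    return sum(hits) / len(hits) if hits else float("nan")


# Toy usage: three hypothetical threshold rules on one feature.
models = [lambda f, t=t: f[0] > t for t in (0.1, 0.2, 0.3)]
lineups = [[0.5], [0.25], [0.05]]
preds = [anc_predict(models, x) for x in lineups]
print(preds, precision(preds, [0.8, 0.3, -0.2]))  # [True, False, False] 1.0
```

Requiring unanimity can only shrink the set of lineups called elite, which is why the rule trades recall for precision, as the abstract notes.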

Submitted 8 March, 2023; originally announced March 2023.

Comments: 29 pages, 20 tables, 8 figures. Under review

MSC Class: 90B50

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2110.14122 [pdf, other]

doi 10.1109/TSP.2021.3135689

Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation

Authors: Mauricio E. Gonzalez, Jorge F. Silva, Miguel Videla, Marcos E. Orchard

Abstract: This work addresses testing the independence of two continuous and finite-dimensional random variables from the design of a data-driven partition. The empirical log-likelihood statistic is adopted to approximate the sufficient statistics of an oracle test against independence (that knows the two hypotheses). It is shown that approximating the sufficient statistics of the oracle test offers a learning criterion for designing a data-driven partition that connects with the problem of mutual information estimation. Applying these ideas in the context of a data-dependent tree-structured partition (TSP), we derive conditions on the TSP's parameters to achieve a strongly consistent distribution-free test of independence over the family of probabilities equipped with a density. Complementing this result, we present finite-length results that show our TSP scheme's capacity to detect the scenario of independence structurally with the data-driven partition as well as new sampling complexity bounds for this detection. Finally, some experimental analyses provide evidence regarding our scheme's advantage for testing independence compared with some strategies that do not use data-driven representations.
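To make the link between partitions and mutual information concrete, here is a minimal plug-in estimator computed over a partition of the plane. Note the swap: it uses a fixed equal-width grid, not the paper's data-driven tree-structured partition, and the function names are our own. An estimate near zero is consistent with independence on that grid.

```python
import math
from collections import Counter
from typing import Sequence


def _bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Equal-width bin index in [0, n_bins - 1]; the maximum falls in the last bin."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)


def plugin_mutual_information(xs: Sequence[float], ys: Sequence[float], n_bins: int) -> float:
    """Empirical mutual information (in nats) of the grid-binned pair (X, Y)."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [(_bin_index(x, x_lo, x_hi, n_bins), _bin_index(y, y_lo, y_hi, n_bins))
             for x, y in zip(xs, ys)]
    joint = Counter(cells)                 # empirical joint cell counts
    px = Counter(i for i, _ in cells)      # marginal counts of X bins
    py = Counter(j for _, j in cells)      # marginal counts of Y bins
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        mi += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return mi


xs = list(range(100))
print(plugin_mutual_information(xs, xs, 4))  # log(4), about 1.386: Y is a copy of X
```

With a fixed grid, the estimate depends heavily on the bin count; choosing the partition from the data is precisely the design problem the abstract addresses.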

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2108.03691 [pdf, ps, other]

Model choice and parameter inference in controlled branching processes

Authors: Miguel González, Carmen Minuesa, Inés del Puerto

Abstract: Our purpose is to estimate the posterior distribution of the parameters of interest for controlled branching processes (CBPs) without prior knowledge of the maximum number of offspring that an individual can give birth to and without explicit likelihood calculations. We consider that only the population sizes at each generation and at least the number of progenitors of the last generation are observed, but the number of offspring produced by any individual at any generation is unknown. The proposed approach is two-fold. First, to estimate the maximum progeny per individual, we use an approximate Bayesian computation (ABC) algorithm for model choice, based on sequential importance sampling with the raw data. Second, given such an estimate and taking advantage of the simulated values of the previous stage, we approximate the posterior distribution of the main parameters of a CBP by applying the rejection ABC algorithm with an appropriate summary statistic and a post-processing adjustment. The accuracy of the proposed method is illustrated by means of simulated examples developed with the statistical software R. Moreover, we apply the methodology to two real datasets describing populations with logistic growth. To this end, different population growth models based on CBPs are proposed for the first time.

Submitted 8 August, 2021; originally announced August 2021.

Comments: 27 pages

arXiv:2106.07754 [pdf, other]

Counterfactual Explanations as Interventions in Latent Space

Authors: Riccardo Crupi, Alessandro Castelnovo, Daniele Regoli, Beatriz San Miguel Gonzalez

Abstract: Explainable Artificial Intelligence (XAI) is a set of techniques that allows the understanding of both technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial to help satisfy the increasingly important demand for trustworthy Artificial Intelligence, characterized by fundamental characteristics such as respect for human autonomy, prevention of harm, transparency, accountability, etc. Within XAI techniques, counterfactual explanations aim to provide end users with a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations, and in particular they fall short of considering the causal impact of such actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations capturing by design the underlying causal relations from the data, and at the same time to provide feasible recommendations to reach the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactual generator algorithms, thus minimising the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach with a set of different experiments using synthetic and real datasets (including a proprietary dataset of the financial domain).

Submitted 8 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 34 pages, 4 figures, 4 tables

arXiv:2103.01648 [pdf, other]

doi 10.1137/21M140225X

Solving Inverse Problems by Joint Posterior Maximization with Autoencoding Prior

Authors: Mario González, Andrés Almansa, Pauline Tan

Abstract: In this work we address the problem of solving ill-posed inverse problems in imaging where the prior is a variational autoencoder (VAE). Specifically we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique (JPMAP) performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. We also highlight the importance of correctly training the VAE using a denoising criterion, in order to ensure that the encoder generalizes well to out-of-distribution images, without affecting the quality of the generative model. This simple modification is key to providing robustness to the whole procedure. Finally we show how our joint MAP methodology relates to more common MAP approaches, and we propose a continuation scheme that makes use of our JPMAP algorithm to provide more robust MAP estimates.
Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima.
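The alternating structure of a joint (space-latent) MAP can be shown on a toy problem. This sketch does not implement JPMAP: the VAE is replaced by a hypothetical scalar linear "decoder" w*z, the degradation is y = a*x plus Gaussian noise, and the latent has a standard normal prior. Under these assumptions the joint objective is jointly convex and each alternating step has a closed form, whereas in the paper the objective is only approximately bi-convex.

```python
def joint_objective(x: float, z: float, y: float, a: float, w: float,
                    sigma: float, gamma: float) -> float:
    """Toy joint negative log-posterior over signal x and latent z.

    data term:    (a*x - y)^2 / (2 sigma^2), degradation model y = a*x + noise
    coupling:     (x - w*z)^2 / (2 gamma^2), x stays close to the decoded latent
    latent prior: z^2 / 2, standard normal latent
    """
    return ((a * x - y) ** 2 / (2 * sigma**2)
            + (x - w * z) ** 2 / (2 * gamma**2)
            + z**2 / 2)


def alternate_minimization(y: float, a: float, w: float, sigma: float,
                           gamma: float, iters: int = 200) -> tuple[float, float]:
    """Minimize joint_objective by exact alternating updates of x and z."""
    x, z = 0.0, 0.0
    for _ in range(iters):
        # x-step: the objective is quadratic in x for fixed z.
        x = (a * y / sigma**2 + w * z / gamma**2) / (a**2 / sigma**2 + 1 / gamma**2)
        # z-step: the objective is quadratic in z for fixed x.
        z = (w * x / gamma**2) / (w**2 / gamma**2 + 1)
    return x, z


print(alternate_minimization(3.0, 1.0, 2.0, 1.0, 1.0))  # approximately (2.5, 1.0)
```

Each step can only lower the objective, which mirrors why alternating schemes of this kind converge to a stationary point.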

Submitted 25 April, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: arXiv admin note: text overlap with arXiv:1911.06379

arXiv:1911.06379 [pdf, other]

Solving Inverse Problems by Joint Posterior Maximization with a VAE Prior

Authors: Mario González, Andrés Almansa, Mauricio Delbracio, Pablo Musé, Pauline Tan

Abstract: In this paper we address the problem of solving ill-posed inverse problems in imaging where the prior is a neural generative model. Specifically we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique is called JPMAP because it performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima.

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1911.05464 [pdf, other]

doi 10.1049/PBPC035G_ch5

Mining urban lifestyles: urban computing, human behavior and recommender systems

Authors: Sharon Xu, Riccardo Di Clemente, Marta C. González

Abstract: In the last decade, the digital age has sharply redefined the way we study human behavior. With the advancement of data storage and sensing technologies, electronic records now encompass a diverse spectrum of human activity, ranging from location data, phone and email communication to Twitter activity and open-source contributions on Wikipedia and OpenStreetMap. In particular, the study of the shopping and mobility patterns of individual consumers has the potential to give deeper insight into the lifestyles and infrastructure of the region. Credit card records (CCRs) provide detailed insight into purchase behavior and have been found to have inherent regularity in consumer shopping patterns; call detail records (CDRs) present new opportunities to understand human mobility, analyze wealth, and model social network dynamics. In this chapter, we jointly model the lifestyles of individuals, a more challenging problem with higher variability when compared to the aggregated behavior of city regions. Using collective matrix factorization, we propose a unified dual view of lifestyles. Understanding these lifestyles will not only inform commercial opportunities, but also help policymakers and nonprofit organizations understand the characteristics and needs of the entire region, as well as of the individuals within that region. The applications of this range from targeted advertisements and promotions to the diffusion of digital financial services among low-income groups.

Submitted 4 November, 2019; originally announced November 2019.

Comments: 8 pages, 4 figures

Journal ref: Big Data Recommender Systems - Volume 2: Application Paradigms, Chapter 5 Mining urban lifestyles: urban computing, human behavior and recommender systems, pp. 71-81, (Institution of Engineering and Technology 2019)

arXiv:1907.12797 [pdf, other]

Comparing partitions through the Matching Error

Authors: Mathias Bourel, Badih Ghattas, Meliza González

Abstract: With the aim of proposing a nonparametric hypothesis test, this paper studies the Matching Error (ME), an index for comparing two partitions obtained from the same data set, for example by two clustering methods. This index is related to the misclassification error in supervised learning. Some properties of the ME and, especially, its distribution function for the case of two independent partitions are analyzed. Extensive simulations show the efficiency of the ME, and we propose a hypothesis test based on it.
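A minimal sketch of a matching-error-style comparison of two partitions, assuming (our reading of "related to the misclassification error") that the index is one minus the largest fraction of points that agree under a one-to-one pairing of cluster labels. The paper's exact definition and the null distribution behind its test are not reproduced. The brute-force search over label permutations only suits a handful of clusters; the Hungarian algorithm is the usual choice for more.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def matching_error(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """1 - (best one-to-one agreement between cluster labels) / n, by brute force."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same points")
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))  # contingency table as a Counter
    pad = object()  # placeholder cluster that matches no point
    clusters_a = list(set(labels_a))
    clusters_b = list(set(labels_b))
    k = max(len(clusters_a), len(clusters_b))
    clusters_a += [pad] * (k - len(clusters_a))
    clusters_b += [pad] * (k - len(clusters_b))
    best = max(
        sum(pair_counts[(ca, cb)] for ca, cb in zip(clusters_a, perm))
        for perm in permutations(clusters_b)
    )
    return 1.0 - best / n


print(matching_error([0, 0, 1, 1, 2, 2], ["x", "x", "y", "y", "z", "z"]))  # 0.0, relabeling only
print(matching_error([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # 1/6: one point disagrees
```

Because only the best pairing matters, the index ignores arbitrary label names, which is what makes it usable for comparing two clustering outputs.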

Submitted 30 July, 2019; originally announced July 2019.

arXiv:1907.09543 [pdf, other]

Spatial sensitivity analysis for urban land use prediction with physics-constrained conditional generative adversarial networks

Authors: Adrian Albert, Jasleen Kaur, Emanuele Strano, Marta Gonzalez

Abstract: Accurately forecasting urban development and its environmental and climate impacts critically depends on realistic models of the spatial structure of the built environment, and of its dependence on key factors such as population and economic development. Scenario simulation and sensitivity analysis, i.e., predicting how changes in underlying factors at a given location affect urbanization outcomes at other locations, is currently not achievable at a large scale with traditional urban growth models, which are either too simplistic, or depend on detailed locally-collected socioeconomic data that is not available in most places. Here we develop a framework to estimate, purely from globally-available remote-sensing data and without parametric assumptions, the spatial sensitivity of the (static) rate of change of urban sprawl to key macroeconomic development indicators. We formulate this spatial regression problem as an image-to-image translation task using conditional generative adversarial networks (GANs), where the gradients necessary for comparative static analysis are provided by the backpropagation algorithm used to train the model. This framework makes it possible to incorporate physical constraints naturally, e.g., the inability to build over water bodies. To validate the spatial structure of model-generated built environment distributions, we use spatial statistics commonly used in urban form analysis.
We apply our method to a novel dataset comprising layers on the built environment, nightlights measurements (a proxy for economic development and energy use), and population density for the world's most populous 15,000 cities.

Submitted 22 July, 2019; originally announced July 2019.

Comments: 8 pages

arXiv:1803.04235 [pdf, ps, other]

Approximate Bayesian Computation in controlled branching processes: the role of summary statistics

Authors: M. González, R. Martínez, C. Minuesa, I. del Puerto

Abstract: Controlled branching processes are stochastic growth population models in which the number of individuals with reproductive capacity in each generation is controlled by a random control function. The purpose of this work is to examine Approximate Bayesian Computation (ABC) methods and to propose appropriate summary statistics for them in the context of these processes. This methodology makes it possible to approximate the posterior distribution of the parameters of interest satisfactorily without explicit likelihood calculations and under a minimal set of assumptions. In particular, the tolerance rejection algorithm, the sequential Monte Carlo ABC algorithm, and a post-sampling correction method based on local-linear regression are provided. The accuracy of the proposed methods is illustrated and compared with a "likelihood-free" Markov chain Monte Carlo technique by way of a simulated example developed with the statistical software R.
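The tolerance rejection algorithm can be sketched on the simplest special case. Every modelling choice here is our assumption, not the paper's: a controlled branching process with the identity control function (a Galton-Watson process), Poisson offspring with mean m, a uniform prior on m, and the classical ratio estimator of the offspring mean as the single summary statistic. The sequential Monte Carlo variant and the local-linear correction are not shown.

```python
import math
import random
from statistics import mean


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_generations(m: float, z0: int, n_gen: int, rng: random.Random) -> list[int]:
    """Generation sizes Z_0..Z_n when every individual has Poisson(m) offspring.

    Identity control function phi(k) = k, i.e. a Galton-Watson process;
    random control functions are not modelled.
    """
    sizes = [z0]
    for _ in range(n_gen):
        sizes.append(sum(poisson(m, rng) for _ in range(sizes[-1])))
    return sizes


def summary(sizes: list[int]) -> float:
    """Ratio (Z_1 + ... + Z_n) / (Z_0 + ... + Z_{n-1}), a simple estimate of m."""
    return sum(sizes[1:]) / sum(sizes[:-1])


def rejection_abc(observed: list[int], prior_lo: float, prior_hi: float,
                  tolerance: float, n_draws: int, rng: random.Random) -> list[float]:
    """Keep uniform-prior draws whose simulated summary is within `tolerance` of the data's."""
    z0, n_gen = observed[0], len(observed) - 1
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        m = rng.uniform(prior_lo, prior_hi)
        s_sim = summary(simulate_generations(m, z0, n_gen, rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(m)
    return accepted


rng = random.Random(2018)
observed = simulate_generations(1.2, z0=10, n_gen=6, rng=rng)  # synthetic data, true m = 1.2
posterior = rejection_abc(observed, 0.8, 1.6, tolerance=0.05, n_draws=2000, rng=rng)
print(len(posterior), round(mean(posterior), 2))
```

The accepted draws approximate the posterior of m; shrinking the tolerance makes the approximation tighter but accepts fewer draws, which is the trade-off that the sequential and regression-corrected variants aim to ease.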

Submitted 1 July, 2019; v1 submitted 12 March, 2018; originally announced March 2018.

arXiv:1802.05917 [pdf, ps, other]

Robust estimation in controlled branching processes: Bayesian estimators via disparities

Authors: M. González, C. Minuesa, I. del Puerto, A. N. Vidyashankar

Abstract: This paper is concerned with Bayesian inferential methods for data from controlled branching processes that account for model robustness through the use of disparities. Under regularity conditions, we establish that estimators built on the disparity-based posterior, such as expectation and maximum a posteriori estimates, are consistent and efficient under the posited model. Additionally, we show that the estimates are robust to model misspecification and the presence of aberrant outliers. To this end, we develop several fundamental ideas relating minimum disparity estimators to Bayesian estimators built on the disparity-based posterior, for dependent tree-structured data. We illustrate the methodology through a simulated example and apply our methods to a real data set from cell kinetics.

Submitted 16 February, 2018; originally announced February 2018.

Comments: Paper and supplementary material

arXiv:1801.09064 [pdf, ps, other]

Bayesian inference in Y-linked two-sex branching processes with mutations: ABC approach

Authors: Miguel González, Rodrigo Martínez, Cristina Gutiérrez

Abstract: A Y-linked two-sex branching process with mutations and blind choice of males is a suitable model for analyzing the evolution of the number of carriers of an allele and its mutations of a Y-linked gene. Considering a two-sex monogamous population, in this model each female chooses her partner from among the male population without caring about his type (i.e., the allele he carries). In this work, we deal with the problem of estimating the main parameters of such a model, developing Bayesian inference in a parametric framework. First, we consider as sampling scheme the observation of the total number of females and males up to some generation, as well as the number of males of each genotype in the last generation. Later, we add information on the mutated males only in the last generation, obtaining in this way a second sampling scheme. For both samples, we apply the Approximate Bayesian Computation (ABC) methodology to approximate the posterior distributions of the main parameters of this model. The accuracy of the procedure based on these samples is illustrated and discussed by way of simulated examples.

Submitted 27 January, 2018; originally announced January 2018.

arXiv:1703.00409 [pdf, other]

doi 10.1038/s41467-018-05690-8

Sequences of purchases in credit card data reveal life styles in urban populations

Authors: Riccardo Di Clemente, Miguel Luengo-Oroz, Matias Travizano, Sharon Xu, Bapu Vaitla, Marta C. González

Abstract: Zipf-like distributions characterize a wide set of phenomena in physics, biology, economics and social sciences. In human activities, Zipf's laws describe, for example, the frequency of word appearance in a text or of purchase types in shopping patterns. In the latter, the uneven distribution of transaction types is tied to the temporal sequences of individual purchase choices. In this work, we define a framework using a text compression technique on the sequences of credit card purchases to detect ubiquitous patterns of collective behavior. Clustering the consumers by the similarity of their purchase sequences, we detect five consumer groups. Remarkably, on post-hoc checking, individuals in each group are also similar in their age, total expenditure, gender, and the diversity of their social and mobility networks extracted from their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, this method uncovers sets of significant sequences that reveal insights into collective human behavior.
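What "Zipf-like" means for purchase types can be illustrated with a rank-frequency fit. This sketch is not the paper's compression-based method; it only counts items, ranks them, and fits log(frequency) against log(rank) by least squares, returning alpha in f ~ r^(-alpha). The purchase categories are hypothetical. Least squares on log-log axes is a crude estimator; maximum-likelihood fitting is preferred for careful work.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def rank_frequency(items: Iterable[Hashable]) -> list[int]:
    """Counts of each distinct item, sorted from most to least frequent."""
    return sorted(Counter(items).values(), reverse=True)


def zipf_exponent(frequencies: list[float]) -> float:
    """Least-squares slope of log(frequency) on log(rank), returned as alpha in f ~ r^-alpha.

    Needs at least two ranks.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope


purchases = ["grocery", "fuel", "grocery", "restaurant", "grocery", "fuel"]  # hypothetical
print(rank_frequency(purchases))                                   # [3, 2, 1]
print(round(zipf_exponent([1200 / r for r in range(1, 11)]), 6))   # 1.0 for an exact 1/r law
```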

Submitted 6 August, 2018; v1 submitted 1 March, 2017; originally announced March 2017.

Comments: 30 pages, 26 figures

Journal ref: Nature Communications 9:3330 (2018)

arXiv:1612.02460 [pdf, other]

Demographical Priors for Health Conditions Diagnosis Using Medicare Data

Authors: Fahad Alhasoun, May Alhazzani, Marta C. González

Abstract: This paper presents an example of how demographical characteristics of patients influence their susceptibility to certain medical conditions. In this paper, we investigate the association of health conditions with the age of patients in a heterogeneous population. We show that, besides the symptoms a patient has, age has the potential to aid the diagnostic process in hospitals. Working with Electronic Health Records (EHR), we show that medical conditions group into clusters that share distinctive population age densities. We use Electronic Health Records from Brazil for a period of 15 months from March 2013 to July 2014. The data contain 1.7 million patients and 47 million records. The findings have the potential to help in a setting where an automated system undertakes the task of predicting the condition of a patient given their symptoms and demographical information.

Submitted 9 January, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: NIPS 2016 Workshop on Machine Learning for Health

arXiv:1511.06400 [pdf, ps, other]

Minimum disparity estimation in controlled branching processes

Authors: Miguel Gonzalez, Carmen Minuesa, Ines del Puerto

Abstract: Minimum disparity estimation in controlled branching processes is addressed by assuming that the offspring law belongs to a general parametric family. Under some regularity conditions, it is proved that the proposed minimum disparity estimators (based on the nonparametric maximum likelihood estimator of the offspring law when the entire family tree is observed) are consistent and asymptotically normally distributed. Moreover, the robustness of the proposed estimators is discussed. Through a simulated example, focusing on the minimum Hellinger and negative exponential disparity estimators, it is shown that both are robust against outliers, with the negative exponential one also robust against inliers.
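The outlier robustness of the minimum Hellinger estimator can be seen in a small sketch. Assumptions (ours, not the paper's): offspring counts are treated as an i.i.d. sample, the offspring law is Poisson (the paper allows a general parametric family), the control function and tree structure are ignored, and a grid search stands in for numerical optimization. The data are hand-made toy counts roughly matching Poisson(2) frequencies.

```python
import math
from collections import Counter
from typing import Sequence


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability computed in log space so large k does not overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def hellinger_distance_sq(counts: Sequence[int], lam: float) -> float:
    """Squared Hellinger distance 1 - sum_k sqrt(p_hat(k) * f_lam(k)) to Poisson(lam)."""
    n = len(counts)
    empirical = Counter(counts)
    affinity = sum(math.sqrt((c / n) * poisson_pmf(k, lam)) for k, c in empirical.items())
    return 1.0 - affinity


def min_hellinger_poisson(counts: Sequence[int], grid_step: float = 0.01,
                          lam_max: float = 10.0) -> float:
    """Grid-search minimum Hellinger distance estimate of a Poisson offspring mean."""
    grid = [grid_step * i for i in range(1, int(lam_max / grid_step) + 1)]
    return min(grid, key=lambda lam: hellinger_distance_sq(counts, lam))


clean = [0] * 14 + [1] * 27 + [2] * 27 + [3] * 18 + [4] * 9 + [5] * 4 + [6]
contaminated = clean + [40] * 5  # five gross outliers
print(min_hellinger_poisson(clean), min_hellinger_poisson(contaminated))  # both close to 2
print(round(sum(contaminated) / len(contaminated), 2))  # sample mean 3.78
```

An outlier far in the tail has almost no model probability, so its term in the affinity is negligible and the estimate barely moves, while the sample mean is dragged from about 2 to 3.78.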

Submitted 19 November, 2015; originally announced November 2015.
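As a rough illustration of the minimum Hellinger idea only (not the authors' estimator for controlled branching processes), the sketch below assumes a simplified setting: every progenitor's offspring count is observed, the offspring law is assumed to be Poisson, and all function names are hypothetical. It forms the nonparametric MLE of the offspring law (the relative frequencies of each count) and picks the Poisson mean whose probabilities are closest to it in Hellinger distance. A few gross outliers are appended to contrast it with the Poisson maximum likelihood estimate, which is the sample mean.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability f_lam(k), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def empirical_pmf(offspring: list[int]) -> dict[int, float]:
    """Nonparametric MLE of the offspring law: relative frequency of each count."""
    n = len(offspring)
    return {k: c / n for k, c in Counter(offspring).items()}


def hellinger_affinity(p_hat: dict[int, float], lam: float) -> float:
    """sum_k sqrt(p_hat(k) * f_lam(k)); only counts with p_hat(k) > 0 contribute.

    The squared Hellinger distance sum_k (sqrt(p_hat(k)) - sqrt(f_lam(k)))^2
    equals 2 - 2 * affinity, so minimizing the distance maximizes the affinity.
    """
    return sum(math.sqrt(p * poisson_pmf(k, lam)) for k, p in p_hat.items())


def min_hellinger_poisson(offspring: list[int], step: float = 0.01) -> float:
    """Minimum Hellinger distance estimate of a Poisson offspring mean (assumed model)."""
    p_hat = empirical_pmf(offspring)

    def affinity(lam: float) -> float:
        return hellinger_affinity(p_hat, lam)

    # Grid search over (0, max count + 1] first, so that a local optimum
    # created by outlying counts is less likely to be picked.
    upper = max(offspring) + 1.0
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    best = max(grid, key=affinity)

    # Golden-section refinement (maximization) inside [best - step, best + step].
    a, b = max(best - step, 1e-9), best + step
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(60):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if affinity(c) > affinity(d):
            b = d
        else:
            a = c
    return (a + b) / 2


if __name__ == "__main__":
    # Illustrative counts roughly following Poisson(2), plus 5 gross outliers.
    clean = [0] * 14 + [1] * 27 + [2] * 27 + [3] * 18 + [4] * 9 + [5] * 4 + [6]
    contaminated = clean + [40] * 5
    mle = sum(contaminated) / len(contaminated)  # Poisson MLE = sample mean
    mhd = min_hellinger_poisson(contaminated)
    print(f"Poisson MLE: {mle:.3f}, minimum Hellinger estimate: {mhd:.3f}")
```

The outliers at 40 pull the sample mean well above 2, while the Hellinger estimate stays near 2, because a count with negligible model probability adds almost nothing to the affinity. The negative exponential disparity would replace the affinity with a different disparity function and is not shown here.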

arXiv:1509.04069 [pdf, ps, other]

doi 10.1214/15-AOAS818

Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression

Authors: Fan Li, Tingting Zhang, Quanli Wang, Marlen Z. Gonzalez, Erin L. Maresh, James A. Coan

Abstract: Multi-subject functional magnetic resonance imaging (fMRI) data has been increasingly used to study the population-wide relationship between human brain activity and individual biological or behavioral traits. A common method is to regress the scalar individual response on imaging predictors, known as a scalar-on-image (SI) regression. Analysis and computation of such massive and noisy data with complex spatio-temporal correlation structure is challenging. In this article, motivated by a psychological study on human affective feelings using fMRI, we propose a joint Ising and Dirichlet Process (Ising-DP) prior within the framework of Bayesian stochastic search variable selection for selecting brain voxels in high-dimensional SI regressions. The Ising component of the prior makes use of the spatial information between voxels, and the DP component groups the coefficients of the large number of voxels to a small set of values and thus greatly reduces the posterior computational burden. To address the phase transition phenomenon of the Ising prior, we propose a new analytic approach to derive bounds for the hyperparameters, illustrated on 2- and 3-dimensional lattices. The proposed method is compared with several alternative methods via simulations, and is applied to the fMRI data collected from the KLIFF hand-holding experiment.

Submitted 14 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS818 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS818

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 687-713

arXiv:1407.5341 [pdf, other]

Maximum likelihood estimation and Expectation-Maximization algorithm for controlled branching processes

Authors: M. Gonzalez, C. Minuesa, I. del Puerto

Abstract: The controlled branching process is a generalization of the classical Bienaymé-Galton-Watson branching process. It is a useful model for describing the evolution of populations in which the population size at each generation needs to be controlled. The maximum likelihood estimation of the parameters of interest for this process is addressed under various sampling schemes. Firstly, assuming that the entire family tree can be observed, the corresponding estimators are obtained and their asymptotic properties investigated. Secondly, since in practice it is not usual to observe such a sample, the maximum likelihood estimation is initially considered using the sample given by the total number of individuals and progenitors of each generation, and then using the sample given by only the generation sizes. Expectation-maximization algorithms are developed to address these problems as incomplete data estimation problems. The accuracy of the procedures is illustrated by means of a simulated example.

Submitted 5 February, 2015; v1 submitted 20 July, 2014; originally announced July 2014.

MSC Class: 60J80; 62M05
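To make the first two sampling schemes concrete, the sketch below simulates a controlled branching process under assumptions that are ours, not the paper's: Poisson offspring with mean 1.5, a hypothetical deterministic control function phi(z) = min(z, 200), and a stdlib-only Poisson sampler. From the full family tree it computes the nonparametric MLE of the offspring law (the share of progenitors with k children). From the totals (Z_n, phi(Z_n)) it computes total offspring divided by total progenitors, which is the MLE of the offspring mean when the offspring law is Poisson. The EM algorithms for the incomplete-data schemes are not shown.

```python
import math
import random
from collections import Counter


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def control(z: int, cap: int = 200) -> int:
    """Hypothetical deterministic control phi(z): at most `cap` individuals reproduce."""
    return min(z, cap)


def simulate_cbp(z0: int, generations: int, lam: float, seed: int = 0):
    """Simulate Z_{n+1} = sum_{j=1}^{phi(Z_n)} X_{n,j} with i.i.d. Poisson(lam) offspring."""
    rng = random.Random(seed)
    sizes, progenitors, offspring = [z0], [], []
    z = z0
    for _ in range(generations):
        phi = control(z)
        children = [poisson_draw(rng, lam) for _ in range(phi)]
        progenitors.append(phi)
        offspring.extend(children)  # full family tree: each progenitor's count
        z = sum(children)
        sizes.append(z)
    return sizes, progenitors, offspring


def offspring_law_mle(offspring: list[int]) -> dict[int, float]:
    """Full family tree observed: MLE of p_k is the share of progenitors with k children."""
    n = len(offspring)
    return {k: c / n for k, c in sorted(Counter(offspring).items())}


def offspring_mean_from_totals(sizes: list[int], progenitors: list[int]) -> float:
    """Only (Z_n, phi(Z_n)) observed: total offspring over total progenitors.

    Under the assumed Poisson offspring law this ratio is the MLE of the mean,
    because Z_{n+1} given phi(Z_n) is Poisson(phi(Z_n) * lam).
    """
    return sum(sizes[1:]) / sum(progenitors)


if __name__ == "__main__":
    sizes, progenitors, offspring = simulate_cbp(z0=50, generations=30, lam=1.5, seed=1)
    law = offspring_law_mle(offspring)
    print("estimated offspring law:", {k: round(p, 3) for k, p in law.items()})
    print("offspring mean from totals:", round(offspring_mean_from_totals(sizes, progenitors), 3))
```

With a deterministic control like this one, phi(Z_n) is a function of Z_n, so observing only the generation sizes carries the same information as observing the totals. If the control is random, the progenitor counts cannot be recovered from the generation sizes and the sample becomes incomplete, which is the situation the abstract addresses with EM.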

arXiv:1207.1115 [pdf, other]

Inferring land use from mobile phone activity

Authors: Jameson L. Toole, Michael Ulm, Dietmar Bauer, Marta C. Gonzalez

Abstract: Understanding the spatiotemporal distribution of people within a city is crucial to many planning applications. Obtaining the data needed to create this knowledge currently involves costly survey methods. At the same time, ubiquitous mobile sensors, from personal GPS devices to mobile phones, are collecting massive amounts of data on urban systems. The locations, communications, and activities of millions of people are recorded and stored by new information technologies. This work utilizes novel dynamic data, generated by mobile phone users, to measure spatiotemporal changes in population. In the process, we identify the relationship between land use and dynamic population over the course of a typical week. A machine learning classification algorithm is used to identify clusters of locations with similar zoned uses and mobile phone activity patterns. It is shown that the mobile phone data is capable of delivering useful information on actual land use that supplements zoning regulations.

Submitted 3 July, 2012; originally announced July 2012.

Comments: To be presented at ACM UrbComp2012

ACM Class: H.2.8

Showing 1–20 of 20 results for author: Gonzalez, M