More than Formulas -- Integrity, Communication, Computing and Reproducibility in Statistics Education

Authors: Eva Furrer, Annina Cincera, Reinhard Furrer

Abstract: This paper introduces a novel course design in the Master Program in Biostatistics at the University of Zurich that integrates computing skills, effective communication, reproducibility, and scientific integrity within one course. Utilizing a flipped classroom model, the course aims to equip students with the necessary competencies to handle real-world data analysis challenges and effective statistical practice in general. The curriculum includes practical tools such as version control with Git, dynamic reporting, unit testing and containerization to foster reproducibility and integrity in statistical practice. Feedback gathered from both staff and students post-implementation indicates that the course significantly enhances student readiness for professional and academic environments, demonstrating the effectiveness of this educational approach.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2405.14492 [pdf, other]

Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data

Authors: Tim Gyger, Reinhard Furrer, Fabio Sigrist

Abstract: Gaussian processes are flexible probabilistic regression models which are widely used in statistics and machine learning. However, a drawback is their limited scalability to large data sets. To alleviate this, we consider full-scale approximations (FSAs) that combine predictive process methods and covariance tapering, thus approximating both global and local structures. We show how iterative methods can be used to reduce the computational costs for calculating likelihoods, gradients, and predictive distributions with FSAs. We introduce a novel preconditioner and show that it accelerates the conjugate gradient method's convergence speed and mitigates its sensitivity with respect to the FSA parameters and the eigenvalue structure of the original covariance matrix, and we demonstrate empirically that it outperforms a state-of-the-art pivoted Cholesky preconditioner. Further, we present a novel, accurate, and fast way to calculate predictive variances relying on stochastic estimations and iterative methods. In both simulated and real-world data experiments, we find that our proposed methodology achieves the same accuracy as Cholesky-based computations with a substantial reduction in computational time. Finally, we also compare different approaches for determining inducing points in predictive process and FSA models. All methods are implemented in a free C++ software library with high-level Python and R packages.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2401.04618 [pdf]

AFM-IR of EHD-Printed PbS Quantum Dots: Quantifying Ligand Exchange at the Nanoscale

Authors: Lorenzo J. A. Ferraresi, Gökhan Kara, Nancy A. Burnham, Roman Furrer, Dmitry N. Dirin, Fabio La Mattina, Maksym V. Kovalenko, Michel Calame, Ivan Shorubalko

Abstract: Colloidal quantum dots (cQDs) recently emerged as building blocks for semiconductor materials with tuneable properties. Electro-hydrodynamic printing can be used to obtain sub-micrometre patterns of cQDs without elaborate and aggressive photolithography steps. Post-deposition ligand exchange is necessary for the introduction of new functionalities into cQD solids. However, achieving a complete bulk exchange is challenging and conventional infrared spectroscopy lacks the required spatial resolution. Infrared nanospectroscopy (AFM-IR) enables quantitative analysis of the evolution of vibrational signals and structural topography on the nanometre scale upon ligand substitution on lead sulphide (PbS) cQDs. A solution of ethane-dithiol in acetonitrile demonstrated rapid (~60 s) and controllable exchange of approximately 90% of the ligands, encompassing structures up to ~800 nm in thickness. Prolonged exposures (>1 h) led to the degradation of the microstructures, with a systematic removal of cQDs regulated by surface-to-bulk ratios and solvent interactions. This study establishes a method for the development of devices through a combination of tuneable photoactive materials, additive manufacturing of microstructures, and their quantitative nanometre-scale analysis.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 22 pages and 13 figures

arXiv:2312.05083 [pdf, other]

doi 10.1021/acsphotonics.3c01759

Scaling of Hybrid QDs-Graphene Photodetectors to Subwavelength Dimension

Authors: Gökhan Kara, Patrik Rohner, Erfu Wu, Dmitry N. Dirin, Roman Furrer, Dimos Poulikakos, Maksym V. Kovalenko, Michel Calame, Ivan Shorubalko

Abstract: Emerging colloidal quantum dot (cQD) photodetectors currently challenge established state-of-the-art infrared photodetectors in response speed, spectral tunability, simplicity of solution processable fabrication, and integration onto curved or flexible substrates. Hybrid phototransistors based on 2D materials and cQDs, in particular, are promising due to their inherent photogain enabling direct photosignal enhancement. The photogain is sensitive to both measurement conditions and photodetector geometry. This makes the cross-comparison of devices reported in the literature rather involved. Here, the effect of device length L and width W scaling to subwavelength dimensions (sizes down to 500 nm) on the photoresponse of graphene-PbS cQD phototransistors was experimentally investigated. Photogain and responsivity were found to scale with 1/LW, whereas the photocurrent and specific detectivity were independent of geometrical parameters. The measurements were performed at scaled bias voltage conditions for comparable currents. Contact effects were found to limit the photoresponse for devices with L < 3 μm. The relation of gate voltage, bias current, light intensity, and frequency on the photoresponse was investigated in detail, and a photogating efficiency to assess the cQD-graphene interface is presented. In particular, the specific detectivity values in the range of 10^8 to 10^9 Jones (wavelength of 1550 nm, frequency 6 Hz, room temperature) were found to be limited by the charge transfer across the photoactive interface.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 16 pages and 4 figures

arXiv:2306.06829 [pdf, ps, other]

Compatibility of Space-Time Kernels with Full, Dynamical, or Compact Support

Authors: Tarik Faouzi, Reinhard Furrer, Emilio Porcu

Abstract: We deal with the comparison of space-time covariance kernels having either full, spatially dynamical, or space-time compact support. Such a comparison is based on compatibility of these covariance models under fixed domain asymptotics, having a theoretical background that is substantially coming from equivalence or orthogonality of Gaussian measures. In turn, such a theory is intimately related to the tails of the spectral densities associated with the three models. Models with space-time compact support are still elusive. We taper the temporal part of a model with dynamical support, obtaining a space-time compact support. The spectrum related to such a construction is obtained through temporal convolution of the spatially dynamical spectrum with the spectrum associated with the temporal taper. The solution of such a challenge opens the door to the compatibility-based comparison. Our findings show that indeed these three models can be compatible under some suitable parametric restrictions. As a corollary, we deduce implications in terms of maximum likelihood estimation and misspecified kriging prediction under fixed domain asymptotics.

Submitted 11 June, 2023; originally announced June 2023.

Comments: 28 pages

arXiv:2304.10292 [pdf]

doi 10.1039/D2NR06682C

Nanoscale electronic transport at graphene/pentacene van der Waals interface

Authors: Michel Daher Mansour, Jacopo Oswald, Davide Beretta, Michael Stiefe, Roman Furrer, Michel Calame, Dominique Vuillaume

Abstract: We report a study on the relationship between structure and electron transport properties of nanoscale graphene/pentacene interfaces. We fabricated graphene/pentacene interfaces from 10-30 nm thick needle-like pentacene nanostructures down to two-three layers (2L-3L) dendritic pentacene islands, and we measured their electron transport properties by conductive atomic force microscopy (C-AFM). The energy barriers at the interfaces, i.e. the energy position of the pentacene highest occupied molecular orbital (HOMO) with respect to the Fermi energy of the graphene and the C-AFM metal tip, are determined and discussed with the appropriate electron transport model (double Schottky diode model and Landauer-Buttiker model, respectively) taking into account the voltage-dependent charge doping of graphene. In both types of samples, the energy barrier at the graphene/pentacene interface is slightly larger than that at the pentacene/metal tip interface, resulting in 0.47-0.55 eV and 0.21-0.34 eV, respectively, for the 10-30 nm thick needle-like pentacene islands, and in 0.92-1.44 eV and 0.67-1.05 eV, respectively, for the 2L-3L thick dendritic pentacene nanostructures. We attribute this difference to the molecular organization details of the pentacene/graphene heterostructures, with pentacene molecules lying flat on the graphene in the needle-like pentacene nanostructures, while standing upright in 2L-3L dendritic islands, as observed from Raman spectroscopy.

Submitted 4 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: Paper and its supplementary information

Journal ref: Nanoscale, 2023

arXiv:2212.10239 [pdf, ps, other]

doi 10.1016/j.spa.2024.104356

On the orthogonality of zero-mean Gaussian measures: Sufficiently dense sampling

Authors: Reinhard Furrer, Michael Hediger

Abstract: For a stationary random function $ξ$, sampled on a subset $D$ of $\mathbb{R}^{d}$, we examine the equivalence and orthogonality of two zero-mean Gaussian measures $\mathbb{P}_{1}$ and $\mathbb{P}_{2}$ associated with $ξ$. We give the isotropic analog to the result that the equivalence of $\mathbb{P}_{1}$ and $\mathbb{P}_{2}$ is linked with the existence of a square-integrable extension of the difference between the covariance functions of $\mathbb{P}_{1}$ and $\mathbb{P}_{2}$ from $D$ to $\mathbb{R}^{d}$. We show that the orthogonality of $\mathbb{P}_{1}$ and $\mathbb{P}_{2}$ can be recovered when the set of distances from points of $D$ to the origin is dense in the set of non-negative real numbers.

Submitted 21 April, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

MSC Class: 60G10; 60G15; 60G17; 60G30; 60G60

arXiv:2211.13190 [pdf, other]

BiasBed -- Rigorous Texture Bias Evaluation

Authors: Nikolai Kalischek, Rodrigo C. Daudt, Torben Peters, Reinhard Furrer, Jan D. Wegner, Konrad Schindler

Abstract: The well-documented presence of texture bias in modern convolutional neural networks has led to a plethora of algorithms that promote an emphasis on shape cues, often to support generalization to new domains. Yet, common datasets, benchmarks and general model selection strategies are missing, and there is no agreed, rigorous evaluation protocol. In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. In particular, we also show that proper evaluation and meaningful comparisons between methods are not trivial. We introduce BiasBed, a testbed for texture- and style-biased training, including multiple datasets and a range of existing algorithms. It comes with an extensive evaluation protocol that includes rigorous hypothesis testing to gauge the significance of the results, despite the considerable training instability of some style bias methods. Our extensive experiments shed new light on the need for careful, statistically founded evaluation protocols for style bias (and beyond). E.g., we find that some algorithms proposed in the literature do not significantly mitigate the impact of style bias at all. With the release of BiasBed, we hope to foster a common understanding of consistent and meaningful comparisons, and consequently faster progress towards learning methods free of texture bias. Code is available at https://github.com/D1noFuzi/BiasBed

Submitted 24 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.03366 [pdf, other]

Tunable quantum dots from atomically precise graphene nanoribbons using a multi-gate architecture

Authors: Jian Zhang, Oliver Braun, Gabriela Borin Barin, Sara Sangtarash, Jan Overbeck, Rimah Darawish, Michael Stiefel, Roman Furrer, Antonis Olziersky, Klaus Müllen, Ivan Shorubalko, Abdalghani H. S. Daaoub, Pascal Ruffieux, Roman Fasel, Hatef Sadeghi, Mickael L. Perrin, Michel Calame

Abstract: Atomically precise graphene nanoribbons (GNRs) are increasingly attracting interest due to their largely modifiable electronic properties, which can be tailored by controlling their width and edge structure during chemical synthesis. In recent years, the exploitation of GNR properties for electronic devices has focused on GNR integration into field-effect-transistor (FET) geometries. However, such FET devices have limited electrostatic tunability due to the presence of a single gate. Here, we report on the device integration of 9-atom wide armchair graphene nanoribbons (9-AGNRs) into a multi-gate FET geometry, consisting of an ultra-narrow finger gate and two side gates. We use high-resolution electron-beam lithography (EBL) for defining finger gates as narrow as 12 nm and combine them with graphene electrodes for contacting the GNRs. Low-temperature transport spectroscopy measurements reveal quantum dot (QD) behavior with rich Coulomb diamond patterns, suggesting that the GNRs form QDs that are connected both in series and in parallel. Moreover, we show that the additional gates enable differential tuning of the QDs in the nanojunction, providing the first step towards multi-gate control of GNR-based multi-dot systems.

Submitted 27 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2203.03322 [pdf, other]

Dominant-feature identification in data from Gaussian processes applied to Finnish forest inventory records

Authors: Roman Flury, Tuomas Aakala, Leena Ruha, Timo Kuuluvainen, Reinhard Furrer

Abstract: In spatial data, location-dependent variation leads to connected structures known as features. Variations occur at different spatial scales and possibly originate from distinct underlying processes. Each of these scales is characterized by its own dominant features. Here we introduce a statistical method for identifying these scales and their dominant features in data from Gaussian processes. This identification involves credibly recognizing the dominant features by scale-space decomposition and assessing feature attributes by estimating covariance function parameters of the underlying processes and their associations to potential drivers. We analyze Finnish forest inventory data from the 1920s using this dominant-feature identification method and identify the scales of variation in basal area estimates of most common Finnish trees, including Scots pine, Norway spruce, birch, and other native deciduous trees. Comparing the resulting scale-dependent features and their attributes in these tree species, we identify the different effects of edaphic and anthropogenic drivers on the spatial distribution of their basal areas. These data are analyzed for the first time in terms of their scale of variation, and the resulting scale-dependent maps and estimates are an essential contribution to the historical forest ecology of Fennoscandia. Until now, this analysis was not possible with conventional methods.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 28 pages, 9 figures

arXiv:2112.12317 [pdf, other]

doi 10.1214/23-EJS2170

Asymptotic analysis of ML-covariance parameter estimators based on covariance approximations

Authors: Reinhard Furrer, Michael Hediger

Abstract: Given a zero-mean Gaussian random field with a covariance function that belongs to a parametric family of covariance functions, we introduce a new notion of likelihood approximations, termed truncated-likelihood functions. Truncated-likelihood functions are based on direct functional approximations of the presumed family of covariance functions. For compactly supported covariance functions, within an increasing-domain asymptotic framework, we provide sufficient conditions under which consistency and asymptotic normality of estimators based on truncated-likelihood functions are preserved. We apply our result to the family of generalized Wendland covariance functions and discuss several examples of Wendland approximations. For families of covariance functions that are not compactly supported, we combine our results with the covariance tapering approach and show that ML estimators, based on truncated-tapered likelihood functions, asymptotically minimize the Kullback-Leibler divergence, when the taper range is fixed.

Submitted 15 November, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 39 pages, 1 Figure

MSC Class: 60G15; 62F12 (Primary) 41A99 (Secondary)

arXiv:2106.10462 [pdf, other]

doi 10.1007/s13253-021-00461-3

Discussion on Competition for Spatial Statistics for Large Datasets

Authors: Roman Flury, Reinhard Furrer

Abstract: We discuss the experiences and results of the AppStatUZH team's participation in the comprehensive and unbiased comparison of different spatial approximations conducted in the Competition for Spatial Statistics for Large Datasets. In each of the different sub-competitions, we estimated parameters of the covariance model based on a likelihood function and predicted missing observations with simple kriging. We approximated the covariance model either with covariance tapering or a compactly supported Wendland covariance function.

Submitted 19 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 figure

arXiv:2106.02364 [pdf, other]

varycoef: An R Package for Gaussian Process-based Spatially Varying Coefficient Models

Authors: Jakob A. Dambon, Fabio Sigrist, Reinhard Furrer

Abstract: Gaussian processes (GPs) are well-known tools for modeling dependent data with applications in spatial statistics, time series analysis, or econometrics. In this article, we present the R package varycoef that implements estimation, prediction, and variable selection of linear models with spatially varying coefficients (SVC) defined by GPs, so-called GP-based SVC models. Such models offer a high degree of flexibility while being relatively easy to interpret. Using varycoef, we show versatile applications of (spatially) varying coefficient models on spatial and time series data. This includes model and coefficient estimation with predictions and variable selection. The package uses state-of-the-art computational statistics techniques like parallelization, model-based optimization, and covariance tapering. This allows the user to work with (S)VC models in a computationally efficient manner, i.e., model estimation on large data sets is possible in a feasible amount of time.

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2102.13033 [pdf]

Optimized Graphene Electrodes for contacting Graphene Nanoribbons

Authors: Oliver Braun, Jan Overbeck, Maria El Abbassi, Silvan Käser, Roman Furrer, Antonis Olziersky, Alexander Flasby, Gabriela Borin Barin, Rimah Darawish, Klaus Müllen, Pascal Ruffieux, Roman Fasel, Ivan Shorubalko, Mickael L. Perrin, Michel Calame

Abstract: Atomically precise graphene nanoribbons are a promising emerging class of designer quantum materials with electronic properties that are tunable by chemical design. However, many challenges remain in the device integration of these materials, especially regarding contacting strategies. We report on the device integration of uniaxially aligned and non-aligned 9-atom wide armchair graphene nanoribbons (9-AGNRs) in a field-effect transistor geometry using electron beam lithography-defined graphene electrodes. This approach yields controlled electrode geometries and enables higher fabrication throughput compared to previous approaches using an electrical breakdown technique. Thermal annealing is found to be a crucial step for successful device operation resulting in electronic transport characteristics showing a strong gate dependence. Raman spectroscopy confirms the integrity of the graphene electrodes after patterning and of the GNRs after device integration. Our results demonstrate the importance of the GNR-graphene electrode interface and pave the way for GNR device integration with structurally well-defined electrodes.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2101.12238 [pdf, other]

Spatially mapping the thermal conductivity of graphene by an opto-thermal method

Authors: Oliver Braun, Roman Furrer, Pascal Butti, Kishan R. Thodkar, Ivan Shorubalko, Ilaria Zardo, Michel Calame, Mickael L. Perrin

Abstract: Mapping the thermal transport properties of materials at the nanoscale is of critical importance for optimizing heat conduction in nanoscale devices. Several methods to determine the thermal conductivity of materials have been developed, most of them yielding an average value across the sample, thereby disregarding the role of local variations. Here, we present a method for the spatially-resolved assessment of the thermal conductivity of suspended graphene by using a combination of confocal Raman thermometry and a finite-element calculations-based fitting procedure. We demonstrate the working principle of our method by extracting the two-dimensional thermal conductivity map of one pristine suspended single-layer graphene sheet and one irradiated using helium ions. Our method paves the way for spatially resolving the thermal conductivity of other types of layered materials. This is particularly relevant for the design and engineering of nanoscale thermal circuits (e.g. thermal diodes).

Submitted 15 March, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

arXiv:2101.01932 [pdf, other]

Joint Variable Selection of both Fixed and Random Effects for Gaussian Process-based Spatially Varying Coefficient Models

Authors: Jakob A. Dambon, Fabio Sigrist, Reinhard Furrer

Abstract: Spatially varying coefficient (SVC) models are a type of regression model for spatial data where covariate effects vary over space. If there are several covariates, a natural question is which covariates have a spatially varying effect and which do not. We present a new variable selection approach for Gaussian process-based SVC models. It relies on a penalized maximum likelihood estimation (PMLE) and allows variable selection both with respect to fixed effects and Gaussian process random effects. We validate our approach both in a simulation study and on a real-world data set. Our novel approach shows good selection performance in the simulation study. In the real data application, our proposed PMLE yields sparser SVC models and achieves a smaller information criterion than classical MLE. In a cross-validation applied on the real data, we show that sparser PML estimated SVC models are on par with ML estimated SVC models with respect to predictive performance.

Submitted 11 February, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

Comments: 26 pages including appendix. Containing 6 figures and 6 tables. Updated Declarations

arXiv:2010.00534 [pdf, other]

Bayesian spatial modelling of terrestrial radiation in Switzerland

Authors: Christophe L. Folly, Garyfallos Konstantinoudis, Antonella Mazzei-Abba, Christian Kreis, Benno Bucher, Reinhard Furrer, Ben D. Spycher

Abstract: The geographic variation of terrestrial radiation can be exploited in epidemiological studies of the health effects of protracted low-dose exposure. Various methods have been applied to derive maps of this variation. We aimed to construct a map of terrestrial radiation for Switzerland. We used airborne $γ$-spectrometry measurements to model the ambient dose rates from terrestrial radiation through a Bayesian mixed-effects model and conducted inference using Integrated Nested Laplace Approximation (INLA). We predicted higher levels of ambient dose rates in the alpine regions and Ticino compared with the western and northern parts of Switzerland. We provide a map that can be used for exposure assessment in epidemiological studies and as a baseline map for assessing potential contamination.

Submitted 1 October, 2020; originally announced October 2020.

Comments: 27 pages, 10 figures

arXiv:2007.14684 [pdf, other]

Asymptotically Equivalent Prediction in Multivariate Geostatistics

Authors: François Bachoc, Emilio Porcu, Moreno Bevilacqua, Reinhard Furrer, Tarik Faouzi

Abstract: Cokriging is the common method of spatial interpolation (best linear unbiased prediction) in multivariate geostatistics. While best linear prediction has been well understood in univariate spatial statistics, the literature for the multivariate case has been elusive so far. The new challenges provided by modern spatial datasets, being typically multivariate, call for a deeper study of cokriging. In particular, we deal with the problem of misspecified cokriging prediction within the framework of fixed domain asymptotics. Specifically, we provide conditions for equivalence of measures associated with multivariate Gaussian random fields, with index set in a compact set of a d-dimensional Euclidean space. Such conditions have been elusive for about 50 years of spatial statistics. We then focus on the multivariate Matérn and Generalized Wendland classes of matrix-valued covariance functions, which have been very popular for having parameters that are crucial to spatial interpolation, and that control the mean square differentiability of the associated Gaussian process. We provide sufficient conditions for equivalence of Gaussian measures, relying on the covariance parameters of these two classes. This makes it possible to identify the parameters that are crucial to asymptotically equivalent interpolation in multivariate geostatistics. Our findings are then illustrated through simulation studies.

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2006.07183 [pdf, other]

doi 10.1016/j.spasta.2020.100483

Identification of Dominant Features in Spatial Data

Authors: Roman Flury, Florian Gerber, Bernhard Schmid, Reinhard Furrer

Abstract: Dominant features of spatial data are connected structures or patterns that emerge from location-based variation and manifest at specific scales or resolutions. To identify dominant features, we propose a sequential application of multiresolution decomposition and variogram function estimation. Multiresolution decomposition separates data into additive components, and in this way enables the recognition of their dominant features. A dedicated multiresolution decomposition method is developed for arbitrary gridded spatial data, where the underlying model includes a precision and spatial-weight matrix to capture spatial correlation. The data are separated into their components by smoothing on different scales, such that larger scales have longer spatial correlation ranges. Moreover, our model can handle missing values, which is often useful in applications. Variogram function estimation can be used to describe properties in spatial data. Such functions are therefore estimated for each component to determine its effective range, which assesses the width-extent of the dominant feature. Finally, Bayesian analysis enables inference on the identified dominant features and a judgment of whether these are credibly different. The efficient implementation of the method relies mainly on a sparse-matrix data structure and algorithms. By applying the method to simulated data we demonstrate its applicability and theoretical soundness. In disciplines that use spatial data, this method can lead to new insights, as we exemplify by identifying the dominant features in a forest dataset. In that application, the width-extents of the dominant features have an ecological interpretation, namely the species interaction range, and their estimates support the derivation of ecosystem properties such as biodiversity indices.

Submitted 18 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: 25 pages, 14 figures

arXiv:2005.14473 [pdf, other]

doi 10.6092/GRASPA19_pp86-89

Multiresolution Decomposition of Areal Count Data

Authors: Roman Flury, Reinhard Furrer

Abstract: Multiresolution decomposition is commonly understood as a procedure to capture scale-dependent features in random signals. Such methods were first established for image processing and typically rely on raster or regularly gridded data. In this article, we extend a particular multiresolution decomposition procedure to areal count data, i.e., discrete irregularly gridded data. More specifically, we incorporate in a new model concepts and distributions from the so-called Besag-York-Mollié model to include a priori demographical knowledge. These adaptations and subsequent changes in the computation schemes are carefully outlined below, whereas the main idea of the original multiresolution decomposition remains. Finally, we show the extension's feasibility by applying it to oral cavity cancer counts in Germany.

Submitted 29 May, 2020; originally announced May 2020.

Comments: 4 pages, 3 figures, GRASPA 2019 conference proceeding

Journal ref: Proceedings of the GRASPA 2019 Conference, Pescara, 15-16 July 2019

arXiv:2001.08089 [pdf, other]

doi 10.1016/j.spasta.2020.100470

Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction

Authors: Jakob A. Dambon, Fabio Sigrist, Reinhard Furrer

Abstract: In regression models for spatial data, it is often assumed that the marginal effects of covariates on the response are constant over space. In practice, this assumption might often be questionable. In this article, we show how a Gaussian process-based spatially varying coefficient (SVC) model can be estimated using maximum likelihood estimation (MLE). In addition, we present an approach that scales to large data by applying covariance tapering. We compare our methodology to existing methods such as a Bayesian approach using the stochastic partial differential equation (SPDE) link, geographically weighted regression (GWR), and eigenvector spatial filtering (ESF) in both a simulation study and an application where the goal is to predict prices of real estate apartments in Switzerland. The results from both the simulation study and application show that the MLE approach results in increased predictive accuracy and more precise estimates. Since we use a model-based approach, we can also provide predictive variances. In contrast to existing model-based approaches, our method scales better to data where both the number of spatial points is large and the number of spatially varying covariates is moderately-sized, e.g., above ten.

Submitted 12 November, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

Comments: revision: 35 pages, 14 figures, typo in likelihood corrected, DOI added
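
The covariance tapering mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential covariance, the Wendland-1 taper, the one-dimensional locations, and all function names are assumptions chosen for illustration. The idea shown is that multiplying a covariance elementwise by a compactly supported correlation function sets entries for distant location pairs exactly to zero, which is what makes sparse-matrix algorithms applicable.

```python
from math import exp


def exponential_cov(h, sill=1.0, range_=1.0):
    """Exponential covariance at distance h (assumed model, for illustration)."""
    return sill * exp(-h / range_)


def wendland_taper(h, taper_range):
    """Wendland-1 taper: (1 - r)^4 * (4r + 1) for r = h / taper_range < 1, else 0."""
    r = h / taper_range
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def tapered_cov_matrix(locations, taper_range):
    """Elementwise product of covariance and taper for 1-D locations."""
    n = len(locations)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            h = abs(locations[i] - locations[j])
            row.append(exponential_cov(h) * wendland_taper(h, taper_range))
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Hypothetical locations; pairs at distance >= taper_range get a zero entry.
    cov = tapered_cov_matrix([0.0, 0.5, 1.0, 3.0], taper_range=1.0)
    zeros = sum(value == 0.0 for row in cov for value in row)
    print(f"{zeros} of {len(cov) ** 2} entries are exactly zero")
```

In practice the zero entries would not be stored at all; a dense list of lists is used here only to keep the sketch self-contained.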

arXiv:1911.11199 [pdf, other]

Asymptotic properties of the maximum likelihood and cross validation estimators for transformed Gaussian processes

Authors: François Bachoc, José Bétancourt, Reinhard Furrer, Thierry Klein

Abstract: The asymptotic analysis of covariance parameter estimation of Gaussian processes has been subject to intensive investigation. However, this asymptotic analysis is very scarce for non-Gaussian processes. In this paper, we study a class of non-Gaussian processes obtained by regular non-linear transformations of Gaussian processes. We provide the increasing-domain asymptotic properties of the (Gaussian) maximum likelihood and cross validation estimators of the covariance parameters of a non-Gaussian process of this class. We show that these estimators are consistent and asymptotically normal, although they are defined as if the process was Gaussian. They do not need to model or estimate the non-linear transformation. Our results can thus be interpreted as a robustness of (Gaussian) maximum likelihood and cross validation towards non-Gaussianity. Our proofs rely on two technical results that are of independent interest for the increasing-domain asymptotic literature of spatial processes. First, we show that, under mild assumptions, coefficients of inverses of large covariance matrices decay at an inverse polynomial rate as a function of the corresponding observation location distances. Second, we provide a general central limit theorem for quadratic forms obtained from transformed Gaussian processes. Finally, our asymptotic results are illustrated by numerical simulations.

Submitted 25 November, 2019; originally announced November 2019.

Comments: 40 pages, 4 figures

arXiv:1911.09006 [pdf, other]

Additive Bayesian Network Modelling with the R Package abn

Authors: Gilles Kratzer, Fraser Iain Lewis, Arianna Comin, Marta Pittavino, Reinhard Furrer

Abstract: The R package abn is designed to fit additive Bayesian models to observational datasets. It contains routines to score Bayesian networks based on Bayesian or information theoretic formulations of generalized linear models. It is equipped with exact search and greedy search algorithms to select the best network. It supports a possible blend of continuous, discrete and count data and input of prior knowledge at a structural level. The Bayesian implementation supports random effects to control for one-layer clustering. In this paper, we give an overview of the methodology and illustrate the package's functionalities using a veterinary dataset about respiratory diseases in commercial swine production.

Submitted 20 November, 2019; originally announced November 2019.

Comments: 37 pages, 14 figures and 2 tables

arXiv:1906.00364 [pdf, other]

Combining Heterogeneous Spatial Datasets with Process-based Spatial Fusion Models: A Unifying Framework

Authors: Craig Wang, Reinhard Furrer

Abstract: In modern spatial statistics, the structure of data that is collected has become more heterogeneous. Depending on the type of spatial data, different modeling strategies for spatial data are used. For example, a kriging approach for geostatistical data; a Gaussian Markov random field model for lattice data; or a log Gaussian Cox process for point-pattern data. Despite these different modeling choices, the nature of underlying scientific data-generating (latent) processes is often the same, which can be represented by some continuous spatial surfaces. In this paper, we introduce a unifying framework for process-based multivariate spatial fusion models. The framework can jointly analyze all three aforementioned types of spatial data (or any combinations thereof). Moreover, the framework accommodates different conditional distributions for geostatistical and lattice data. We show that some established approaches, such as linear models of coregionalization, can be viewed as special cases of our proposed framework. We offer flexible and scalable implementations in R using Stan and INLA. Simulation studies confirm that the predictive performance of latent processes improves as we move from univariate spatial models to multivariate spatial fusion models. The introduced framework is illustrated using a cross-sectional study linked with a national cohort dataset in Switzerland, in which we examine differences in underlying spatial risk patterns between respiratory disease and lung cancer.

Submitted 2 June, 2019; originally announced June 2019.

Comments: 33 pages, 5 figures

arXiv:1902.06641 [pdf, other]

Is a single unique Bayesian network enough to accurately represent your data?

Authors: Gilles Kratzer, Reinhard Furrer

Abstract: Bayesian network (BN) modelling is extensively used in systems epidemiology. Usually it consists in selecting and reporting the best-fitting structure conditional on the data. A major practical concern is avoiding overfitting, on account of its extreme flexibility and its modelling richness. Many approaches have been proposed to control for overfitting. Unfortunately, they essentially all rely on very crude decisions that result in too simplistic approaches for such complex systems. In practice, with limited data sampled from a complex system, this approach seems too simplistic. An alternative would be to use the Monte Carlo Markov chain model choice (MC3) over the network to learn the landscape of reasonably supported networks, and then to present all possible arcs with their MCMC support. This paper presents an R implementation, called mcmcabn, of a flexible structural MC3 that is accessible to non-specialists.

Submitted 18 February, 2019; originally announced February 2019.

Comments: 2 pages, 3 figures

arXiv:1809.06636 [pdf, other]

Comparison between Suitable Priors for Additive Bayesian Networks

Authors: Gilles Kratzer, Reinhard Furrer, Marta Pittavino

Abstract: Additive Bayesian networks are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior of the parameters is of crucial importance. If an inadequate prior, such as a too weakly informative one, is used, data separation and data sparsity lead to issues in the model selection process. In this work a simulation study comparing two weakly informative priors and one strongly informative prior is presented. As weakly informative prior we use a zero mean Gaussian prior with a large variance, currently implemented in the R package abn. The second prior belongs to the Student's t-distribution, specifically designed for logistic regressions and, finally, the strongly informative prior is again Gaussian with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided.

Submitted 18 September, 2018; originally announced September 2018.

Comments: 8 pages, 4 figures

arXiv:1808.01126 [pdf, other]

Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology

Authors: Gilles Kratzer, Reinhard Furrer

Abstract: Bayesian network modelling is a well adapted approach to study messy and highly correlated datasets which are very common in, e.g., systems epidemiology. A popular approach to learn a Bayesian network from an observational dataset is to identify the maximum a posteriori network in a search-and-score approach. Many scores have been proposed, both Bayesian and frequentist. In an applied perspective, a suitable approach would allow multiple distributions for the data and be robust enough to run autonomously. A promising framework to compute scores are generalized linear models. Indeed, there exist fast algorithms for estimation and many tailored solutions to common epidemiological issues. The purpose of this paper is to present an R package abn that has an implementation of multiple frequentist scores and some realistic simulations that show its usability and performance. It includes features to deal efficiently with data separation and adjustment which are very common in systems epidemiology.

Submitted 3 August, 2018; originally announced August 2018.

Comments: 16 pages, 3 figures

arXiv:1804.11224 [pdf, other]

EggCounts: a Bayesian hierarchical toolkit to model faecal egg count reductions

Authors: Craig Wang, Reinhard Furrer

Abstract: This is a vignette for the R package eggCounts version 2.0. The package implements a suite of Bayesian hierarchical models dealing with faecal egg count reductions. The models are designed for a variety of practical situations, including individual treatment efficacy, zero inflation, small sample size (less than 10) and potential outliers. The functions are intuitive to use and their output is easy to interpret, such that users are protected from being exposed to complex Bayesian hierarchical modelling tasks. In addition, the package includes plotting functions to display data and results in a visually appealing manner. The models are implemented in the Stan modelling language, which provides an efficient sampling technique to obtain posterior samples. This vignette briefly introduces different models, and provides a short walk-through analysis with example data.

Submitted 3 February, 2022; v1 submitted 30 April, 2018; originally announced April 2018.

Comments: 13 pages, 3 figures

arXiv:1804.11058 [pdf, other]

optimParallel: an R Package Providing Parallel Versions of the Gradient-Based Optimization Methods of optim()

Authors: Florian Gerber, Reinhard Furrer

Abstract: The R package optimParallel provides a parallel version of the gradient-based optimization methods of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce optimization times. We introduce the R package and illustrate its implementation, which takes advantage of the lexical scoping mechanism of R.

Submitted 30 April, 2018; originally announced April 2018.

arXiv:1804.07134 [pdf, other]

varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets

Authors: Gilles Kratzer, Reinhard Furrer

Abstract: This article describes the R package varrank. It has a flexible implementation of heuristic approaches which perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information theory metrics. It is compatible with discrete and continuous data which are discretised using a large choice of possible rules. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.

Submitted 19 April, 2018; originally announced April 2018.

Comments: 18 pages, 4 figures
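
The minimum redundancy maximum relevance idea described in the abstract above can be sketched in a few lines. This is a generic illustration, not the varrank package's algorithm or API: the function names, the "relevance minus mean redundancy" score, and the toy data are all assumptions, and the inputs are taken to be already discretised.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * log(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


def rank_variables(features, target):
    """Greedy ranking: relevance to the target minus mean redundancy
    with the variables that have already been selected."""
    remaining = dict(features)
    selected = []
    while remaining:
        def score(name):
            relevance = mutual_information(remaining[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(remaining[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: "signal" copies the target, "noise" is independent of it.
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    features = {
        "signal": [0, 0, 1, 1, 0, 0, 1, 1],
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(rank_variables(features, target))
```

Other ways of combining relevance and redundancy (for example a ratio instead of a difference) fit the same greedy loop; only the `score` function would change.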

arXiv:1710.05013 [pdf, other]

A Case Study Competition Among Methods for Analyzing Large Spatial Data

Authors: Matthew J. Heaton, Abhirup Datta, Andrew Finley, Reinhard Furrer, Rajarshi Guhaniyogi, Florian Gerber, Robert B. Gramacy, Dorit Hammerling, Matthias Katzfuss, Finn Lindgren, Douglas W. Nychka, Furong Sun, Andrew Zammit-Mangion

Abstract: The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, each of which was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.

Submitted 25 April, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

arXiv:1706.09233 [pdf, other]

Modeling Temporally Evolving and Spatially Globally Dependent Data

Authors: Emilio Porcu, Alfredo Alegría, Reinhard Furrer

Abstract: The last decades have seen an unprecedented increase in the availability of data sets that are inherently global and temporally evolving, from remotely sensed networks to climate model ensembles. This paper provides a view of statistical modeling techniques for space-time processes, where space is the sphere representing our planet. In particular, we make a distinction between (a) second order-based, and (b) practical approaches to model temporally evolving global processes. The former are based on the specification of a class of space-time covariance functions, with space being the two-dimensional sphere. The latter are based on explicit description of the dynamics of the space-time process, i.e., by specifying its evolution as a function of its past history with added spatially dependent noise. We especially focus on approach (a), where the literature has been sparse. We provide new models of space-time covariance functions for random fields defined on spheres cross time. Practical approaches, (b), are also discussed, with special emphasis on models built directly on the sphere, without projecting the spherical coordinate on the plane. We present a case study focused on the analysis of air pollution from the 2015 wildfires in Equatorial Asia, an event which was classified as the year's worst environmental disaster. The paper finishes with a list of the main theoretical and applied research problems in the area, where we expect the statistical community to engage over the next decade.

Submitted 28 June, 2017; originally announced June 2017.

arXiv:1706.07766 [pdf, other]

doi 10.1080/00949655.2017.1406488

Asymmetric Matrix-Valued Covariances for Multivariate Random Fields on Spheres

Authors: Alfredo Alegría, Emilio Porcu, Reinhard Furrer

Abstract: Matrix-valued covariance functions are crucial to geostatistical modeling of multivariate spatial data. The classical assumption of symmetry of a multivariate covariance function is overly restrictive and has been considered as unrealistic for most real data applications. Despite that, the literature on asymmetric covariance functions has been very sparse. In particular, there is some work related to asymmetric covariances on Euclidean spaces, depending on the Euclidean distance. However, for data collected over large portions of planet Earth, the most natural spatial domain is a sphere, with the corresponding geodesic distance being the natural metric. In this work, we propose a strategy based on spatial rotations to generate asymmetric covariances for multivariate random fields on the $d$-dimensional unit sphere. We illustrate through simulations as well as real data analysis that our proposal achieves improvements in predictive performance in comparison to the symmetric counterpart.

Submitted 23 June, 2017; originally announced June 2017.

arXiv:1702.08188 [pdf, other]

doi 10.1016/j.softx.2018.06.002

dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors

Authors: Florian Gerber, Kaspar Mösinger, Reinhard Furrer

Abstract: The R functions .C() and .Fortran() can be used to call compiled C/C++ and Fortran code from R. This so-called foreign function interface is convenient, since it does not require any interactions with the C API of R. However, it does not support long vectors (i.e., vectors of more than 2^31 elements). To overcome this limitation, the R package dotCall64 provides .C64(), which can be used to call compiled C/C++ and Fortran functions. It transparently supports long vectors and does the necessary castings to pass numeric R vectors to 64-bit integer arguments of the compiled code. Moreover, .C64() features a mechanism to avoid unnecessary copies of function arguments, making it efficient in terms of speed and memory usage.

Submitted 27 February, 2017; originally announced February 2017.

Comments: 17 pages

Journal ref: SoftwareX, 7, 217-221, 2018

arXiv:1701.06010 [pdf, other]

Covariance Functions for Multivariate Gaussian Fields Evolving Temporally over Planet Earth

Authors: Alfredo Alegría, Emilio Porcu, Reinhard Furrer, Jorge Mateu

Abstract: The construction of valid and flexible cross-covariance functions is a fundamental task for modeling multivariate space-time data arising from climatological and oceanographical phenomena. Indeed, a suitable specification of the covariance structure allows capturing the space-time dependencies between the observations and developing accurate predictions. For data observed over large portions of planet Earth it is necessary to take into account the curvature of the planet. Hence the need for random field models defined over spheres across time. In particular, the associated covariance function should depend on the geodesic distance, which is the most natural metric over the spherical surface. In this work, we propose a flexible parametric family of matrix-valued covariance functions, with both marginal and cross structure being of the Gneiting type. We additionally introduce a different multivariate Gneiting model based on the adaptation of the latent dimension approach to the spherical context. Finally, we assess the performance of our models through the study of a bivariate space-time data set of surface air temperatures and precipitations.

Submitted 21 November, 2017; v1 submitted 21 January, 2017; originally announced January 2017.

arXiv:1607.06921 [pdf, other]

Estimation and Prediction using generalized Wendland Covariance Functions under fixed domain asymptotics

Authors: M. Bevilacqua, T. Faouzi, R. Furrer, E. Porcu

Abstract: We study estimation and prediction of Gaussian random fields with covariance models belonging to the generalized Wendland (GW) class, under fixed domain asymptotics. As the Matérn case, this class allows a continuous parameterization of smoothness of the underlying Gaussian random field, being additionally compactly supported. The paper is divided into two parts: First, we characterize the equivalence of two Gaussian measures with GW covariance function, and we provide sufficient conditions for the equivalence of two Gaussian measures with Matérn and GW covariance functions. We elucidate the consequences of these facts in terms of (misspecified) best linear unbiased predictors. In the second part, we establish strong consistency and asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated to GW covariance model, under fixed domain asymptotics. Our findings are illustrated through a simulation study: The first compares the finite sample behavior of the maximum likelihood estimation of the microergodic parameter with the given asymptotic distribution. We then compare the finite-sample behavior of the prediction and its associated mean square error when using two equivalent Gaussian measures with Matérn and GW covariance model, using covariance tapering as benchmark.

Submitted 15 November, 2017; v1 submitted 23 July, 2016; originally announced July 2016.

MSC Class: 62F10; 62M30; 62H11; 86A32

arXiv:1605.01038 [pdf]

doi 10.1109/TGRS.2017.2785240

Predicting missing values in spatio-temporal satellite data

Authors: Florian Gerber, Reinhard Furrer, Gabriela Schaepman-Strub, Rogier de Jong, Michael E. Schaepman

Abstract: Remotely sensed data are sparse, which means that data have missing values, for instance due to cloud cover. This is problematic for applications and signal processing algorithms that require complete data sets. To address the sparse data issue, we present a new gap-fill algorithm. The proposed method predicts each missing value separately based on data points in a spatio-temporal neighborhood around the missing data point. The computational workload can be distributed among several computers, making the method suitable for large datasets. The prediction of the missing values and the estimation of the corresponding prediction uncertainties are based on sorting procedures and quantile regression. The algorithm was applied to MODIS NDVI data from Alaska and tested with realistic cloud cover scenarios featuring up to 50% missing data. Validation against established software showed that the proposed method has a good performance in terms of the root mean squared prediction error. The procedure is implemented and available in the open-source R package gapfill. We demonstrate the software performance with a real data example and show how it can be tailored to specific data. Due to the flexible software design, users can control and redesign major parts of the procedure with little effort. This makes it an interesting tool for gap-filling satellite data and for the future development of gap-fill procedures.

Submitted 3 May, 2016; originally announced May 2016.

Comments: 35 pages

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, Volume 56, Issue 5, 2841-2853, 2018

arXiv:1604.05478 [pdf, other]

Valid parameter space of a bivariate Gaussian Markov random field with a generalized block-Toeplitz precision matrix

Authors: Mattia Molinaro, Reinhard Furrer

Abstract: Gaussian Markov random fields (GMRFs) are extensively used in statistics to model area-based data and usually depend on several parameters in order to capture complex spatial correlations. In this context, it is important to determine the valid parameter space, namely the domain ensuring (semi) positive-definiteness of the precision matrix. Depending on the structure of the latter, this task can be challenging. While univariate GMRFs with block-Toeplitz precision are well studied in the literature, not much is analytically known about bivariate GMRFs. So far, only restrictive sufficient conditions and brute-force approaches were proposed, which are computationally expensive for the size of modern datasets. In this paper, we consider a bivariate GMRF, which is part of a hierarchical model used in spatial statistics to analyze data coming from projections of regional climate change. By extending classical convergence results of univariate fields with toroidal boundary conditions to fields without boundary conditions, we provide asymptotically closed-form expressions of the valid parameter space. We develop a general methodology that can be used to determine the valid parameter space of bivariate GMRFs whose precision matrix has a generalized block-Toeplitz structure and for which classical convergence results are not directly applicable. Finally, we quantify the rate of convergence of our approach through a numerical study in R.

Submitted 19 April, 2016; originally announced April 2016.

MSC Class: 15A18; 62M30

arXiv:1602.02882 [pdf, ps, other]

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

Authors: François Bachoc, Reinhard Furrer

Abstract: There has been a growing interest in providing models for multivariate spatial processes. A majority of these models specify a parametric matrix covariance function. Based on observations, the parameters are estimated by maximum likelihood or variants thereof. While the asymptotic properties of maximum likelihood estimators for univariate spatial processes have been analyzed in detail, maximum likelihood estimators for multivariate spatial processes have not received their deserved attention yet. In this article we consider the classical increasing-domain asymptotic setting restricting the minimum distance between the locations. Then, one of the main components to be studied from a theoretical point of view is the asymptotic positive definiteness of the underlying covariance matrix. Based on very weak assumptions on the matrix covariance function we show that the smallest eigenvalue of the covariance matrix is asymptotically bounded away from zero. Several practical implications are discussed as well.

Submitted 9 February, 2016; originally announced February 2016.

arXiv:1506.01833 [pdf, other]

Asymptotic properties of multivariate tapering for estimation and prediction

Authors: R. Furrer, F. Bachoc, J. Du

Abstract: Parameter estimation for and prediction of spatially or spatio-temporally correlated random processes are used in many areas and often require the solution of a large linear system based on the covariance matrix of the observations. In recent years, the dataset sizes to which these methods are applied have steadily increased such that straightforward statistical tools are computationally too expensive to be used. In the univariate context, tapering, i.e., creating sparse approximate linear systems, has been shown to be an efficient tool in both the estimation and prediction settings. The asymptotic properties are derived under an infill asymptotic setting. In this paper we use a domain increasing framework for estimation and prediction using multivariate tapering. Under this asymptotic regime we prove that tapering (one-tapered form) preserves the consistency of the untapered maximum likelihood estimator and show that tapering has asymptotically the same mean squared prediction error as using the corresponding untapered predictor. The theoretical results are illustrated with simulations.

Submitted 5 June, 2015; originally announced June 2015.

arXiv:1401.2642 [pdf, other]

Hierarchical modelling of faecal egg counts to assess anthelmintic efficacy

Authors: Michaela Paul, Paul R. Torgerson, Johan Höglund, Reinhard Furrer

Abstract: Counting the number of parasite eggs in faecal samples is a widely used diagnostic method to evaluate parasite burden. Typically a sub-sample of the diluted faeces is examined for eggs. The resulting egg counts are multiplied by a specific correction factor to estimate the mean parasite burden. To detect anthelmintic resistance, the mean parasite burden from treated and untreated animals are compared. However, this standard method has some limitations. In particular, the analysis of repeated samples may produce quite variable results as the sampling variability due to the counting technique is ignored. We propose a hierarchical model that takes this sampling variability as well as between-animal variation into account. Bayesian inference is done via Markov chain Monte Carlo sampling. The performance of the hierarchical model is illustrated by a re-analysis of faecal egg count data from a Swedish study assessing the anthelmintic resistance of nematode parasite in sheep. A simulation study shows that the hierarchical model provides better classification of anthelmintic resistance compared to the standard method.

Submitted 12 January, 2014; originally announced January 2014.

Comments: 14 pages, 7 figures, 1 table

arXiv:1303.3390 [pdf, other]

Conjugate distributions in hierarchical Bayesian ANOVA for computational efficiency and assessments of both practical and statistical significance

Authors: Steven Geinitz, Reinhard Furrer

Abstract: Assessing variability according to distinct factors in data is a fundamental technique of statistics. The method commonly regarded to as analysis of variance (ANOVA) is, however, typically confined to the case where all levels of a factor are present in the data (i.e. the population of factor levels has been exhausted). Random and mixed effects models are used for more elaborate cases, but require distinct nomenclature, concepts and theory, as well as distinct inferential procedures. Following a hierarchical Bayesian approach, a comprehensive ANOVA framework is shown, which unifies the above statistical models, emphasizes practical rather than statistical significance, addresses issues of parameter identifiability for random effects, and provides straightforward computational procedures for inferential steps. Although this is done in a rigorous manner the contents herein can be seen as ideological in supporting a shift in the approach taken towards analysis of variance.

Submitted 14 March, 2013; originally announced March 2013.

Comments: 24 pages

arXiv:1302.4659 [pdf, other]

Spatial Backfitting of Roller Measurement Values from a Florida Test Bed

Authors: Daniel K. Heersink, Reinhard Furrer, Mike A. Mooney

Abstract: Modern earthwork compaction rollers collect location and compaction information as they traverse a compaction site. These data are indirectly observed through non-linear measurement operators, inherently multivariate with complex correlation structures, and collected in huge quantities. The nature of such data was investigated at a large, atypically compacted test bed in Florida, USA. Exploratory analysis of this data through detrending and empirical semivariogram estimation is performed. A second analysis using a sequential, spatial backfitting algorithm is used to investigate the importance of driving direction of the roller.

Submitted 20 February, 2013; v1 submitted 19 February, 2013; originally announced February 2013.

Comments: 14 pages, 6 figures

arXiv:1302.4631 [pdf, other]

Intelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis

Authors: Daniel K. Heersink, Reinhard Furrer, Mike A. Mooney

Abstract: Modern earthwork compaction rollers collect location and compaction information as they traverse a compaction site. These roller measurement values present a challenging spatio-temporal statistical problem that requires careful implementation of a proper stochastic model and estimation procedure. Heersink and Furrer (2013) proposed a sequential, spatial mixed-effects model and a sequential, spatial backfitting routine for estimation of the modeling terms for such data. The estimated fields produced from this backfitting procedure are analyzed using a multiresolution scale space analysis developed by Holmstrom et al. (2011). This image analysis is proposed as a viable solution to improved intelligent compaction and quality assurance of the compaction process.

Submitted 20 March, 2013; v1 submitted 19 February, 2013; originally announced February 2013.

Comments: 11 pages, 4 figures

arXiv:1207.2338 [pdf, other]

MMANOVA: A general multilevel framework for multivariate analysis of variance

Authors: Steven Geinitz, Reinhard Furrer, Stephan R. Sain

Abstract: Classical analysis of variance requires that model terms be labeled as fixed or random and typically culminate by comparing variability from each batch (factor) to variability from errors; without a standard methodology to assess the magnitude of a batch's variability, to compare variability between batches, nor to consider the uncertainty in this assessment. In this paper we support recent work, placing ANOVA into a general multilevel framework, then refine this through batch level model specifications, and develop it further by extension to the multivariate case. Adopting a Bayesian multilevel model parametrization, with improper batch level prior densities, we derive a method that facilitates comparison across all sources of variability. Whereas classical multivariate ANOVA often utilizes a single covariance criterion, e.g. determinant for Wilks' lambda distribution, the method allows arbitrary covariance criteria to be employed. The proposed method also addresses computation. By introducing implicit batch level constraints, which yield improper priors, the full posterior is efficiently factored, thus alleviating computational demands. For a large class of models, the partitioning mitigates, or even obviates the need for methods such as MCMC. The method is illustrated with simulated examples and an application focusing on climate projections with global climate models.

Submitted 15 July, 2012; v1 submitted 10 July, 2012; originally announced July 2012.

arXiv:1104.2703 [pdf, ps, other]

doi 10.1214/10-AOAS369

A spatial analysis of multivariate output from regional climate models

Authors: Stephan R. Sain, Reinhard Furrer, Noel Cressie

Abstract: Climate models have become an important tool in the study of climate and climate change, and ensemble experiments consisting of multiple climate-model runs are used in studying and quantifying the uncertainty in climate-model output. However, there are often only a limited number of model runs available for a particular experiment, and one of the statistical challenges is to characterize the distribution of the model output. To that end, we have developed a multivariate hierarchical approach, at the heart of which is a new representation of a multivariate Markov random field. This approach allows for flexible modeling of the multivariate spatial dependencies, including the cross-dependencies between variables. We demonstrate this statistical model on an ensemble arising from a regional-climate-model experiment over the western United States, and we focus on the projected change in seasonal temperature and precipitation over the next 50 years.

Submitted 14 April, 2011; originally announced April 2011.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS369 by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS369

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 1, 150-175

Showing 1–46 of 46 results for author: Furrer, R