
High-dimensional copula-based Wasserstein dependence

Authors: Steven De Keyser, Irene Gijbels

Abstract: We generalize 2-Wasserstein dependence coefficients to measure dependence between a finite number of random vectors. This generalization includes theoretical properties, and in particular focuses on an interpretation of maximal dependence and an asymptotic normality result for a proposed semi-parametric estimator under a Gaussian copula assumption. In addition, we discuss general axioms for dependence measures between multiple random vectors, other plausible normalizations, and various examples. Afterwards, we look into plug-in estimators based on penalized empirical covariance matrices in order to deal with high dimensionality issues and take possible marginal independencies into account by inducing (block) sparsity. The latter ideas are investigated via a simulation study, considering other dependence coefficients as well. We illustrate the use of the developed methods in two real data applications.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2306.12674 [pdf, other]

Mapping poverty at multiple geographical scales

Authors: Silvia De Nicolò, Enrico Fabrizi, Aldo Gardini

Abstract: Poverty mapping is a powerful tool to study the geography of poverty. The choice of the spatial resolution is central as poverty measures defined at a coarser level may mask their heterogeneity at finer levels. We introduce a small area multi-scale approach integrating survey and remote sensing data that leverages information at different spatial resolutions and accounts for hierarchical dependencies, preserving estimates coherence. We map poverty rates by proposing a Bayesian Beta-based model equipped with a new benchmarking algorithm that accounts for the double-bounded support. A simulation study shows the effectiveness of our proposal and an application on Bangladesh is discussed.

Submitted 22 June, 2023; originally announced June 2023.

Comments: 22 pages, 7 figures

arXiv:2306.05315 [pdf, other]

Large-scale adaptive multiple testing for sequential data controlling false discovery and nondiscovery rates

Authors: Rahul Roy, Shyamal K. De, Subir Kumar Bhandari

Abstract: In modern scientific experiments, we frequently encounter data that have large dimensions, and in some experiments, such high dimensional data arrive sequentially rather than full data being available all at a time. We develop multiple testing procedures with simultaneous control of false discovery and nondiscovery rates when $m$-variate data vectors $\mathbf{X}_1, \mathbf{X}_2, \dots$ are observed sequentially or in groups and each coordinate of these vectors leads to a hypothesis testing. Existing multiple testing methods for sequential data use fixed stopping boundaries that do not depend on sample size, and hence, are quite conservative when the number of hypotheses $m$ is large. We propose sequential tests based on adaptive stopping boundaries that ensure shrinkage of the continue sampling region as the sample size increases. Under minimal assumptions on the data sequence, we first develop a test based on an oracle test statistic such that both false discovery rate (FDR) and false nondiscovery rate (FNR) are nearly equal to some prefixed levels with strong control. Under a two-group mixture model assumption, we propose a data-driven stopping and decision rule based on local false discovery rate statistic that mimics the oracle rule and guarantees simultaneous control of FDR and FNR asymptotically as $m$ tends to infinity. Both the oracle and the data-driven stopping times are shown to be finite (i.e., proper) with probability 1 for all finite $m$ and converge to a finite constant as $m$ grows to infinity.
Further, we compare the data-driven test with the existing gap rule proposed in He and Bartroff (2021) and show that the ratio of the expected sample sizes of our method and the gap rule tends to zero as $m$ goes to infinity. Extensive analysis of simulated datasets as well as some real datasets illustrate the superiority of the proposed tests over some existing methods.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 44 pages, 4 figures, 2 tables

arXiv:2305.16530 [pdf, other]

doi 10.1016/j.cma.2024.116793

Bi-fidelity Variational Auto-encoder for Uncertainty Quantification

Authors: Nuojin Cheng, Osman Asif Malik, Subhayan De, Stephen Becker, Alireza Doostan

Abstract: Quantifying the uncertainty of quantities of interest (QoIs) from physical systems is a primary objective in model validation. However, achieving this goal entails balancing the need for computational efficiency with the requirement for numerical accuracy. To address this trade-off, we propose a novel bi-fidelity formulation of variational auto-encoders (BF-VAE) designed to estimate the uncertainty associated with a QoI from low-fidelity (LF) and high-fidelity (HF) samples of the QoI. This model allows for the approximation of the statistics of the HF QoI by leveraging information derived from its LF counterpart. Specifically, we design a bi-fidelity auto-regressive model in the latent space that is integrated within the VAE's probabilistic encoder-decoder structure. An effective algorithm is proposed to maximize the variational lower bound of the HF log-likelihood in the presence of limited HF data, resulting in the synthesis of HF realizations with a reduced computational cost. Additionally, we introduce the concept of the bi-fidelity information bottleneck (BF-IB) to provide an information-theoretic interpretation of the proposed BF-VAE model. Our numerical results demonstrate that BF-VAE leads to considerably improved accuracy, as compared to a VAE trained using only HF data, when limited HF data is available.

Submitted 17 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Computer Methods in Applied Mechanics and Engineering (CMAME), Volume 421, 1 March 2024, 116793

arXiv:2304.12609 [pdf, other]

A Bi-fidelity DeepONet Approach for Modeling Uncertain and Degrading Hysteretic Systems

Authors: Subhayan De, Patrick T. Brewick

Abstract: Nonlinear systems, such as those with degrading hysteretic behavior, are often encountered in engineering applications. In addition, due to the ubiquitous presence of uncertainty, the modeling of such systems becomes increasingly difficult. On the other hand, datasets from pristine models developed without knowing the nature of the degrading effects can be easily obtained. In this paper, we use datasets from pristine models without considering the degrading effects of hysteretic systems as low-fidelity representations that capture many of the important characteristics of the true system's behavior to train a deep operator network (DeepONet). Three numerical examples are used to show that the proposed use of the DeepONets to model the discrepancies between the low-fidelity model and the true system's response leads to significant improvements in the prediction error in the presence of uncertainty in the model parameters for degrading hysteretic systems.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 23 pages, 15 figures

arXiv:2303.16151 [pdf, other]

Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage

Authors: Rafael Alves, Diego S. de Brito, Marcelo C. Medeiros, Ruy M. Ribeiro

Abstract: We propose a model to forecast large realized covariance matrices of returns, applying it to the constituents of the S&P 500 daily. To address the curse of dimensionality, we decompose the return covariance matrix using standard firm-level factors (e.g., size, value, and profitability) and use sectoral restrictions in the residual covariance matrix. This restricted model is then estimated using vector heterogeneous autoregressive (VHAR) models with the least absolute shrinkage and selection operator (LASSO). Our methodology improves forecasting precision relative to standard benchmarks and leads to better estimates of minimum variance portfolios.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2302.13861 [pdf, other]

Differentially Private Diffusion Models Generate Useful Synthetic Images

Authors: Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

Abstract: The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models are a prime candidate for generating high-quality synthetic data. However, recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy. By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data. We decrease the SOTA FID on CIFAR-10 from 26.2 to 9.8, and increase the accuracy from 51.0% to 88.0%. On synthetic data from Camelyon17, we achieve a downstream accuracy of 91.1% which is close to the SOTA of 96.5% when training on the real data. We leverage the ability of generative models to create infinite amounts of data to maximise the downstream prediction performance, and further show how to use synthetic data for hyperparameter tuning. Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data, even in applications with significant distribution shift between the pre-training and fine-tuning distributions.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.13611 [pdf, other]

Parametric dependence between random vectors via copula-based divergence measures

Authors: Steven De Keyser, Irène Gijbels

Abstract: This article proposes copula-based dependence quantification between multiple groups of random variables of possibly different sizes via the family of $\Phi$-divergences. An axiomatic framework for this purpose is provided, after which we focus on the absolutely continuous setting assuming copula densities exist. We consider parametric and semi-parametric frameworks, discuss estimation procedures, and report on asymptotic properties of the proposed estimators. In particular, we first concentrate on a Gaussian copula approach yielding explicit and attractive dependence coefficients for specific choices of $\Phi$, which are more amenable for estimation. Next, general parametric copula families are considered, with special attention to nested Archimedean copulas, being a natural choice for dependence modelling of random vectors. The results are illustrated by means of examples. Simulations and a real-world application on financial data are provided as well.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.10861 [pdf, other]

Estimating the optimal time to perform a PET-PSMA exam in prostatectomized patients based on data from clinical practice

Authors: Martina Amongero, Gianluca Mastrantonio, Stefano De Luca, Mauro Gasparini

Abstract: Prostatectomized patients are at risk of resurgence: this is the reason why, during a follow-up period, they are monitored for PSA growth, an indicator of tumor progression. The presence of tumors can be evaluated with an expensive exam, called PET-PSMA (Positron Emission Tomography with Prostate-Specific Membrane Antigen). But, to optimize the benefit/risk ratio, patients should be referred to this exam only when the evidence is strong. The aim is to estimate the optimal time to recommend the exam, based on patients' history and collected data. We build a Hierarchical Bayesian model that describes, jointly, the PSA growth curve and the probability of a positive PET-PSMA. Our proposal is to process all information about the patient in order to give an informed estimate of the optimal time.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2211.04915 [pdf, other]

Inferring Mobility of Care Travel Behavior From Transit Origin-Destination Data

Authors: Daniela Shuman, Awad Abdelhalim, Anson F Stewart, Kayleigh B Campbell, Mira Patel, Ines Sanchez de Madariaga, Jinhua Zhao

Abstract: There are substantial differences in travel behavior by gender on public transit. Studies have concluded that these differences are largely attributable to household responsibilities typically falling disproportionately on women, leading to women being more likely to utilize transit for purposes referred to by the umbrella concept of "mobility of care". In contrast to past studies that have quantified the impact of gender using survey and qualitative data, we propose a novel data-driven workflow utilizing a combination of previously developed origin, destination, and transfer inference (ODX) based on individual transit fare card transactions, name-based gender inference, and geospatial analysis as a framework to identify mobility of care trip making. We apply this framework to data from the Washington Metropolitan Area Transit Authority (WMATA). Analyzing data from millions of journeys conducted in the first quarter of 2019, the results of this study show that our proposed workflow can identify mobility of care travel behavior, detecting times and places of interest where the share of women travelers in an equally-sampled subset (on basis of inferred gender) of transit users is 10% - 15% higher than that of men. The workflow presented in this study provides a blueprint for combining transit origin-destination data, inferred customer demographics, and geospatial analyses enabling public transit agencies to assess, at the fare card level, the gendered impacts of different policy and operational decisions.

Submitted 10 April, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: Updated reference formatting and discussion points

arXiv:2210.13028 [pdf, other]

Generalised Likelihood Ratio Testing Adversaries through the Differential Privacy Lens

Authors: Georgios Kaissis, Alexander Ziller, Stefan Kolek Martinez de Azagra, Daniel Rueckert

Abstract: Differential Privacy (DP) provides tight upper bounds on the capabilities of optimal adversaries, but such adversaries are rarely encountered in practice. Under the hypothesis testing/membership inference interpretation of DP, we examine the Gaussian mechanism and relax the usual assumption of a Neyman-Pearson-Optimal (NPO) adversary to a Generalized Likelihood Test (GLRT) adversary. This mild relaxation leads to improved privacy guarantees, which we express in the spirit of Gaussian DP and $(\varepsilon, \delta)$-DP, including composition and sub-sampling results. We evaluate our results numerically and find them to match the theoretical upper bounds.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.01985 [pdf, other]

doi 10.1093/jrsssa/qnad083

Small Area Estimation of Inequality Measures using Mixtures of Beta

Authors: Silvia De Nicolò, Maria Rosaria Ferrante, Silvia Pacei

Abstract: Economic inequalities referring to specific regions are crucial in deepening spatial heterogeneity. Income surveys are generally planned to produce reliable estimates at countries or macroregion levels, thus we implement a small area model for a set of inequality measures (Gini, Relative Theil and Atkinson indexes) to obtain microregion estimates. Considering that inequality estimators are unit-interval defined with skewed and heavy-tailed distributions, we propose a Bayesian hierarchical model at area level involving a Beta mixture. An application on EU-SILC data is carried out and a design-based simulation is performed. Our model outperforms in terms of bias, coverage and error the standard Beta regression model. Moreover, we extend the analysis of inequality estimators by deriving their approximate variance functions.

Submitted 15 September, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: 29 pages, 8 figures, 2 tables

Journal ref: Journal of the Royal Statistical Society Series A: Statistics in Society (2024)

arXiv:2207.07001 [pdf, other]

doi 10.1093/mnras/stac2504

SCONCE: A cosmic web finder for spherical and conic geometries

Authors: Yikun Zhang, Rafael S. de Souza, Yen-Chi Chen

Abstract: The latticework structure known as the cosmic web provides a valuable insight into the assembly history of large-scale structures. Despite the variety of methods to identify the cosmic web structures, they mostly rely on the assumption that galaxies are embedded in a Euclidean geometric space. Here we present a novel cosmic web identifier called SCONCE (Spherical and CONic Cosmic wEb finder) that inherently considers the 2D (RA,DEC) spherical or the 3D (RA,DEC,$z$) conic geometry. The proposed algorithms in SCONCE generalize the well-known subspace constrained mean shift (SCMS) method and primarily address the predominant filament detection problem. They are intrinsic to the spherical/conic geometry and invariant to data rotations. We further test the efficacy of our method with an artificial cross-shaped filament example and apply it to the SDSS galaxy catalogue, revealing that the 2D spherical version of our algorithms is robust even in regions of high declination. Finally, using N-body simulations from Illustris, we show that the 3D conic version of our algorithms is more robust in detecting filaments than the standard SCMS method under the redshift distortions caused by the peculiar velocities of halos. Our cosmic web finder is packaged in python as SCONCE-SCMS and has been made publicly available.

Submitted 14 July, 2022; originally announced July 2022.

Comments: 20 pages, 9 figures, 2 tables

arXiv:2204.13650 [pdf, other]

Unlocking High-Accuracy Differentially Private Image Classification through Scale

Authors: Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Abstract: Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under $(8, 10^{-5})$-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under $(0.5, 8 \cdot 10^{-7})$-DP. Additionally, we also achieve 86.7% top-1 accuracy under $(8, 8 \cdot 10^{-7})$-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.

Submitted 16 June, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.00997 [pdf, other]

Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using DeepONets

Authors: Subhayan De, Matthew Reynolds, Malik Hassanaly, Ryan N. King, Alireza Doostan

Abstract: Recent advances in modeling large-scale complex physical systems have shifted research focuses towards data-driven techniques. However, generating datasets by simulating complex systems can require significant computational resources. Similarly, acquiring experimental datasets can prove difficult as well. For these systems, often computationally inexpensive, but in general inaccurate, models, known as the low-fidelity models, are available. In this paper, we propose a bi-fidelity modeling approach for complex physical systems, where we model the discrepancy between the true system's response and low-fidelity response in the presence of a small training dataset from the true system's response using a deep operator network (DeepONet), a neural network architecture suitable for approximating nonlinear operators. We apply the approach to model systems that have parametric uncertainty and are partially unknown. Three numerical examples are used to show the efficacy of the proposed approach to model uncertain and partially unknown complex physical systems.

Submitted 18 August, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: 20 pages, 15 figures

arXiv:2203.03304 [pdf, other]

Regularising for invariance to data augmentation improves supervised learning

Authors: Aleksander Botev, Matthias Bauer, Soham De

Abstract: Data augmentation is used in machine learning to make the classifier invariant to label-preserving transformations. Usually this invariance is only encouraged implicitly by including a single augmented input during training. However, several works have recently shown that using multiple augmentations per input can improve generalisation or can be used to incorporate invariances more explicitly. In this work, we first empirically compare these recently proposed objectives that differ in whether they rely on explicit or implicit regularisation and at what level of the predictor they encode the invariances. We show that the predictions of the best performing method are also the most similar when compared on different augmentations of the same input. Inspired by this observation, we propose an explicit regulariser that encourages this invariance on the level of individual model predictions. Through extensive experiments on CIFAR-100 and ImageNet we show that this explicit regulariser (i) improves generalisation and (ii) equalises performance differences between all considered objectives. Our results suggest that objectives that encourage invariance on the level of the neural network itself generalise better than those that achieve invariance by averaging predictions of non-invariant models.

Submitted 7 March, 2022; originally announced March 2022.

arXiv:2109.12953 [pdf, other]

Non-destructive methods for assessing tree fiber length distributions in standing trees

Authors: Sara Sjöstedt de Luna, Konrad Abramowicz, Natalya Pya Arnqvist

Abstract: One of the main concerns of silviculture and forest management focuses on finding fast, cost-efficient and non-destructive ways of measuring wood properties in standing trees. This paper presents an R package \verb+fiberLD+ that provides functions for estimating tree fiber length distributions in the standing tree based on increment core samples. The methods rely on increment core data measured by means of an optical fiber analyzer (OFA) or measured by microscopy. Increment core data analyzed by OFAs consist of the cell lengths of both cut and uncut fibers (tracheids) and fines (such as ray parenchyma cells) without being able to identify which cells are cut or if they are fines or fibers. The microscopy measured data consist of the observed lengths of the uncut fibers in the increment core. A censored version of a mixture of the fine and fiber length distributions is proposed to fit the OFA data, under distributional assumptions. Two choices for the assumptions of the underlying density functions of the true fiber (fine) lengths of those fibers (fines) that at least partially appear in the increment core are considered, such as the generalized gamma and the log normal densities. Maximum likelihood estimation is used for estimating the model parameters for both the OFA analyzed data and the microscopy measured data.

Submitted 27 September, 2021; originally announced September 2021.

arXiv:2107.08950 [pdf, other]

Mind the Income Gap: Bias Correction of Inequality Estimators in Small-Sized Samples

Authors: Silvia De Nicolò, Maria Rosaria Ferrante, Silvia Pacei

Abstract: Income inequality estimators are biased in small samples, leading generally to an underestimation. This aspect deserves particular attention when estimating inequality in small domains and performing small area estimation at the area level. We propose a bias correction framework for a large class of inequality measures comprising the Gini Index, the Generalized Entropy and the Atkinson index families by accounting for complex survey designs. The proposed methodology does not require any parametric assumption on income distribution, being very flexible. Design-based performance evaluation of our proposal has been carried out using EU-SILC data; the results show a noticeable bias reduction for all the measures. Lastly, an illustrative example of application in small area estimation confirms that ignoring ex-ante bias correction determines model misspecification.

Submitted 10 May, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: 21 pages, 4 figures

arXiv:2107.06223 [pdf]

Impact of heat waves and cold spells on cause-specific mortality in the city of Sao Paulo, Brazil

Authors: Sara Lopes de Moraes, Ricardo Almendra, Ligia Vizeu Barrozo

Abstract: The impact of heat waves and cold spells on mortality has become a major public health problem worldwide, especially among older adults living in low- to middle-income countries. This study aimed to investigate the effects of heat waves and cold spells under different definitions on cause-specific mortality among people aged 65 years and over in Sao Paulo from 2006 to 2015. A quasi-Poisson generalized linear model with a distributed lag model was used to investigate the association between cause-specific mortality and extreme air temperature events. To evaluate the effects of the intensity under different durations, we considered 12 heat wave and nine cold spell definitions. Our results showed an increase in cause-specific deaths related to heat waves and cold spells under several definitions. The highest risk of death related to heat waves was identified mostly at higher temperature thresholds with longer events. We verified that men were more vulnerable to die from an ischemic stroke on heat waves and cold spells days than women, while women presented a higher risk of dying from ischemic heart diseases during cold spells and tended to have a higher risk of chronic obstructive pulmonary disease than men. Identification of heat wave- and cold spell-related mortality is important for the development and promotion of public health measures.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 28 pages, 2 tables, and 4 figures

arXiv:2105.13011 [pdf, other]

doi 10.1016/j.jcp.2022.111010

Neural Network Training Using $\ell_1$-Regularization and Bi-fidelity Data

Authors: Subhayan De, Alireza Doostan

Abstract: With the capability of accurately representing a functional relationship between the inputs of a physical system's model and output quantities of interest, neural networks have become popular for surrogate modeling in scientific applications. However, as these networks are over-parameterized, their training often requires a large amount of data. To prevent overfitting and improve generalization error, regularization based on, e.g., $\ell_1$- and $\ell_2$-norms of the parameters is applied. Similarly, multiple connections of the network may be pruned to increase sparsity in the network parameters. In this paper, we explore the effects of sparsity promoting $\ell_1$-regularization on training neural networks when only a small training dataset from a high-fidelity model is available. As opposed to standard $\ell_1$-regularization that is known to be inadequate, we consider two variants of $\ell_1$-regularization informed by the parameters of an identical network trained using data from lower-fidelity models of the problem at hand. These bi-fidelity strategies are generalizations of transfer learning of neural networks that uses the parameters learned from a large low-fidelity dataset to efficiently train networks for a small high-fidelity dataset. We also compare the bi-fidelity strategies with two $\ell_1$-regularization methods that only use the high-fidelity dataset. Three numerical examples for propagating uncertainty through physical systems are used to show that the proposed bi-fidelity $\ell_1$-regularization strategies produce errors that are one order of magnitude smaller than those of networks trained only using datasets from the high-fidelity models.

Submitted 1 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: 28 pages, 14 figures

arXiv:2105.02083 [pdf, other]

AdaBoost and robust one-bit compressed sensing

Authors: Geoffrey Chinot, Felix Kuchelmeister, Matthias Löffler, Sara van de Geer

Abstract: This paper studies binary classification in robust one-bit compressed sensing with adversarial errors. It is assumed that the model is overparameterized and that the parameter of interest is effectively sparse. AdaBoost is considered, and, through its relation to the max-$\ell_1$-margin-classifier, prediction error bounds are derived. The developed theory is general and allows for heavy-tailed feature distributions, requiring only a weak moment assumption and an anti-concentration condition. Improved convergence rates are shown when the features satisfy a small deviation lower bound. In particular, the results provide an explanation why interpolating adversarial noise can be harmless for classification problems. Simulations illustrate the presented theory.

Submitted 8 December, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: 40 pages, 4 figures, code available at https://github.com/Felix-127/Adaboost-and-robust-one-bit-compressed-sensing, extended results to features that satisfy weak-moment and anti-concentration assumption

MSC Class: 62H30 (Primary); 94A12 (Secondary)

arXiv:2104.00777 [pdf]

Impact of climate change on West Nile virus distribution in South America

Authors: Camila Lorenz, Thiago Salomao de Azevedo, Francisco Chiaravalloti-Neto

Abstract: West Nile virus (WNV) is a vector-borne pathogen of global relevance and is currently the most widely distributed flavivirus of encephalitis worldwide. This virus infects birds, humans, horses, and other mammals, and its transmission cycle occurs in urban and rural areas. Climate conditions have direct and indirect impacts on vector abundance and virus dynamics within the mosquito. The significance of environmental variables as drivers in WNV epidemiology is increasing under the current climate change scenario. In this study, we used a machine learning algorithm to model WNV distributions in South America. Our model evaluated eight environmental variables (type of biome, annual temperature, seasonality of temperature, daytime temperature variation, thermal amplitude, seasonality of precipitation, annual rainfall, and elevation) for their contribution to the occurrence of WNV since its introduction in South America (2004). Our results showed that environmental variables can directly alter the occurrence of WNV, with lower precipitation and higher temperatures associated with increased virus incidence. High-risk areas may be modified in the coming years, becoming more evident with high greenhouse gas emission levels. Countries such as Bolivia and Paraguay will be greatly affected, drastically changing their current WNV distribution. Several Brazilian areas will also increase the likelihood of presenting WNV, mainly in the Northeast and Midwest regions and the Pantanal biome. The Galapagos Islands will also probably increase their geographic range suitable for WNV occurrence. It is necessary to develop preventive policies to minimize potential WNV infection in humans and enhance active epidemiological surveillance in birds, humans, and other mammals before it becomes a more significant public health problem in South America.

Submitted 1 April, 2021; originally announced April 2021.

Comments: 25 pages, 5 figures and 1 table

arXiv:2103.00594 [pdf]

Examining socioeconomic factors to understand the hospital case-fatality rates of COVID-19 in the city of Sao Paulo, Brazil

Authors: Camila Lorenz, Patricia Marques Moralejo Bermudi, Marcelo Antunes Failla, Breno Souza de Aguiar, Tatiana Natasha Toporcov, Francisco Chiaravalloti Neto, Ligia Vizeu Barrozo

Abstract: Understanding differences in hospital case-fatality rates (HCFRs) of coronavirus disease (COVID-19) may help evaluate its severity and the capacity of the healthcare system to reduce mortality. We examined the variability in HCFRs of COVID-19 in relation to spatial inequalities in socioeconomic factors across the city of Sao Paulo, Brazil. We found that HCFRs were higher for men and for individuals aged 60 years and older. Our models identified per capita income as a significant factor that is negatively associated with the HCFRs of COVID-19, even after adjusting by age, sex and presence of risk factors.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 10 pages, 1 figure, 1 table

arXiv:2102.06171 [pdf, other]

High-Performance Large-Scale Image Recognition Without Normalization

Authors: Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets

Submitted 11 February, 2021; originally announced February 2021.
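
The abstract above names an adaptive gradient clipping technique without spelling out the rule. The following is a minimal sketch of one ratio-based form of the idea, assuming that each output unit's gradient is rescaled whenever its norm exceeds a fixed fraction of the corresponding weight norm. The function name `clip_unitwise`, the threshold `clip`, and the floor `eps` are illustrative assumptions, not values taken from the listing.

```python
import math


def clip_unitwise(weights, grads, clip=0.01, eps=1e-3):
    """Rescale each row of `grads` so that its norm is at most
    `clip` times the norm of the matching row of `weights`.

    `weights` and `grads` are lists of rows, one row per output unit.
    `clip` and `eps` are hypothetical defaults chosen for illustration.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # Floor the weight norm so zero-initialised units can still move.
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Gradient is already small relative to the weights.
            clipped.append(list(g_row))
    return clipped
```

Because the threshold is relative to the weight norm, a single `clip` value can act differently on layers whose weights have very different scales, which is the sense in which such a rule is adaptive.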

arXiv:2101.12176 [pdf, other]

On the Origin of Implicit Regularization in Stochastic Gradient Descent

Authors: Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

Abstract: For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However moderately large learning rates can achieve higher test accuracies, and this generalization benefit is not explained by convergence bounds, since the learning rate which maximizes test accuracy is often larger than the learning rate which minimizes training loss. To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss. This modified loss is composed of the original loss function and an implicit regularizer, which penalizes the norms of the minibatch gradients. Under mild assumptions, when the batch size is small the scale of the implicit regularization term is proportional to the ratio of the learning rate to the batch size. We verify empirically that explicitly including the implicit regularizer in the loss can enhance the test accuracy when the learning rate is small.

Submitted 28 January, 2021; originally announced January 2021.

Comments: Accepted as a conference paper at ICLR 2021
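
The abstract above describes a modified loss made of the original loss plus a penalty on the norms of the minibatch gradients. As a rough illustration only, the sketch below evaluates such a quantity for a one-parameter least-squares model. The model, the function names, and the coefficient `reg_scale` are assumptions introduced here; the abstract does not give the exact constant.

```python
def batch_loss(w, xs, ys):
    """Mean of 0.5 * (w * x - y)^2 over one batch."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def batch_grad(w, xs, ys):
    """Derivative of batch_loss with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def modified_loss(w, xs, ys, batch_size, reg_scale):
    """Full-batch loss plus reg_scale times the mean squared
    minibatch-gradient norm. `reg_scale` is a hypothetical knob."""
    batches = [
        (xs[i:i + batch_size], ys[i:i + batch_size])
        for i in range(0, len(xs), batch_size)
    ]
    penalty = sum(batch_grad(w, bx, by) ** 2 for bx, by in batches) / len(batches)
    return batch_loss(w, xs, ys) + reg_scale * penalty
```

With `reg_scale` set to zero the function reduces to the ordinary full-batch loss, so the penalty term can be switched on and off to compare the two objectives.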

arXiv:2101.08692 [pdf, other]

Characterizing signal propagation to close the performance gap in unnormalized ResNets

Authors: Andrew Brock, Soham De, Samuel L. Smith

Abstract: Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.

Submitted 27 January, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

Comments: Published as a conference paper at ICLR 2021

arXiv:2012.03854 [pdf, other]

doi 10.1016/j.ijforecast.2021.11.001

Forecasting: theory and practice

Authors: Fotios Petropoulos, Daniele Apiletti, Vassilios Assimakopoulos, Mohamed Zied Babai, Devon K. Barrow, Souhaib Ben Taieb, Christoph Bergmeir, Ricardo J. Bessa, Jakub Bijak, John E. Boylan, Jethro Browell, Claudio Carnevale, Jennifer L. Castle, Pasquale Cirillo, Michael P. Clements, Clara Cordeiro, Fernando Luiz Cyrino Oliveira, Shari De Baets, Alexander Dokumentov, Joanne Ellison, Piotr Fiszeder, Philip Hans Franses, David T. Frazier, Michael Gilliland, M. Sinan Gönül, et al. (55 additional authors not shown)

Abstract: Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Submitted 5 January, 2022; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2012.00807 [pdf, ps, other]

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Authors: Geoffrey Chinot, Matthias Löffler, Sara van de Geer

Abstract: This article develops a general theory for minimum norm interpolating estimators and regularized empirical risk minimizers (RERM) in linear models in the presence of additive, potentially adversarial, errors. In particular, no conditions on the errors are imposed. A quantitative bound for the prediction error is given, relating it to the Rademacher complexity of the covariates, the norm of the minimum norm interpolator of the errors and the size of the subdifferential around the true parameter. The general theory is illustrated for Gaussian features and several norms: The $\ell_1$, $\ell_2$, group Lasso and nuclear norms. In case of sparsity or low-rank inducing norms, minimum norm interpolators and RERM yield a prediction error of the order of the average noise level, provided that the overparameterization is at least a logarithmic factor larger than the number of samples and that, in case of RERM, the regularization parameter is small enough. Lower bounds that show near optimality of the results complement the analysis.

Submitted 7 October, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 35 pages

MSC Class: 62J05

arXiv:2010.10241 [pdf, ps, other]

BYOL works even without batch statistics

Authors: Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

Abstract: Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2008.09083 [pdf, other]

Exact Tests for Offline Changepoint Detection in Multichannel Binary and Count Data with Application to Networks

Authors: Shyamal K. De, Soumendu Sundar Mukherjee

Abstract: We consider offline detection of a single changepoint in binary and count time-series. We compare exact tests based on the cumulative sum (CUSUM) and the likelihood ratio (LR) statistics, and a new proposal that combines exact two-sample conditional tests with multiplicity correction, against standard asymptotic tests based on the Brownian bridge approximation to the CUSUM statistic. We see empirically that the exact tests are much more powerful in situations where normal approximations driving asymptotic tests are not trustworthy: (i) small sample settings; (ii) sparse parametric settings; (iii) time-series with changepoint near the boundary. We also consider a multichannel version of the problem, where channels can have different changepoints. Controlling the False Discovery Rate (FDR), we simultaneously detect changes in multiple channels. This "local" approach is shown to be more advantageous than multivariate global testing approaches when the number of channels with changepoints is much smaller than the total number of channels. As a natural application, we consider network-valued time-series and use our approach with (a) edges as binary channels and (b) node-degrees or other local subgraph statistics as count channels. The local testing approach is seen to be much more informative than global network changepoint algorithms.

Submitted 20 August, 2020; originally announced August 2020.

Comments: 31 pages, 9 figures, 8 tables
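
For readers unfamiliar with the CUSUM statistic mentioned in the abstract above, here is a minimal sketch of one common unnormalised form for a single binary or count series: the largest absolute gap between the running sum and its expected value under no change. The function name `cusum_changepoint` and the choice to leave the statistic unnormalised are assumptions made for illustration; this is not the exact test proposed in the paper.

```python
def cusum_changepoint(series):
    """Return (k, stat) where stat = max over k of |S_k - (k / n) * S_n|
    and k is the number of observations before the candidate change.

    S_k is the sum of the first k observations. Only split points
    1 <= k <= n - 1 are considered, so the series needs length >= 2.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    best_k, best_stat = 1, -1.0
    running = 0.0
    for k in range(1, n):
        running += series[k - 1]
        stat = abs(running - k * total / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

For example, `cusum_changepoint([0, 0, 0, 0, 1, 1, 1, 1])` places the candidate change after the fourth observation. Turning the statistic into a test still requires a reference distribution, which is where the exact and asymptotic approaches compared in the abstract differ.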

arXiv:2008.04598 [pdf, other]

Uncertainty Quantification of Locally Nonlinear Dynamical Systems using Neural Networks

Authors: Subhayan De

Abstract: Models are often given in terms of differential equations to represent physical systems. In the presence of uncertainty, accurate prediction of the behavior of these systems using the models requires understanding the effect of uncertainty in the response. In uncertainty quantification, statistics such as mean and variance of the response of these physical systems are sought. To estimate these statistics sampling-based methods like Monte Carlo often require many evaluations of the models' governing equations for multiple realizations of the uncertainty. However, for large complex engineering systems, these methods become computationally burdensome. In structural engineering, often an otherwise linear structure contains spatially local nonlinearities with uncertainty present in them. A standard nonlinear solver for them with sampling-based methods for uncertainty quantification incurs significant computational cost for estimating the statistics of the response. To ease this computational burden of uncertainty quantification of large-scale locally nonlinear dynamical systems, a method is proposed herein, which decomposes the response into two parts -- response of a nominal linear system and a corrective term. This corrective term is the response from a pseudoforce that contains the nonlinearity and uncertainty information. In this paper, neural network, a recently popular tool for universal function approximation in the scientific machine learning community due to the advancement of computational capability as well as the availability of open-sourced packages like PyTorch and TensorFlow is used to estimate the pseudoforce. Since only the nonlinear and uncertain pseudoforce is modeled using the neural networks the same network can be used to predict a different response of the system and hence no new network is required to train if the statistic of a different response is sought.

Submitted 11 August, 2020; originally announced August 2020.

Comments: 26 pages, 20 figures

arXiv:2008.02322 [pdf]

doi 10.1016/j.tmaid.2020.101945

Spatiotemporal dynamic of COVID-19 mortality in the city of Sao Paulo, Brazil: shifting the high risk from the best to the worst socio-economic conditions

Authors: Patricia Marques Moralejo Bermudi, Camila Lorenz, Breno Souza de Aguiar, Marcelo Antunes Failla, Ligia Vizeu Barrozo, Francisco Chiaravalloti-Neto

Abstract: Currently, Brazil has one of the fastest increasing COVID-19 epidemics in the world, that has caused at least 94 thousand confirmed deaths until now. The city of Sao Paulo is particularly vulnerable because it is the most populous in the country. Analyzing the spatiotemporal dynamics of COVID-19 is important to help the urgent need to integrate better actions to face the pandemic. Thus, this study aimed to analyze the COVID-19 mortality, from March to July 2020, considering the spatio-time architectures, the socio-economic context of the population, and using a fine granular level, in the most populous city in Brazil. For this, we conducted an ecological study, using secondary public data from the mortality information system. We describe mortality rates for each epidemiological week and the entire period by sex and age. We modelled the deaths using spatiotemporal and spatial architectures and Poisson probability distributions in a latent Gaussian Bayesian model approach. We obtained the relative risks for temporal and spatiotemporal trends and socio-economic conditions. To reduce possible sub notification, we considered the confirmed and suspected deaths. Our findings showed an apparent stabilization of the temporal trend, at the end of the period, but that may change in the future. Mortality rate increased with increasing age and was higher in men. The risk of death was greater in areas with the worst social conditions throughout the study period. However, this was not a uniform pattern over time, since we identified a shift from the high risk in the areas with best socio-economic conditions to the worst ones. Our study contributed by emphasizing the importance of geographic screening in areas with a higher risk of death, and, currently, worse socio-economic contexts, as a crucial aspect to reducing disease mortality and health inequities, through integrated public health actions.

Submitted 5 August, 2020; originally announced August 2020.

Comments: 22 pages, 6 figures, 2 tables, 3 supplementary materials

arXiv:2007.06847 [pdf, other]

Sequence-guided protein structure determination using graph convolutional and recurrent networks

Authors: Po-Nan Li, Saulo H. P. de Oliveira, Soichi Wakatsuki, Henry van den Bedem

Abstract: Single particle, cryogenic electron microscopy (cryo-EM) experiments now routinely produce high-resolution data for large proteins and their complexes. Building an atomic model into a cryo-EM density map is challenging, particularly when no structure for the target protein is known a priori. Existing protocols for this type of task often rely on significant human intervention and can take hours to many days to produce an output. Here, we present a fully automated, template-free model building approach that is based entirely on neural networks. We use a graph convolutional network (GCN) to generate an embedding from a set of rotamer-based amino acid identities and candidate 3-dimensional C$α$ locations. Starting from this embedding, we use a bidirectional long short-term memory (LSTM) module to order and label the candidate identities and atomic locations consistent with the input protein sequence to obtain a structural model. Our approach paves the way for determining protein structures from cryo-EM densities at a fraction of the time of existing approaches and without the need for human intervention.

Submitted 2 September, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 6 pages, 5 figures; accepted to IEEE BIBE 2020

arXiv:2007.01073 [pdf, other]

Accurate Characterization of Non-Uniformly Sampled Time Series using Stochastic Differential Equations

Authors: Stijn de Waele

Abstract: Non-uniform sampling arises when an experimenter does not have full control over the sampling characteristics of the process under investigation. Moreover, it is introduced intentionally in algorithms such as Bayesian optimization and compressive sensing. We argue that Stochastic Differential Equations (SDEs) are especially well-suited for characterizing second order moments of such time series. We introduce new initial estimates for the numerical optimization of the likelihood, based on incremental estimation and initialization from autoregressive models. Furthermore, we introduce model truncation as a purely data-driven method to reduce the order of the estimated model based on the SDE likelihood. We show the increased accuracy achieved with the new estimator in simulation experiments, covering all challenging circumstances that may be encountered in characterizing a non-uniformly sampled time series. Finally, we apply the new estimator to experimental rainfall variability data.

Submitted 2 July, 2020; originally announced July 2020.

arXiv:2007.00254 [pdf, other]

Construction of confidence interval for a univariate stock price signal predicted through Long Short Term Memory Network

Authors: Shankhyajyoti De, Arabin Kumar Dey, Deepak Gauda

Abstract: In this paper, we show an innovative way to construct a bootstrap confidence interval for a signal estimated with a univariate LSTM model. We consider three different types of bootstrap methods for the dependent setup. We offer some useful suggestions for selecting the optimal block length when bootstrapping the sample. We also propose a benchmark to compare the confidence intervals obtained through different bootstrap strategies. We illustrate the experimental results on some stock price data sets.

Submitted 1 July, 2020; originally announced July 2020.

Comments: 14 pages, 11 figures

arXiv:2006.15081 [pdf, other]

On the Generalization Benefit of Noise in Stochastic Gradient Descent

Authors: Samuel L. Smith, Erich Elsen, Soham De

Abstract: It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks. However, recent papers have questioned this claim, arguing that this effect is simply a consequence of suboptimal hyperparameter tuning or insufficient compute budgets when the batch size is large. In this paper, we perform carefully designed experiments and rigorous hyperparameter sweeps on a range of popular models, which verify that small or moderately large batch sizes can substantially outperform very large batches on the test set. This occurs even when both models are trained for the same number of iterations and large batches achieve smaller training losses. Our results confirm that the noise in stochastic gradients can enhance generalization. We study how the optimal learning rate schedule changes as the epoch budget grows, and we provide a theoretical account of our observations based on the stochastic differential equation perspective of SGD dynamics.

Submitted 26 June, 2020; originally announced June 2020.

Comments: Camera-ready version of ICML 2020

arXiv:2006.04295 [pdf, other]

Efficient MCMC Sampling for Bayesian Matrix Factorization by Breaking Posterior Symmetries

Authors: Saibal De, Hadi Salehi, Alex Gorodetsky

Abstract: Bayesian low-rank matrix factorization techniques have become an essential tool for relational data analysis and matrix completion. A standard approach is to assign zero-mean Gaussian priors on the columns or rows of factor matrices to create a conjugate system. This choice of prior leads to simple implementations; however, it also causes symmetries in the posterior distribution that can severely reduce the efficiency of Markov-chain Monte-Carlo (MCMC) sampling approaches. In this paper, we propose a simple modification to the prior choice that provably breaks these symmetries and maintains/improves accuracy. Specifically, we provide conditions that the Gaussian prior mean and covariance must satisfy so the posterior does not exhibit invariances that yield sampling difficulties. For example, we show that using non-zero linearly independent prior means significantly lowers the autocorrelation of MCMC samples, and can also lead to lower reconstruction errors.

Submitted 10 November, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

arXiv:2006.04222 [pdf, other]

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Authors: Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Abstract: Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?" By posing this counterfactual question, we can recognize state-action trajectories within sub-groups of entities that we may have encountered in another task and use what we learned in that task to inform our prediction in the current one. We then reconstruct a prediction of the full returns as a combination of factors considering these disjoint groups of entities and train this "randomly factorized" value function as an auxiliary objective for value-based multi-agent reinforcement learning. By doing so, our model can recognize and leverage similarities across tasks to improve learning efficiency in a multi-task setting. Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), outperforms all strong baselines by a significant margin in challenging multi-task StarCraft micromanagement settings.

Submitted 11 June, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: ICML 2021 Camera Ready

arXiv:2005.08583 [pdf, ps, other]

doi 10.1093/mnras/staa3204

Ridges in the Dark Energy Survey for cosmic trough identification

Authors: Ben Moews, Morgan A. Schmitz, Andrew J. Lawler, Joe Zuntz, Alex I. Malz, Rafael S. de Souza, Ricardo Vilalta, Alberto Krone-Martins, Emille E. O. Ishida

Abstract: Cosmic voids and their corresponding redshift-projected mass densities, known as troughs, play an important role in our attempt to model the large-scale structure of the Universe. Understanding these structures enables us to compare the standard model with alternative cosmologies, constrain the dark energy equation of state, and distinguish between different gravitational theories. In this paper, we extend the subspace-constrained mean shift algorithm, a recently introduced method to estimate density ridges, and apply it to 2D weak lensing mass density maps from the Dark Energy Survey Y1 data release to identify curvilinear filamentary structures. We compare the obtained ridges with previous approaches to extract trough structure in the same data, and apply curvelets as an alternative wavelet-based method to constrain densities. We then invoke the Wasserstein distance between noisy and noiseless simulations to validate the denoising capabilities of our method. Our results demonstrate the viability of ridge estimation as a precursor for denoising weak lensing observables to recover the large-scale structure, paving the way for a more versatile and effective search for troughs.

Submitted 14 November, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 12 pages, 5 figures, accepted for publication in MNRAS

MSC Class: 85A40; 62G07; 62P35; 85A35

arXiv:2005.07062 [pdf, other]

Simulation-Based Inference for Global Health Decisions

Authors: Christian Schroeder de Witt, Bradley Gram-Hansen, Nantas Nardelli, Andrew Gambardella, Rob Zinkov, Puneet Dokania, N. Siddharth, Ana Belen Espinosa-Gonzalez, Ara Darzi, Philip Torr, Atılım Güneş Baydin

Abstract: The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.

Submitted 14 May, 2020; originally announced May 2020.

Journal ref: ICML Workshop on Machine Learning for Global Health, Thirty-Seventh International Conference on Machine Learning (ICML 2020)

arXiv:2004.06833 [pdf, ps, other]

Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Abstract: The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia based on spontaneous speech can be compared. ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender, defining two cognitive assessment tasks, namely: the Alzheimer's speech classification task and the neuropsychological score regression task. In the Alzheimer's speech classification task, ADReSS challenge participants create models for classifying speech as dementia or healthy control speech. In the neuropsychological score regression task, participants create models to predict mini-mental state examination scores. This paper describes the ADReSS Challenge in detail and presents a baseline for both tasks, including feature extraction procedures and results for classification and regression models. ADReSS aims to provide the speech and language Alzheimer's research community with a platform for comprehensive methodological comparisons. This will hopefully contribute to addressing the lack of standardisation that currently affects the field and shed light on avenues for future research and clinical applicability.

Submitted 5 August, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China

arXiv:2003.12011 [pdf]

Adaptive machine learning strategies for network calibration of IoT smart air quality monitoring devices

Authors: Saverio De Vito, Girolamo Di Francia, Elena Esposito, Sergio Ferlito, Fabrizio Formisano, Ettore Massera

Abstract: Air Quality Multi-sensors Systems (AQMS) are IoT devices based on low-cost chemical microsensor arrays that have recently been shown capable of providing relatively accurate quantitative estimates of air pollutants. Their availability makes it possible to deploy pervasive Air Quality Monitoring (AQM) networks that will solve the geographical sparseness issue that affects the current network of AQ Regulatory Monitoring Systems (AQRMS). Unfortunately, their accuracy has been shown to be limited in long-term field deployments due to the negative influence of several technological issues, including sensor poisoning or ageing, non-target gas interference, lack of fabrication repeatability, etc. Seasonal changes in the probability distribution of priors, observables and hidden context variables (i.e. non-observable interferents) challenge field-data-driven calibration models, whose short- to mid-term performance recently came to the attention of urban authorities and monitoring agencies. In this work, we address this non-stationary framework with adaptive learning strategies in order to prolong the validity of multisensor calibration models, enabling continuous learning. The influence of relevant parameters in different network and node-to-node recalibration scenarios is analyzed. The results are hence useful for pervasive deployments aimed at permanent high-resolution AQ mapping in urban scenarios, as well as for the use of AQMS as AQRMS backup systems providing data when AQRMS data are unavailable due to faults or scheduled maintenance.

Submitted 24 March, 2020; originally announced March 2020.

Comments: Submitted to Pattern Recognition Letters

arXiv:2003.08839 [pdf, other]

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Abstract: In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.

Submitted 27 August, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

Comments: Extended version of the ICML 2018 conference paper (arXiv:1803.11485)

Journal ref: Journal of Machine Learning Research 21(178):1-51, 2020

arXiv:2003.06709 [pdf, other]

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Authors: Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

Abstract: We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient estimator that optimises over the entire joint action space, rather than optimising over each agent's action space separately as in MADDPG. This allows for more coordinated policy changes and fully reaps the benefits of a centralised critic. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains.

Submitted 7 May, 2021; v1 submitted 14 March, 2020; originally announced March 2020.

arXiv:2002.10444 [pdf, other]

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Authors: Soham De, Samuel L. Smith

Abstract: Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks. We show that this key benefit arises because, at initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor on the order of the square root of the network depth. This ensures that, early in training, the function computed by normalized residual blocks in deep networks is close to the identity function (on average). We use this insight to develop a simple initialization scheme that can train deep residual networks without normalization. We also provide a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.

Submitted 9 December, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: Camera-ready version of NeurIPS 2020

arXiv:2002.04495 [pdf, other]

On transfer learning of neural networks using bi-fidelity data for uncertainty propagation

Authors: Subhayan De, Jolene Britton, Matthew Reynolds, Ryan Skinner, Kenneth Jansen, Alireza Doostan

Abstract: Due to their high degree of expressiveness, neural networks have recently been used as surrogate models for mapping inputs of an engineering system to outputs of interest. Once trained, neural networks are computationally inexpensive to evaluate and remove the need for repeated evaluations of computationally expensive models in uncertainty quantification applications. However, given the highly parameterized construction of neural networks, especially deep neural networks, accurate training often requires large amounts of simulation data that may not be available in the case of computationally expensive systems. In this paper, to alleviate this issue for uncertainty propagation, we explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models. We explore two strategies for coupling these two datasets during the training procedure, namely, the standard transfer learning and the bi-fidelity weighted learning. In the former approach, a neural network model mapping the inputs to the outputs of interest is trained based on the low-fidelity data. The high-fidelity data is then used to adapt the parameters of the upper layer(s) of the low-fidelity network, or train a simpler neural network to map the output of the low-fidelity network to that of the high-fidelity model. In the latter approach, the entire low-fidelity network parameters are updated using data generated via a Gaussian process model trained with a small high-fidelity dataset. The parameter updates are performed via a variant of stochastic gradient descent with learning rates given by the Gaussian process model. Using three numerical examples, we illustrate the utility of these bi-fidelity transfer learning methods where we focus on accuracy improvement achieved by transfer learning over standard training approaches.

Submitted 11 February, 2020; originally announced February 2020.

arXiv:1912.09621 [pdf]

Understanding Deep Neural Network Predictions for Medical Imaging Applications

Authors: Barath Narayanan Narayanan, Manawaduge Supun De Silva, Russell C. Hardie, Nathan K. Kueterman, Redha Ali

Abstract: Computer-aided detection has been a research area attracting great interest in the past decade. Machine learning algorithms have been utilized extensively for this application as they provide a valuable second opinion to doctors. Despite several machine learning models being available for medical imaging applications, not many have been implemented in the real world due to the uninterpretable nature of the decisions made by the network. In this paper, we investigate the results provided by deep neural networks for the detection of malaria, diabetic retinopathy, brain tumor, and tuberculosis in different imaging modalities. We visualize the class activation mappings for all the applications in order to enhance the understanding of these networks. This type of visualization, along with the corresponding network performance metrics, would help data science experts better understand their models and assist doctors in their decision-making process.

Submitted 19 December, 2019; originally announced December 2019.

Comments: 20 pages, 28 Figures and 9 Tables

arXiv:1911.12446 [pdf, other]

QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning

Authors: Samuel Bosch, Alexander Sanchez de la Cerda, Mohsen Imani, Tajana Simunic Rosing, Giovanni De Micheli

Abstract: Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD) computing imitate cognition by exploiting statistical properties of high-dimensional vector spaces. It is a promising solution for achieving high energy efficiency in different machine learning tasks, such as classification, semi-supervised learning, and clustering. A weakness of existing HD computing-based ML algorithms is the fact that they have to be binarized to achieve very high energy efficiency. At the same time, binarized models reach lower classification accuracies. To solve the problem of the trade-off between energy efficiency and classification accuracy, we propose the QubitHD algorithm. It stochastically binarizes HD-based algorithms, while maintaining comparable classification accuracies to their non-binarized counterparts. The FPGA implementation of QubitHD provides a 65% improvement in terms of energy efficiency, and a 95% improvement in terms of training time, as compared to state-of-the-art HD-based ML algorithms. It also outperforms state-of-the-art low-cost classifiers (such as Binarized Neural Networks) in terms of speed and energy efficiency by an order of magnitude during training and inference.

Submitted 10 October, 2022; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: 8 pages, 5 figures, 3 tables

arXiv:1911.07231 [pdf, other]

Adaptive Rates for Total Variation Image Denoising

Authors: Francesco Ortelli, Sara van de Geer

Abstract: We study the theoretical properties of image denoising via total variation penalized least-squares. We define the total variation in terms of the two-dimensional total discrete derivative of the image and show that it gives rise to denoised images that are piecewise constant on rectangular sets. We prove that, if the true image is piecewise constant on just a few rectangular sets, the denoised image converges to the true image at a parametric rate, up to a log factor. More generally, we show that the denoised image enjoys oracle properties, that is, it is almost as good as if some aspects of the true image were known. In other words, image denoising with total variation regularization leads to an adaptive reconstruction of the true image.

Submitted 26 January, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: 38 pages, 6 figures

Journal ref: Journal of Machine Learning Research, 21(247), 2020

arXiv:1910.09056 [pdf, other]

Amortized Rejection Sampling in Universal Probabilistic Programming

Authors: Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip H. S. Torr, Tom Rainforth, Yee Whye Teh, Frank Wood

Abstract: Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework.

Submitted 28 March, 2022; v1 submitted 20 October, 2019; originally announced October 2019.

Comments: AISTATS 2022 camera ready

Showing 1–50 of 117 results for author: De, S