Search | arXiv e-print repository

A Bayesian sparse factor model with adaptive posterior concentration

Authors: Ilsang Ohn, Lizhen Lin, Yongdai Kim

Abstract: In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while kee** computational tractabili… ▽ More In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while kee** computational tractability. We show that the posterior distribution asymptotically concentrates on the true factor dimensionality, and more importantly, this posterior consistency is adaptive to the sparsity level of the true loading matrix and the noise variance. We also prove that the proposed Bayesian model attains the optimal detection rate of the factor dimensionality in a more general situation than those found in the literature. Moreover, we obtain a near-optimal posterior concentration rate of the covariance matrix. Numerical studies are conducted and show the superiority of the proposed method compared with other competitors. △ Less

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.14765 [pdf, other]

Masked Bayesian Neural Networks : Theoretical Guarantee and its Posterior Inference

Authors: Insung Kong, Dongyoon Yang, Jong** Lee, Ilsang Ohn, Gyuseung Baek, Yongdai Kim

Abstract: Bayesian approaches for learning deep neural networks (BNN) have been received much attention and successfully applied to various applications. Particularly, BNNs have the merit of having better generalization ability as well as better uncertainty quantification. For the success of BNN, search an appropriate architecture of the neural networks is an important task, and various algorithms to find g… ▽ More Bayesian approaches for learning deep neural networks (BNN) have been received much attention and successfully applied to various applications. Particularly, BNNs have the merit of having better generalization ability as well as better uncertainty quantification. For the success of BNN, search an appropriate architecture of the neural networks is an important task, and various algorithms to find good sparse neural networks have been proposed. In this paper, we propose a new node-sparse BNN model which has good theoretical properties and is computationally feasible. We prove that the posterior concentration rate to the true model is near minimax optimal and adaptive to the smoothness of the true model. In particular the adaptiveness is the first of its kind for node-sparse BNNs. In addition, we develop a novel MCMC algorithm which makes the Bayesian inference of the node-sparse BNN model feasible in practice. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: 30 pages, ICML 2023 proceedings. arXiv admin note: substantial text overlap with arXiv:2206.00853

arXiv:2302.08606 [pdf, other]

Intrinsic and extrinsic deep learning on manifolds

Authors: Yihao Fang, Ilsang Ohn, Vijay Gupta, Lizhen Lin

Abstract: We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geom… ▽ More We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geometry of manifolds via exponential and log maps with respect to a Riemannian structure. Consequently, we prove that the empirical risk of the empirical risk minimizers (ERM) of eDNNs and iDNNs converge in optimal rates. Overall, The eDNNs framework is simple and easy to compute, while the iDNNs framework is accurate and fast converging. To demonstrate the utilities of our framework, various simulation studies, and real data analyses are presented with eDNNs and iDNNs. △ Less

Submitted 16 February, 2023; originally announced February 2023.

arXiv:2206.08002 [pdf, other]

The convergent Indian buffet process

Authors: Ilsang Ohn

Abstract: We propose a new Bayesian nonparametric prior for latent feature models, which we call the convergent Indian buffet process (CIBP). We show that under the CIBP, the number of latent features is distributed as a Poisson distribution with the mean monotonically increasing but converging to a certain value as the number of objects goes to infinity. That is, the expected number of features is bounded… ▽ More We propose a new Bayesian nonparametric prior for latent feature models, which we call the convergent Indian buffet process (CIBP). We show that under the CIBP, the number of latent features is distributed as a Poisson distribution with the mean monotonically increasing but converging to a certain value as the number of objects goes to infinity. That is, the expected number of features is bounded above even when the number of objects goes to infinity, unlike the standard Indian buffet process under which the expected number of features increases with the number of objects. We provide two alternative representations of the CIBP based on a hierarchical distribution and a completely random measure, respectively, which are of independent interest. The proposed CIBP is assessed on a high-dimensional sparse factor model. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.00853

Masked Bayesian Neural Networks : Computation and Optimality

Authors: Insung Kong, Dongyoon Yang, Jong** Lee, Ilsang Ohn, Yongdai Kim

Abstract: As data size and computing power increase, the architectures of deep neural networks (DNNs) have been getting more complex and huge, and thus there is a growing need to simplify such complex and huge DNNs. In this paper, we propose a novel sparse Bayesian neural network (BNN) which searches a good DNN with an appropriate complexity. We employ the masking variables at each node which can turn off s… ▽ More As data size and computing power increase, the architectures of deep neural networks (DNNs) have been getting more complex and huge, and thus there is a growing need to simplify such complex and huge DNNs. In this paper, we propose a novel sparse Bayesian neural network (BNN) which searches a good DNN with an appropriate complexity. We employ the masking variables at each node which can turn off some nodes according to the posterior distribution to yield a nodewise sparse DNN. We devise a prior distribution such that the posterior distribution has theoretical optimalities (i.e. minimax optimality and adaptiveness), and develop an efficient MCMC algorithm. By analyzing several benchmark datasets, we illustrate that the proposed BNN performs well compared to other existing methods in the sense that it discovers well condensed DNN architectures with similar prediction accuracy and uncertainty quantification compared to large DNNs. △ Less

Submitted 23 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: I will change to another file

arXiv:2202.03165 [pdf, other]

SLIDE: a surrogate fairness constraint to ensure fairness consistency

Authors: Kunwoong Kim, Ilsang Ohn, Sara Kim, Yongdai Kim

Abstract: As they have a vital effect on social decision makings, AI algorithms should be not only accurate and but also fair. Among various algorithms for fairness AI, learning a prediction model by minimizing the empirical risk (e.g., cross-entropy) subject to a given fairness constraint has received much attention. To avoid computational difficulty, however, a given fairness constraint is replaced by a s… ▽ More As they have a vital effect on social decision makings, AI algorithms should be not only accurate and but also fair. Among various algorithms for fairness AI, learning a prediction model by minimizing the empirical risk (e.g., cross-entropy) subject to a given fairness constraint has received much attention. To avoid computational difficulty, however, a given fairness constraint is replaced by a surrogate fairness constraint as the 0-1 loss is replaced by a convex surrogate loss for classification problems. In this paper, we investigate the validity of existing surrogate fairness constraints and propose a new surrogate fairness constraint called SLIDE, which is computationally feasible and asymptotically valid in the sense that the learned model satisfies the fairness constraint asymptotically and achieves a fast convergence rate. Numerical experiments confirm that the SLIDE works well for various benchmark datasets. △ Less

Submitted 4 July, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 17 pages including appendix and references

arXiv:2202.02943 [pdf, other]

Learning fair representation with a parametric integral probability metric

Authors: Dongha Kim, Kunwoong Kim, Insung Kong, Ilsang Ohn, Yongdai Kim

Abstract: As they have a vital effect on social decision-making, AI algorithms should be not only accurate but also fair. Among various algorithms for fairness AI, learning fair representation (LFR), whose goal is to find a fair representation with respect to sensitive variables such as gender and race, has received much attention. For LFR, the adversarial training scheme is popularly employed as is done in… ▽ More As they have a vital effect on social decision-making, AI algorithms should be not only accurate but also fair. Among various algorithms for fairness AI, learning fair representation (LFR), whose goal is to find a fair representation with respect to sensitive variables such as gender and race, has received much attention. For LFR, the adversarial training scheme is popularly employed as is done in the generative adversarial network type algorithms. The choice of a discriminator, however, is done heuristically without justification. In this paper, we propose a new adversarial training scheme for LFR, where the integral probability metric (IPM) with a specific parametric family of discriminators is used. The most notable result of the proposed LFR algorithm is its theoretical guarantee about the fairness of the final prediction model, which has not been considered yet. That is, we derive theoretical relations between the fairness of representation and the fairness of the prediction model built on the top of the representation (i.e., using the representation as the input). Moreover, by numerical experiments, we show that our proposed LFR algorithm is computationally lighter and more stable, and the final prediction model is competitive or superior to other LFR algorithms using more complex discriminators. △ Less

Submitted 10 January, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 28 pages, including references and appendix

arXiv:2109.03204 [pdf, other]

doi 10.1214/23-AOS2349

Adaptive variational Bayes: Optimality, computation and applications

Authors: Ilsang Ohn, Lizhen Lin

Abstract: In this paper, we explore adaptive inference based on variational Bayes. Although several studies have been conducted to analyze the contraction properties of variational posteriors, there is still a lack of a general and computationally tractable variational Bayes method that performs adaptive inference. To fill this gap, we propose a novel adaptive variational Bayes framework, which can operate… ▽ More In this paper, we explore adaptive inference based on variational Bayes. Although several studies have been conducted to analyze the contraction properties of variational posteriors, there is still a lack of a general and computationally tractable variational Bayes method that performs adaptive inference. To fill this gap, we propose a novel adaptive variational Bayes framework, which can operate on a collection of models. The proposed framework first computes a variational posterior over each individual model separately and then combines them with certain weights to produce a variational posterior over the entire model. It turns out that this combined variational posterior is the closest member to the posterior over the entire model in a predefined family of approximating distributions. We show that the adaptive variational Bayes attains optimal contraction rates adaptively under very general conditions. We also provide a methodology to maintain the tractability and adaptive optimality of the adaptive variational Bayes even in the presence of an enormous number of individual models, such as sparse models. We apply the general results to several examples, including deep learning and sparse factor models, and derive new and adaptive inference results. In addition, we characterize an implicit regularization effect of variational Bayes and show that the adaptive variational posterior can utilize this. △ Less

Submitted 11 March, 2024; v1 submitted 7 September, 2021; originally announced September 2021.

Journal ref: Ann. Statist. 52(1):335-363. (2024)

arXiv:2003.11769 [pdf, ps, other]

Nonconvex sparse regularization for deep neural networks and its optimality

Authors: Ilsang Ohn, Yongdai Kim

Abstract: Recent theoretical studies proved that deep neural network (DNN) estimators obtained by minimizing empirical risk with a certain sparsity constraint can attain optimal convergence rates for regression and classification problems. However, the sparsity constraint requires to know certain properties of the true model, which are not available in practice. Moreover, computation is difficult due to the… ▽ More Recent theoretical studies proved that deep neural network (DNN) estimators obtained by minimizing empirical risk with a certain sparsity constraint can attain optimal convergence rates for regression and classification problems. However, the sparsity constraint requires to know certain properties of the true model, which are not available in practice. Moreover, computation is difficult due to the discrete nature of the sparsity constraint. In this paper, we propose a novel penalized estimation method for sparse DNNs, which resolves the aforementioned problems existing in the sparsity constraint. We establish an oracle inequality for the excess risk of the proposed sparse-penalized DNN estimator and derive convergence rates for several learning tasks. In particular, we prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems. For computation, we develop an efficient gradient-based optimization algorithm that guarantees the monotonic reduction of the objective function. △ Less

Submitted 9 August, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

arXiv:1906.06903 [pdf, ps, other]

doi 10.3390/e21070627

Smooth function approximation by deep neural networks with general activation functions

Authors: Ilsang Ohn, Yongdai Kim

Abstract: There has been a growing interest in expressivity of deep neural networks. However, most of the existing work about this topic focuses only on the specific activation function such as ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a broad class of activation functions. This class of activation functions includes most of frequently used activat… ▽ More There has been a growing interest in expressivity of deep neural networks. However, most of the existing work about this topic focuses only on the specific activation function such as ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a broad class of activation functions. This class of activation functions includes most of frequently used activation functions. We derive the required depth, width and sparsity of a deep neural network to approximate any Hölder smooth function upto a given approximation error for the large class of activation functions. Based on our approximation error analysis, we derive the minimax optimality of the deep neural network estimators with the general activation functions in both regression and classification problems. △ Less

Submitted 27 June, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: 24 pages

Journal ref: Entropy 2019, 21(7), 627

arXiv:1812.03599 [pdf, other]

Fast convergence rates of deep neural networks for classification

Authors: Yongdai Kim, Ilsang Ohn, Dongha Kim

Abstract: We derive the fast convergence rates of a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for a true model: (1) a smooth decision boundary, (2) smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the… ▽ More We derive the fast convergence rates of a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for a true model: (1) a smooth decision boundary, (2) smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast rate convergences for all three cases provided that the architecture (i.e., the number of layers, number of nodes and sparsity). is carefully selected. An important implication is that DNN architectures are very flexible for use in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy, and show that the DNN classifier achieves a fast convergence rate under the condition that the conditional class probabilities of most data are sufficiently close to either 1 or zero. This assumption is not unusual for image recognition because human beings are extremely good at recognizing most images. To confirm our theoretical explanation, we present the results of a small numerical study conducted to compare the hinge loss and cross-entropy. △ Less

Submitted 18 June, 2019; v1 submitted 9 December, 2018; originally announced December 2018.

Comments: 35 pages, 2 figures and 3 tables

Showing 1–11 of 11 results for author: Ohn, I