Search | arXiv e-print repository

arXiv:2312.10435 [pdf, other]

Uncertainty Quantification in Heterogeneous Treatment Effect Estimation with Gaussian-Process-Based Partially Linear Model

Authors: Shunsuke Horii, Yoichi Chikahara

Abstract: Estimating heterogeneous treatment effects across individuals has attracted growing attention as a statistical tool for performing critical decision-making. We propose a Bayesian inference framework that quantifies the uncertainty in treatment effect estimation to support decision-making in a relatively small sample size setting. Our proposed model places Gaussian process priors on the nonparametric components of a semiparametric model called a partially linear model. This model formulation has three advantages. First, we can analytically compute the posterior distribution of a treatment effect without relying on computationally demanding posterior approximation. Second, we can guarantee that the posterior distribution concentrates around the true one as the sample size goes to infinity. Third, we can incorporate prior knowledge about a treatment effect into the prior distribution, improving the estimation efficiency. Our experimental results show that even in the small sample size setting, our method can accurately estimate the heterogeneous treatment effects and effectively quantify their estimation uncertainty.

Submitted 16 December, 2023; originally announced December 2023.

Comments: 15 pages, 4 figures, accepted at The 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2103.08195 [pdf, ps, other]

Bayesian Model Averaging for Causality Estimation and its Approximation based on Gaussian Scale Mixture Distributions

Authors: Shunsuke Horii

Abstract: In the estimation of the causal effect under linear Structural Causal Models (SCMs), it is common practice to first identify the causal structure, estimate the probability distributions, and then calculate the causal effect. However, if the goal is to estimate the causal effect, it is not necessary to fix a single causal structure or probability distributions. In this paper, we first show from a Bayesian perspective that it is Bayes optimal to weight (average) the causal effects estimated under each model rather than estimating the causal effect under a fixed single model. This idea is also known as Bayesian model averaging. Although Bayesian model averaging is optimal, the weighting calculations become computationally hard as the number of candidate models increases. We develop an approximation to the Bayes optimal estimator by using Gaussian scale mixture distributions.

Submitted 15 March, 2021; originally announced March 2021.

Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

arXiv:2005.06118 [pdf, ps, other]

Improved Computation-Communication Trade-Off for Coded Distributed Computing using Linear Dependence of Intermediate Values

Authors: Shunsuke Horii

Abstract: In large-scale distributed computing systems, communication overhead is one of the major bottlenecks. In the map-shuffle-reduce framework, which is one of the major distributed computing frameworks, the communication load among servers can be reduced by increasing the computation load of each server; that is, there is a trade-off between computation load and communication load. Recently, it has been shown that coded distributed computing (CDC) improves this trade-off by letting servers encode their intermediate computation results. The original CDC scheme does not assume any special structure in the functions that servers compute. However, in actual problems, these functions often have some structure, and the trade-off may be further improved by exploiting that structure. In this paper, we propose a new scheme that further improves the trade-off by utilizing the linear dependency structure of the intermediate computation results. The intermediate values computed in the map phase can be considered as vectors over $\mathbb{F}_{2}$. In some applications, these intermediate values are linearly dependent, and in such cases it is sufficient for each server to send a basis of the linear subspace and the linear combination coefficients. As a result, the proposed approach improves over the best-known computation-communication overhead trade-off in some applications.

Submitted 12 May, 2020; originally announced May 2020.

Comments: 6 pages, 4 figures, accepted to ISIT2020

arXiv:1907.02944 [pdf]

Proceedings of the 11th Asia-Europe Workshop on Concepts in Information Theory

Authors: A. J. Han Vinck, Kees A. Schouhamer Immink, Tadashi Wadayama, Van Khu Vu, Akiko Manada, Kui Cai, Shunsuke Horii, Yoshiki Abe, Mitsugu Iwamoto, Kazuo Ohta, Xingwei Zhong, Zhen Mei, Renfei Bu, J. H. Weber, Vitaly Skachek, Hiroyoshi Morita, N. Hovhannisyan, Hiroshi Kamabe, Shan Lu, Hirosuke Yamamoto, Kengo Hasimoto, O. Ytrehus, Shigeaki Kuzuoaka, Mikihiko Nishiara, Han Mao Kiah, et al. (2 additional authors not shown)

Abstract: This year, 2019, we celebrate 30 years of friendship between Asian and European scientists at the AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have many participants from different parts of Asia and Europe, which shows the importance of this event. It is a good tradition to pay tribute to a special lecturer in our community. This year we selected Hiroyoshi Morita, who is a well-known information theorist with many original contributions.

Submitted 26 June, 2019; originally announced July 2019.

arXiv:1901.04668 [pdf, ps, other]

Distributed Stochastic Gradient Descent Using LDGM Codes

Authors: Shunsuke Horii, Takahiro Yoshida, Manabu Kobayashi, Toshiyasu Matsushima

Abstract: We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, the existence of slow-running machines called stragglers causes a significant decrease in performance. Recently, a coding-theoretic framework named Gradient Coding (GC) for mitigating stragglers in distributed learning was established by Tandon et al. Most studies on GC aim to recover the gradient information completely, assuming that the Gradient Descent (GD) algorithm is used as the learning algorithm. On the other hand, if the Stochastic Gradient Descent (SGD) algorithm is used, it is not necessary to completely recover the gradient information, and an unbiased estimator of it is sufficient for learning. In this paper, we propose a distributed SGD scheme using Low-Density Generator Matrix (LDGM) codes. The proposed system may take longer than existing GC methods to recover the gradient information completely; however, it enables the master node to obtain a high-quality unbiased estimator of the gradient at low computational cost, which leads to an overall performance improvement.

Submitted 15 January, 2019; originally announced January 2019.

arXiv:1508.01640 [pdf, ps, other]

doi 10.1587/transfun.E99.A.2170

Linear Programming Decoding of Binary Linear Codes for Symbol-Pair Read Channels

Authors: Shunsuke Horii, Toshiyasu Matsushima, Shigeichi Hirasawa

Abstract: In this paper, we develop a new decoding algorithm for binary linear codes over symbol-pair read channels. The symbol-pair read channel was recently introduced by Cassuto and Blaum to model channels with high write resolution but low read resolution. The proposed decoding algorithm is based on linear programming (LP). It is proved that the proposed LP decoder has the maximum-likelihood (ML) certificate property, i.e., the output of the decoder is guaranteed to be the ML codeword when it is integral. We also introduce the fractional pair distance $d_{fp}$ of a code, which is a lower bound on the pair distance. It is proved that the proposed LP decoder will correct up to $\lceil d_{fp}/2\rceil-1$ pair errors.

Submitted 29 September, 2015; v1 submitted 7 August, 2015; originally announced August 2015.

Comments: 15 pages, 2 figures

Showing 1–6 of 6 results for author: Horii, S