-
Uncertainty Quantification in Heterogeneous Treatment Effect Estimation with Gaussian-Process-Based Partially Linear Model
Authors:
Shunsuke Horii,
Yoichi Chikahara
Abstract:
Estimating heterogeneous treatment effects across individuals has attracted growing attention as a statistical tool for critical decision-making. We propose a Bayesian inference framework that quantifies the uncertainty in treatment effect estimation to support decision-making when the sample size is relatively small. Our proposed model places Gaussian process priors on the nonparametric components of a semiparametric model called a partially linear model. This formulation has three advantages. First, the posterior distribution of a treatment effect can be computed analytically, without relying on computationally demanding posterior approximations. Second, the posterior distribution is guaranteed to concentrate around the true one as the sample size goes to infinity. Third, prior knowledge about a treatment effect can be incorporated into the prior distribution, improving estimation efficiency. Our experimental results show that even in the small sample size setting, our method accurately estimates heterogeneous treatment effects and effectively quantifies their estimation uncertainty.
Submitted 16 December, 2023;
originally announced December 2023.
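The closed-form posterior mentioned in the abstract can be illustrated with a minimal sketch. It assumes (our assumption, not necessarily the paper's exact formulation) the model Y_i = tau(X_i) * T_i + g(X_i) + eps_i with independent GP priors on tau and g, Gaussian noise, scalar covariates, and the same hypothetical squared-exponential kernel for both components. Because Y is then jointly Gaussian with tau(x*), the posterior of tau(x*) follows from standard GP conditioning; the names `rbf`, `solve`, and `tau_posterior` are illustrative only.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on scalar covariates (hypothetical choice)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def tau_posterior(x_star, X, T, Y, noise=0.1):
    """Posterior mean and variance of tau(x_star) under the assumed model
    Y_i = tau(X_i) * T_i + g(X_i) + eps_i, tau ~ GP, g ~ GP, eps_i ~ N(0, noise)."""
    n = len(X)
    # Marginal covariance of Y: diag(T) K_tau diag(T) + K_g + noise * I
    C = [[T[i] * rbf(X[i], X[j]) * T[j] + rbf(X[i], X[j])
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Cross-covariance Cov(tau(x_star), Y_i) = k_tau(x_star, X_i) * T_i
    kx = [rbf(x_star, X[i]) * T[i] for i in range(n)]
    alpha = solve(C, Y)
    mean = sum(k * a for k, a in zip(kx, alpha))
    beta = solve(C, kx)
    var = rbf(x_star, x_star) - sum(k * b for k, b in zip(kx, beta))
    return mean, var
```

If no unit is treated (all T_i = 0), the data carry no information about tau and the sketch returns the prior (mean 0, variance 1); with treated units the posterior variance shrinks below the prior variance.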
-
Bayesian Model Averaging for Causality Estimation and its Approximation based on Gaussian Scale Mixture Distributions
Authors:
Shunsuke Horii
Abstract:
In estimating causal effects under linear Structural Causal Models (SCMs), it is common practice to first identify the causal structure, then estimate the probability distributions, and finally calculate the causal effect. However, if the goal is to estimate the causal effect, it is not necessary to fix a single causal structure or a single set of probability distributions. In this paper, we first show from a Bayesian perspective that it is Bayes optimal to weight (average) the causal effects estimated under each model rather than to estimate the causal effect under a single fixed model. This idea is known as Bayesian model averaging. Although Bayesian model averaging is optimal, the weighting calculations become computationally hard as the number of candidate models increases. We develop an approximation to the Bayes optimal estimator using Gaussian scale mixture distributions.
Submitted 15 March, 2021;
originally announced March 2021.
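The averaging step described in the abstract can be sketched as follows, assuming each candidate model m already comes with a per-model effect estimate and a log marginal likelihood log p(D | m); how those quantities are obtained under a linear SCM is not shown, and the Gaussian-scale-mixture approximation from the paper is not reproduced. The function name `bma_effect` is hypothetical. The sketch only shows the exact weighting whose cost grows with the number of candidate models.

```python
import math

def bma_effect(log_evidence, effects, log_prior=None):
    """Bayesian model averaging of per-model causal-effect estimates.

    Each model m is weighted by its posterior probability
    p(m | D) proportional to p(D | m) p(m), computed in log space."""
    k = len(effects)
    if log_prior is None:
        log_prior = [-math.log(k)] * k  # uniform prior over candidate models
    logw = [le + lp for le, lp in zip(log_evidence, log_prior)]
    top = max(logw)  # subtract the max (log-sum-exp trick) for stability
    w = [math.exp(v - top) for v in logw]
    z = sum(w)
    return sum(wi / z * e for wi, e in zip(w, effects))
```

For example, with two equally supported candidate structures, one in which X causes Y with an estimated effect of 2.0 and one in which X has no effect on Y, `bma_effect([0.0, 0.0], [2.0, 0.0])` averages them to 1.0 instead of committing to either structure.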
-
Improved Computation-Communication Trade-Off for Coded Distributed Computing using Linear Dependence of Intermediate Values
Authors:
Shunsuke Horii
Abstract:
In large-scale distributed computing systems, communication overhead is one of the major bottlenecks. In the map-shuffle-reduce framework, one of the major distributed computing frameworks, the communication load among servers can be reduced by increasing the computation load of each server; that is, there is a trade-off between computation load and communication load. Recently, it has been shown that coded distributed computing (CDC) improves this trade-off by letting servers encode their intermediate computation results. The original CDC scheme does not assume any special structure in the functions that servers compute. In practice, however, these functions often have some structure, and the trade-off may be further improved by exploiting it. In this paper, we propose a new scheme that further improves the trade-off by utilizing the linear dependency structure of the intermediate computation results. The intermediate values computed in the map phase can be regarded as vectors over $\mathbb{F}_{2}$. In some applications, these intermediate values are linearly dependent, and in such cases it suffices for each server to send a basis of the linear subspace they span together with the linear combination coefficients. As a result, the proposed approach improves on the best-known computation-communication trade-off in some applications.
Submitted 12 May, 2020;
originally announced May 2020.
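The "basis plus coefficients" idea from the abstract can be sketched with Gaussian elimination over $\mathbb{F}_{2}$, representing each intermediate value as a Python int whose bits are the vector entries (an assumption made for brevity). The hypothetical `gf2_compress` keeps a subset of the original vectors as a basis and, for every vector, returns a bitmask saying which basis vectors XOR to it; the coded multicasting of the actual CDC scheme is not shown.

```python
def gf2_compress(vectors):
    """Split bit-vectors over F_2 into a basis (drawn from the inputs)
    and, per input vector, a bitmask of basis indices whose XOR equals it."""
    basis = []   # original vectors kept as the transmitted basis
    pivots = {}  # highest set bit -> (reduced vector, mask over basis indices)
    coeffs = []  # one bitmask per input vector
    for v in vectors:
        cur, mask = v, 0  # invariant: cur == v XOR (basis vectors in mask)
        for bit in sorted(pivots, reverse=True):
            if (cur >> bit) & 1:
                r, m = pivots[bit]
                cur ^= r
                mask ^= m
        if cur == 0:
            coeffs.append(mask)  # v is a combination of existing basis vectors
        else:
            j = len(basis)
            basis.append(v)  # v is linearly independent: add it to the basis
            pivots[cur.bit_length() - 1] = (cur, mask ^ (1 << j))
            coeffs.append(1 << j)
    return basis, coeffs

def gf2_reconstruct(basis, mask):
    """Receiver side: XOR together the basis vectors selected by mask."""
    out = 0
    for i, b in enumerate(basis):
        if (mask >> i) & 1:
            out ^= b
    return out
```

For the inputs `0b101, 0b011, 0b110, 0b000`, only two basis vectors are needed because `0b110 == 0b101 ^ 0b011`; when the rank r is much smaller than the number of values, sending r full vectors plus short coefficient masks can cost fewer bits than sending every value.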
-
Proceedings of the 11th Asia-Europe Workshop on Concepts in Information Theory
Authors:
A. J. Han Vinck,
Kees A. Schouhamer Immink,
Tadashi Wadayama,
Van Khu Vu,
Akiko Manada,
Kui Cai,
Shunsuke Horii,
Yoshiki Abe,
Mitsugu Iwamoto,
Kazuo Ohta,
Xingwei Zhong,
Zhen Mei,
Renfei Bu,
J. H. Weber,
Vitaly Skachek,
Hiroyoshi Morita,
N. Hovhannisyan,
Hiroshi Kamabe,
Shan Lu,
Hirosuke Yamamoto,
Kengo Hasimoto,
O. Ytrehus,
Shigeaki Kuzuoaka,
Mikihiko Nishiara,
Han Mao Kiah,
et al. (2 additional authors not shown)
Abstract:
This year, 2019, we celebrate 30 years of friendship between Asian and European scientists at AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have many participants from different parts of Asia and Europe, which shows the importance of this event. It is a good tradition to pay tribute to a special lecturer in our community. This year we selected Hiroyoshi Morita, a well-known information theorist with many original contributions.
Submitted 26 June, 2019;
originally announced July 2019.
-
Distributed Stochastic Gradient Descent Using LDGM Codes
Authors:
Shunsuke Horii,
Takahiro Yoshida,
Manabu Kobayashi,
Toshiyasu Matsushima
Abstract:
We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, slow-running machines called stragglers cause a significant decrease in performance. Recently, a coding-theoretic framework for mitigating stragglers in distributed learning, named Gradient Coding (GC), was established by Tandon et al. Most studies on GC aim to recover the gradient information completely, assuming that the Gradient Descent (GD) algorithm is used as the learning algorithm. On the other hand, if the Stochastic Gradient Descent (SGD) algorithm is used, it is not necessary to recover the gradient information completely; an unbiased estimator of the gradient is sufficient for learning. In this paper, we propose a distributed SGD scheme using Low-Density Generator Matrix (LDGM) codes. In the proposed system, recovering the gradient information completely may take longer than with existing GC methods; however, the master node can obtain a high-quality unbiased estimator of the gradient at low computational cost, which leads to an overall performance improvement.
Submitted 15 January, 2019;
originally announced January 2019.
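Why an unbiased estimator is enough for SGD can be illustrated with a deliberately simpler scheme than the paper's: this is not the LDGM construction, only a sketch of the unbiasedness idea it relies on. Assume (hypothetically) that the master receives the partial gradients of k out of n data partitions chosen uniformly at random; rescaling their sum by n/k gives an estimator whose expectation is the full gradient. Gradients are scalars here for brevity, and the function names are illustrative.

```python
import random
from fractions import Fraction
from itertools import combinations

def subsample_gradient(partial_grads, k, rng=random):
    """Estimate sum(partial_grads) from k of n partitions sampled uniformly
    without replacement, rescaled by n/k so the estimate is unbiased."""
    n = len(partial_grads)
    idx = rng.sample(range(n), k)
    return n / k * sum(partial_grads[i] for i in idx)

def expected_estimate(partial_grads, k):
    """Exact expectation of the estimator, enumerating every size-k subset."""
    n = len(partial_grads)
    subsets = list(combinations(range(n), k))
    total = sum(Fraction(n, k) * sum(partial_grads[i] for i in s) for s in subsets)
    return total / len(subsets)
```

Each partition appears in a fraction k/n of the subsets, so the n/k rescaling cancels it exactly: `expected_estimate([1, 2, 3, 4], 2)` equals the full sum 10. An individual draw is noisy, and the abstract's point is that a well-designed code makes that noise small at low decoding cost.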
-
Linear Programming Decoding of Binary Linear Codes for Symbol-Pair Read Channels
Authors:
Shunsuke Horii,
Toshiyasu Matsushima,
Shigeichi Hirasawa
Abstract:
In this paper, we develop a new decoding algorithm for binary linear codes over symbol-pair read channels. The symbol-pair read channel was recently introduced by Cassuto and Blaum to model channels with high write resolution but low read resolution. The proposed decoding algorithm is based on linear programming (LP). We prove that the proposed LP decoder has the maximum-likelihood (ML) certificate property, i.e., whenever the decoder output is integral, it is guaranteed to be the ML codeword. We also introduce the fractional pair distance $d_{fp}$ of a code, which is a lower bound on the pair distance. We prove that the proposed LP decoder corrects up to $\lceil d_{fp}/2\rceil-1$ pair errors.
Submitted 29 September, 2015; v1 submitted 7 August, 2015;
originally announced August 2015.
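The pair-distance notions in the abstract can be made concrete with a small sketch. It uses the cyclic symbol-pair read vector of Cassuto and Blaum, in which a word x of length n is read as ((x0, x1), (x1, x2), ..., (x_{n-1}, x0)), and the pair distance is the Hamming distance between read vectors. The fractional pair distance $d_{fp}$ itself comes from the paper's LP relaxation and is not computed here; the last helper only evaluates the stated guarantee for a given $d_{fp}$. All function names are hypothetical.

```python
import math

def pair_read(x):
    """Cyclic symbol-pair read vector of the word x."""
    n = len(x)
    return [(x[i], x[(i + 1) % n]) for i in range(n)]

def pair_distance(x, y):
    """Number of positions where the pair-read vectors of x and y differ."""
    return sum(a != b for a, b in zip(pair_read(x), pair_read(y)))

def min_pair_distance(code):
    """Minimum pair distance of a small code, by brute force over codeword pairs."""
    return min(pair_distance(c1, c2)
               for i, c1 in enumerate(code) for c2 in code[i + 1:])

def lp_correctable_pair_errors(d_fp):
    """Pair errors the LP decoder is guaranteed to correct: ceil(d_fp / 2) - 1."""
    return math.ceil(d_fp / 2) - 1
```

A single bit flip can corrupt two pair reads: `pair_distance((0, 0, 0), (1, 0, 0))` is 2, although the Hamming distance is 1. For the length-3 repetition code `[(0, 0, 0), (1, 1, 1)]` every pair read differs, so its minimum pair distance is 3.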