Search | arXiv e-print repository

arXiv:2405.08806 [pdf, ps, other]

Bounds on the Distribution of a Sum of Two Random Variables: Revisiting a problem of Kolmogorov with application to Individual Treatment Effects

Authors: Zhehao Zhang, Thomas S. Richardson

Abstract: We revisit the following problem, proposed by Kolmogorov: given prescribed marginal distributions $F$ and $G$ for random variables $X,Y$ respectively, characterize the set of compatible distribution functions for the sum $Z=X+Y$. Bounds on the distribution function for $Z$ were given by Makarov (1982), and Frank et al. (1987), the latter using copula theory. However, though they obtain the same bounds, they make different assertions concerning their sharpness. In addition, their solutions leave some open problems in the case when the given marginal distribution functions are discontinuous. These issues have led to some confusion and erroneous statements in subsequent literature, which we correct. Kolmogorov's problem is closely related to inferring possible distributions for individual treatment effects $Y_1 - Y_0$ given the marginal distributions of $Y_1$ and $Y_0$; the latter being identified from a randomized experiment. We use our new insights to sharpen and correct results due to Fan and Park (2010) concerning individual treatment effects, and to fill some other logical gaps.

Submitted 14 May, 2024; originally announced May 2024.
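
The pointwise bounds mentioned in the abstract above can be illustrated numerically. The following is a minimal sketch, not the authors' method: it evaluates the classical inequalities $\sup_x \max(F(x)+G(z-x)-1, 0) \le P(X+Y \le z) \le \inf_x \min(F(x)+G(z-x), 1)$ on a finite grid. The function names, the grid, and the choice of standard normal marginals are assumptions made for illustration. The sharpness questions and the discontinuous case that the paper addresses are not captured here.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution (illustrative choice of marginal)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sum_cdf_bounds(F, G, z, grid):
    """Grid approximation of pointwise bounds on P(X + Y <= z).

    F and G are the marginal CDFs of X and Y. The supremum and infimum
    over x are taken over the finite grid only, so the lower value can
    slightly understate the true supremum and the upper value can
    slightly overstate the true infimum.
    """
    lower = max(max(F(x) + G(z - x) - 1.0, 0.0) for x in grid)
    upper = min(min(F(x) + G(z - x), 1.0) for x in grid)
    return lower, upper


# Hypothetical example: two standard normal marginals, evaluated at z = 2.
grid = [-10.0 + 0.01 * i for i in range(2001)]
lo, hi = sum_cdf_bounds(normal_cdf, normal_cdf, 2.0, grid)
print(f"bounds on P(X + Y <= 2): [{lo:.4f}, {hi:.4f}]")
```

For individual treatment effects, the same function could be applied to $Y_1$ and $-Y_0$, since $Y_1 - Y_0 = Y_1 + (-Y_0)$.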

arXiv:2303.06701 [pdf, other]

Composite Sorting

Authors: Job Boerma, Aleh Tsyvinski, Ruodu Wang, Zhenyuan Zhang

Abstract: We propose a new sorting framework: composite sorting. Composite sorting comprises (1) distinct worker types assigned to the same occupation, and (2) a given worker type simultaneously being part of both positive and negative sorting. Composite sorting arises when fixed investments mitigate variable costs of mismatch. We completely characterize optimal sorting and additionally show it is more positive when mismatch costs are less concave. We then characterize equilibrium wages. Wages have a regional hierarchical structure - relative wages depend solely on sorting within skill groups. Quantitatively, composite sorting can generate a sizable portion of within-occupations wage dispersion in the US.

Submitted 29 August, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: 81 pages, 26 figures

arXiv:2211.07903 [pdf, ps, other]

Identification and Auto-debiased Machine Learning for Outcome Conditioned Average Structural Derivatives

Authors: Zequn **, Lihua Lin, Zhengyu Zhang

Abstract: This paper proposes a new class of heterogeneous causal quantities, named outcome conditioned average structural derivatives (OASD), in a general nonseparable model. OASD is the average partial effect of a marginal change in a continuous treatment on the individuals located at different parts of the outcome distribution, irrespective of individuals' characteristics. OASD combines features of both ATE and QTE: it is interpreted as straightforwardly as ATE while at the same time being more granular than ATE by breaking the entire population up according to the rank of the outcome distribution. One contribution of this paper is that we establish some close relationships between the outcome conditioned average partial effects and a class of parameters measuring the effect of counterfactually changing the distribution of a single covariate on the unconditional outcome quantiles. By exploiting such a relationship, we can obtain a root-$n$ consistent estimator and calculate the semi-parametric efficiency bound for these counterfactual effect parameters. We illustrate this point by two examples: equivalence between OASD and the unconditional partial quantile effect (Firpo et al. (2009)), and equivalence between the marginal partial distribution policy effect (Rothe (2012)) and a corresponding outcome conditioned parameter. Because identification of OASD is attained under a conditional exogeneity assumption, by controlling for rich information about covariates, a researcher may ideally use high-dimensional controls in data. We propose for OASD a novel automatic debiased machine learning estimator, and present asymptotic statistical guarantees for it. We prove our estimator is root-$n$ consistent, asymptotically normal, and semiparametrically efficient. We also prove the validity of the bootstrap procedure for uniform inference on the OASD process.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 75 pages

arXiv:2204.10971 [pdf, other]

An Efficient Approach for Optimizing the Cost-effective Individualized Treatment Rule Using Conditional Random Forest

Authors: Yizhe Xu, Tom H. Greene, Adam P. Bress, Brandon K. Bellows, Yue Zhang, Zugui Zhang, Paul Kolm, William S. Weintraub, Andrew S. Moran, **cheng Shen

Abstract: Evidence from observational studies has become increasingly important for supporting healthcare policy making via cost-effectiveness (CE) analyses. As in comparative effectiveness studies, health economic evaluations that consider subject-level heterogeneity produce individualized treatment rules (ITRs) that are often more cost-effective than one-size-fits-all treatment. Thus, it is of great interest to develop statistical tools for learning such a cost-effective ITR (CE-ITR) under the causal inference framework that allows proper handling of potential confounding and can be applied to both trials and observational studies. In this paper, we use the concept of net-monetary-benefit (NMB) to assess the trade-off between health benefits and related costs. We estimate CE-ITR as a function of patients' characteristics that, when implemented, optimizes the allocation of limited healthcare resources by maximizing health gains while minimizing treatment-related costs. We employ the conditional random forest approach and identify the optimal CE-ITR using NMB-based classification algorithms, where two partitioned estimators are proposed for the subject-specific weights to effectively incorporate information from censored individuals. We conduct simulation studies to evaluate the performance of our proposals. We apply our top-performing algorithm to the NIH-funded Systolic Blood Pressure Intervention Trial (SPRINT) to illustrate the CE gains of assigning customized intensive blood pressure therapy.

Submitted 22 April, 2022; originally announced April 2022.

Comments: Submitted to Statistical Methods in Medical Research
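
The net-monetary-benefit trade-off mentioned in the abstract above is commonly written as NMB = (willingness-to-pay threshold) × (health effect) − (cost). The sketch below illustrates that common definition only; it is not the paper's estimator or its random-forest procedure, and the function names, arm labels, and all numbers are hypothetical.

```python
def net_monetary_benefit(effect, cost, wtp):
    """Net monetary benefit: health effect valued at `wtp` per unit, minus cost."""
    return wtp * effect - cost


def preferred_arm(arms, wtp):
    """Return the name of the arm with the largest NMB.

    `arms` maps an arm name to a (effect, cost) pair for one patient.
    """
    return max(arms, key=lambda name: net_monetary_benefit(*arms[name], wtp))


# Hypothetical patient: effects in quality-adjusted life years, costs in dollars.
arms = {"standard": (10.0, 20_000.0), "intensive": (10.5, 35_000.0)}

choice_high_wtp = preferred_arm(arms, wtp=50_000.0)
choice_low_wtp = preferred_arm(arms, wtp=20_000.0)
print(choice_high_wtp, choice_low_wtp)
```

The preferred arm depends on the threshold: a higher willingness to pay makes the extra health gain from the more expensive arm worth its added cost.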

arXiv:2203.08933 [pdf]

doi 10.13140/RG.2.2.18223.20648

The Digital Divide in Canada and the Role of LEO Satellites in Bridging the Gap

Authors: Tuheen Ahmmed, Afsoon Alidadi, Zichao Zhang, Aizaz U. Chaudhry, Halim Yanikomeroglu

Abstract: Overcoming the digital divide in rural and remote areas has always been a big challenge for Canada with its huge geographical area. In 2016, the Canadian Radio-television and Telecommunications Commission announced broadband Internet as a basic service available for all Canadians. However, approximately one million Canadians still do not have access to broadband services as of 2020. The COVID-19 pandemic has made the situation more challenging, as social, economic, and educational activities have increasingly been transferred online. The condition is more unfavorable for Indigenous communities. A key challenge in deploying rural and remote broadband Internet is to plan and implement high-capacity backbones, which are now available only in denser urban areas. For any Internet provider, it is almost impossible to make a viable business proposal in these areas. For example, the vast land of the Northwest Territories, Yukon, and Nunavut's diverse geographical features present obstacles for broadband infrastructure. In this paper, we investigate the digital divide in Canada with a focus on rural and remote areas. In so doing, we highlight two potential solutions using low Earth orbit (LEO) constellations to deliver broadband Internet in rural and remote areas to address the access inequality and the digital divide. The first solution involves integrating LEO constellations as a backbone for the existing 4G/5G telecommunications network. This solution uses satellites in a LEO constellation to provide a backhaul network connecting the 4G/5G access network to its core network. The 3rd Generation Partnership Project already specifies how to integrate LEO satellite networks into the 4G/5G network, and the Canadian satellite operator Telesat has already showcased this solution with one terrestrial operator, TIM Brasil, in their 4G network.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted for publication in IEEE Communications Magazine, Total 7 pages, 5 figures, 1 table

arXiv:2202.10678 [pdf, ps, other]

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

Authors: Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu

Abstract: In today's economy, it becomes important for Internet platforms to consider the sequential information design problem to align their long-term interest with the incentives of the gig service providers. This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), where a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite horizon Markovian environment with varying prior and utility functions. Planning in MPPs thus faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities of the sender. Nevertheless, at the population level where the model is known, it turns out that we can efficiently determine the optimal (resp. $ε$-optimal) policy with finite (resp. infinite) states and outcomes, through a modified formulation of the Bellman equation. Our main technical contribution is to study the MPP under the online reinforcement learning (RL) setting, where the goal is to learn the optimal signaling policy by interacting with the underlying MPP, without the knowledge of the sender's utility functions, prior distributions, and the Markov transition kernels. We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles. Our algorithm enjoys sample efficiency by achieving a sublinear $\sqrt{T}$-regret upper bound. Furthermore, both our algorithm and theory can be applied to MPPs with large space of outcomes and states via function approximation, and we showcase such a success under the linear setting.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2201.03483 [pdf, ps, other]

Simultaneous Optimal Transport

Authors: Ruodu Wang, Zhenyuan Zhang

Abstract: We propose a general framework of mass transport between vector-valued measures, which will be called simultaneous optimal transport (SOT). The new framework is motivated by the need to transport resources of different types simultaneously, i.e., in single trips, from specified origins to destinations; similarly, in economic matching, one needs to couple two groups, e.g., buyers and sellers, by equating supplies and demands of different goods at the same time. The mathematical structure of simultaneous transport is very different from the classic setting of optimal transport, leading to many new challenges. The Monge and Kantorovich formulations are contrasted and connected. Existence conditions and duality formulas are established. More interestingly, by connecting SOT to a natural relaxation of martingale optimal transport (MOT), we introduce the MOT-SOT parity, which allows for explicit solutions of SOT in many interesting cases.

Submitted 24 May, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: 49 pages. Added MOT-SOT parity

arXiv:2112.03170 [pdf]

A revised comparison between FF five-factor model and three-factor model, based on China's A-share market

Authors: Zhi**g Zhang, Yue Yu, Qinghua Ma, Haixiang Yao

Abstract: In response to some contradictory results in existing research, this paper selects China's latest stock data from 2005 to 2020 for empirical analysis. By choosing this period's data, we avoid the periods of China's significant stock market reforms to reduce the impact of the government's policy on the factor effect. In this paper, the redundant factors (HML, CMA) are orthogonalized, and the regression analysis of the 5*5 portfolios of Size-B/M and Size-Inv is carried out with these two orthogonalized factors. We find that the HML and the CMA are still significant in many portfolios, indicating that they have a strong explanatory ability, which is also consistent with the results of the GRS test. All these show that the five-factor model has a better ability to explain the excess return rate. In the concrete analysis, this paper uses the methods of the five-factor 25-group portfolio returns calculation, the five-factor regression analysis, the orthogonal treatment, the five-factor 25-group regression and the GRS test to more comprehensively explain the excellent explanatory ability of the five-factor model to the excess return. Then, we analyze the possible reasons for the strong explanatory ability of the HML, CMA and RMW from the aspects of price to book ratio, turnover rate and correlation coefficient. We also give a detailed explanation of the results, and analyze the changes of China's stock market policy and investors' investment style in recent years. Finally, this paper attempts to put forward some useful suggestions on the development of asset pricing models and China's stock market.

Submitted 16 October, 2021; originally announced December 2021.

Comments: 17 pages, under review

arXiv:2110.01152 [pdf, other]

Efficiency, Fairness, and Stability in Non-Commercial Peer-to-Peer Ridesharing

Authors: Hoon Oh, Yanhan Tang, Zong Zhang, Alexandre Jacquillat, Fei Fang

Abstract: Unlike commercial ridesharing, non-commercial peer-to-peer (P2P) ridesharing has been subject to limited research -- although it can promote viable solutions in non-urban communities. This paper focuses on the core problem in P2P ridesharing: the matching of riders and drivers. We elevate users' preferences as a first-order concern and introduce novel notions of fairness and stability in P2P ridesharing. We propose algorithms for efficient matching while considering user-centric factors, including users' preferred departure time, fairness, and stability. Results suggest that fair and stable solutions can be obtained in reasonable computational times and can improve baseline outcomes based on system-wide efficiency exclusively.

Submitted 19 June, 2023; v1 submitted 3 October, 2021; originally announced October 2021.

arXiv:2102.08063 [pdf, other]

A Unified Framework for Specification Tests of Continuous Treatment Effect Models

Authors: Wei Huang, Oliver Linton, Zheng Zhang

Abstract: We propose a general framework for the specification testing of continuous treatment effect models. We assume a general residual function, which includes the average and quantile treatment effect models as special cases. The null models are identified under the unconfoundedness condition and contain a nonparametric weighting function. We propose a test statistic for the null model in which the weighting function is estimated by solving an expanding set of moment equations. We establish the asymptotic distributions of our test statistic under the null hypothesis and under fixed and local alternatives. The proposed test statistic is shown to be more efficient than that constructed from the true weighting function and can detect local alternatives deviated from the null models at the rate of $O(N^{-1/2})$. A simulation method is provided to approximate the null distribution of the test statistic. Monte-Carlo simulations show that our test exhibits a satisfactory finite-sample performance, and an application shows its practical value.

Submitted 3 September, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2101.04618 [pdf, other]

doi 10.1287/mksc.2022.1361

Social Media, Content Moderation, and Technology

Authors: Yi Liu, Pinar Yildirim, Z. John Zhang

Abstract: This paper develops a theoretical model to study the economic incentives for a social media platform to moderate user-generated content. We show that a self-interested platform can use content moderation as an effective marketing tool to expand its installed user base, to increase the utility of its users, and to achieve its positioning as a moderate or extreme content platform. The optimal content moderation strategy differs for platforms with different revenue models, advertising or subscription. We also show that a platform's content moderation strategy depends on its technical sophistication. Because of imperfect technology, a platform may optimally throw away the moderate content more than the extreme content. Therefore, one cannot judge how extreme a platform is by just looking at its content moderation strategy. Furthermore, we show that a platform under advertising does not necessarily benefit from a better technology for content moderation, but one under subscription does. This means that platforms under different revenue models can have different incentives to improve their content moderation technology. Finally, we draw managerial and policy implications from our insights.

Submitted 13 January, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

arXiv:2005.11318 [pdf]

A De-biased Direct Question Approach to Measuring Consumers' Willingness to Pay

Authors: Reto Hofstetter, Klaus M. Miller, Harley Krohmer, Z. John Zhang

Abstract: Knowledge of consumers' willingness to pay (WTP) is a prerequisite to profitable price-setting. To gauge consumers' WTP, practitioners often rely on a direct single question approach in which consumers are asked to explicitly state their WTP for a product. Despite its popularity among practitioners, this approach has been found to suffer from hypothetical bias. In this paper, we propose a rigorous method that improves the accuracy of the direct single question approach. Specifically, we systematically assess the hypothetical biases associated with the direct single question approach and explore ways to de-bias it. Our results show that by using the de-biasing procedures we propose, we can generate a de-biased direct single question approach that is accurate enough to be useful for managerial decision-making. We validate this approach with two studies in this paper.

Submitted 22 May, 2020; originally announced May 2020.

Comments: Market Research, Pricing, Demand Estimation, Direct Estimation, Single Question Approach, Choice Experiments, Willingness to Pay, Hypothetical Bias

arXiv:1808.04936 [pdf, ps, other]

A Unified Framework for Efficient Estimation of General Treatment Models

Authors: Chunrong Ai, Oliver Linton, Kaiji Motegi, Zheng Zhang

Abstract: This paper presents a weighted optimization framework that unifies the binary, multi-valued, continuous, as well as mixture of discrete and continuous treatment, under the unconfounded treatment assignment. With a general loss function, the framework includes the average, quantile and asymmetric least squares causal effect of treatment as special cases. For this general framework, we first derive the semiparametric efficiency bound for the causal effect of treatment, extending the existing bound results to a wider class of models. We then propose a generalized optimization estimation for the causal effect with weights estimated by solving an expanding set of equations. Under some sufficient conditions, we establish consistency and asymptotic normality of the proposed estimator of the causal effect and show that the estimator attains our semiparametric efficiency bound, thereby extending the existing literature on efficient estimation of causal effect to a wider class of applications. Finally, we discuss estimation of some causal effect functionals such as the treatment effect curve and the average outcome. To evaluate the finite sample performance of the proposed procedure, we conduct a small scale simulation study and find that the proposed estimation has practical value. To illustrate the applicability of the procedure, we revisit the literature on campaign advertising and campaign contributions. Unlike the existing procedures which produce mixed results, we find no evidence of an effect of campaign advertising on campaign contributions.

Submitted 16 August, 2018; v1 submitted 14 August, 2018; originally announced August 2018.

arXiv:1807.05678 [pdf, other]

A Simple and Efficient Estimation of the Average Treatment Effect in the Presence of Unmeasured Confounders

Authors: Chunrong Ai, Lukang Huang, Zheng Zhang

Abstract: Wang and Tchetgen Tchetgen (2017) studied identification and estimation of the average treatment effect when some confounders are unmeasured. Under their identification condition, they showed that the semiparametric efficient influence function depends on five unknown functionals. They proposed to parameterize all functionals and estimate the average treatment effect from the efficient influence function by replacing the unknown functionals with estimated functionals. They established that their estimator is consistent when certain functionals are correctly specified and attains the semiparametric efficiency bound when all functionals are correctly specified. In applications, it is likely that those functionals could all be misspecified. Consequently, their estimator could be inconsistent or consistent but not efficient. This paper presents an alternative estimator that does not require parameterization of any of the functionals. We establish that the proposed estimator is always consistent and always attains the semiparametric efficiency bound. A simple and intuitive estimator of the asymptotic variance is presented, and a small scale simulation study reveals that the proposed estimation outperforms the existing alternatives in finite samples.

Submitted 16 July, 2018; originally announced July 2018.

Showing 1–14 of 14 results for author: Zhang, Z