Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference

Authors: Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva

Abstract: The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering, and cluster randomization dictates the node assignment to treatment and control. However, cluster-based randomization approaches perform poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When we have knowledge of the cascade seed nodes, we can leverage this interference structure to mitigate the resulting causal effect estimation bias. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment from the cascade seed node and propagates the assignment to their multi-hop neighbors to limit interference during cascade growth and thereby reduce the overall causal effect estimation error. Our extensive experiments on real-world and synthetic datasets demonstrate that our proposed framework outperforms the existing state-of-the-art approaches in estimating causal effects in network data.

Submitted 20 May, 2024; originally announced May 2024.
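
The cluster-based randomization baseline described in this abstract can be illustrated with a minimal sketch. It assumes the graph-clustering step has already produced a node-to-cluster mapping; the function name `cluster_randomize` and the parameters `p_treat` and `seed` are hypothetical and not taken from the paper. It shows only the baseline design, not the cascade-based design the authors propose.

```python
import random
from typing import Dict


def cluster_randomize(
    node_to_cluster: Dict[str, int],
    p_treat: float = 0.5,
    seed: int = 0,
) -> Dict[str, int]:
    """Assign whole clusters to treatment (1) or control (0).

    Every node inherits the arm of its cluster, so neighbors that were
    grouped together by the clustering step always share an assignment.
    """
    rng = random.Random(seed)
    cluster_arm: Dict[int, int] = {}
    # Iterate clusters in sorted order so the result is reproducible
    # for a given seed regardless of dictionary ordering.
    for cluster in sorted(set(node_to_cluster.values())):
        cluster_arm[cluster] = 1 if rng.random() < p_treat else 0
    return {node: cluster_arm[c] for node, c in node_to_cluster.items()}


# Hypothetical toy input: five nodes grouped into three clusters.
mapping = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
assignment = cluster_randomize(mapping, seed=42)
```

Because assignment happens at the cluster level, interference is limited only to the extent that the clusters contain each node's influential neighbors, which is the limitation the abstract raises for multi-hop cascades.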

arXiv:2403.14696 [pdf, other]

MOTIV: Visual Exploration of Moral Framing in Social Media

Authors: Andrew Wentzel, Lauren Levine, Vipul Dhariwal, Zahra Fatemi, Abari Bhattacharya, Barbara Di Eugenio, Andrew Rojecki, Elena Zheleva, G. Elisabeta Marai

Abstract: We present a visual computing framework for analyzing moral rhetoric on social media around controversial topics. Using Moral Foundation Theory, we propose a methodology for deconstructing and visualizing the when, where, and who behind each of these moral dimensions as expressed in microblog data. We characterize the design of this framework, developed in collaboration with experts from language processing, communications, and causal inference. Our approach integrates microblog data with multiple sources of geospatial and temporal data, and leverages unsupervised machine learning (generalized additive models) to support collaborative hypothesis discovery and testing. We implement this approach in a system named MOTIV. We illustrate this approach on two problems, one related to Stay-at-home policies during the COVID-19 pandemic, and the other related to the Black Lives Matter movement. Through detailed case studies and discussions with collaborators, we identify several insights discovered regarding the different drivers of moral sentiment in social media. Our results indicate that this visual approach supports rapid, collaborative hypothesis testing, and can help give insights into the underlying moral values behind controversial political issues. Supplemental Material: https://osf.io/ygkzn/?view_only=6310c0886938415391d977b8aae8b749

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.11895 [pdf, other]

doi 10.1145/3589334.3645675

Bridging or Breaking: Impact of Intergroup Interactions on Religious Polarization

Authors: Rochana Chaturvedi, Sugat Chaturvedi, Elena Zheleva

Abstract: While exposure to diverse viewpoints may reduce polarization, it can also have a backfire effect and exacerbate polarization when the discussion is adversarial. Here, we examine the question whether intergroup interactions around important events affect polarization between majority and minority groups in social networks. We compile data on the religious identity of nearly 700,000 Indian Twitter users engaging in COVID-19-related discourse during 2020. We introduce a new measure for an individual's group conformity based on contextualized embeddings of tweet text, which helps us assess polarization between religious groups. We then use a meta-learning framework to examine heterogeneous treatment effects of intergroup interactions on an individual's group conformity in the light of communal, political, and socio-economic events. We find that for political and social events, intergroup interactions reduce polarization. This decline is weaker for individuals at the extreme who already exhibit high conformity to their group. In contrast, during communal events, intergroup interactions can increase group conformity. Finally, we decompose the differential effects across religious groups in terms of emotions and topics of discussion. The results show that the dynamics of religious polarization are sensitive to the context and have important implications for understanding the role of intergroup interactions.

Submitted 10 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2310.10259 [pdf, other]

Leveraging heterogeneous spillover effects in maximizing contextual bandit rewards

Authors: Ahmed Sayeed Faruk, Elena Zheleva

Abstract: Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of these bandit algorithms is to learn the best arm (i.e., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. However, current approaches ignore potential spillover between interacting users, where the action of one user can impact the actions and rewards of other users. Moreover, spillover may vary for different people based on their preferences and the closeness of ties to other users. This leads to heterogeneity in the spillover effects, i.e., the extent to which the action of one user can impact the action of another. Here, we propose a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. By experimenting on several real-world datasets using prominent linear and non-linear contextual bandit algorithms, we observe that our proposed method leads to significantly higher rewards than existing solutions that ignore spillover.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2308.13552 [pdf, other]

A Lens to Pandemic Stay at Home Attitudes

Authors: Andrew Wentzel, Lauren Levine, Vipul Dhariwal, Zahra Fatemi, Barbara Di Eugenio, Andrew Rojecki, Elena Zheleva, G. Elisabeta Marai

Abstract: We describe the design process and the challenges we met during a rapid multi-disciplinary pandemic project related to stay-at-home orders and social media moral frames. Unlike our typical design experience, we had to handle a steeper learning curve, emerging and continually changing datasets, as well as under-specified design requirements, persistent low visual literacy, and an extremely fast turnaround for new data ingestion, prototyping, testing and deployment. We describe the lessons learned through this experience.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2306.09261 [pdf, other]

Mitigating Cold-start Forecasting using Cold Causal Demand Forecasting Model

Authors: Zahra Fatemi, Minh Huynh, Elena Zheleva, Zamir Syed, Xiaojun Di

Abstract: Forecasting multivariate time series data, which involves predicting future values of variables over time using historical data, has significant practical applications. Although deep learning-based models have shown promise in this field, they often fail to capture the causal relationship between dependent variables, leading to less accurate forecasts. Additionally, these models cannot handle the cold-start problem in time series data, where certain variables lack historical data, posing challenges in identifying dependencies among variables. To address these limitations, we introduce the Cold Causal Demand Forecasting (CDF-cold) framework that integrates causal inference with deep learning-based models to enhance the forecasting accuracy of multivariate time series data affected by the cold-start problem. To validate the effectiveness of the proposed approach, we collect 15 multivariate time-series datasets containing the network traffic of different Google data centers. Our experiments demonstrate that the CDF-cold framework outperforms state-of-the-art forecasting models in predicting future values of multivariate time series data.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.02479 [pdf, other]

Contagion Effect Estimation Using Proximal Embeddings

Authors: Zahra Fatemi, Elena Zheleva

Abstract: Contagion effect refers to the causal effect of peers' behavior on the outcome of an individual in social networks. Contagion can be confounded due to latent homophily which makes contagion effect estimation very hard: nodes in a homophilic network tend to have ties to peers with similar attributes and can behave similarly without influencing one another. One way to account for latent homophily is by considering proxies for the unobserved confounders. However, as we demonstrate in this paper, existing proxy-based methods for contagion effect estimation have a very high variance when the proxies are high-dimensional. To address this issue, we introduce a novel framework, Proximal Embeddings (ProEmb), that integrates variational autoencoders with adversarial networks to create low-dimensional representations of high-dimensional proxies and help with identifying contagion effects. While VAEs have been used previously for representation learning in causal inference, a novel aspect of our approach is the additional component of adversarial networks to balance the representations of different treatment groups, which is essential in causal inference from observational data where these groups typically come from different distributions. We empirically show that our method significantly increases the accuracy and reduces the variance of contagion effect estimation in observational network data compared to state-of-the-art methods.

Submitted 17 October, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.17479 [pdf, other]

Inferring Causal Effects Under Heterogeneous Peer Influence

Authors: Shishir Adhikari, Elena Zheleva

Abstract: Causal inference in networks should account for interference, which occurs when a unit's outcome is influenced by treatments or outcomes of peers. Heterogeneous peer influence (HPI) occurs when a unit's outcome is influenced differently by different peers based on their attributes and relationships, or when each unit has a different susceptibility to peer influence. Existing solutions to estimating direct causal effects under interference consider either homogeneous influence from peers or specific heterogeneous influence mechanisms (e.g., based on local neighborhood structure). This paper presents a methodology for estimating individual direct causal effects in the presence of HPI where the mechanism of influence is not known a priori. We propose a structural causal model for networks that can capture different possible assumptions about network structure, interference conditions, and causal dependence and enables reasoning about identifiability in the presence of HPI. We find potential heterogeneous contexts using the causal model and propose a novel graph neural network-based estimator to estimate individual direct causal effects. We show that state-of-the-art methods for individual direct effect estimation produce biased results in the presence of HPI, and that our proposed estimator is robust.

Submitted 14 November, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

arXiv:2301.06615 [pdf, other]

Data-Driven Estimation of Heterogeneous Treatment Effects

Authors: Christopher Tran, Keith Burghardt, Kristina Lerman, Elena Zheleva

Abstract: Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in empirical sciences. In the last few years, there has been a considerable interest in adapting machine learning algorithms to the problem of estimating heterogeneous effects from observational and experimental data. However, these algorithms often make strong assumptions about the observed features in the data and ignore the structure of the underlying causal model, which can lead to biased estimation. At the same time, the underlying causal mechanism is rarely known in real-world datasets, making it hard to take it into consideration. In this work, we provide a survey of state-of-the-art data-driven methods for heterogeneous treatment effect estimation using machine learning, broadly categorizing them as methods that focus on counterfactual prediction and methods that directly estimate the causal effect. We also provide an overview of a third category of methods which rely on structural causal models and learn the model structure from data. Our empirical evaluation under various underlying structural model mechanisms shows the advantages and deficiencies of existing estimators and of the metrics for measuring their performance.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2209.05729 [pdf, other]

Understanding Stay-at-home Attitudes through Framing Analysis of Tweets

Authors: Zahra Fatemi, Abari Bhattacharya, Andrew Wentzel, Vipul Dhariwal, Lauren Levine, Andrew Rojecki, G. Elisabeta Marai, Barbara Di Eugenio, Elena Zheleva

Abstract: With the onset of the COVID-19 pandemic, a number of public policy measures have been developed to curb the spread of the virus. However, little is known about the attitudes towards stay-at-home orders expressed on social media despite the fact that social media are central platforms for expressing and debating personal attitudes. To address this gap, we analyze the prevalence and framing of attitudes towards stay-at-home policies, as expressed on Twitter in the early months of the pandemic. We focus on three aspects of tweets: whether they contain an attitude towards stay-at-home measures, whether the attitude was for or against, and the moral justification for the attitude, if any. We collect and annotate a dataset of stay-at-home tweets and create classifiers that enable large-scale analysis of the relationship between moral frames and stay-at-home attitudes and their temporal evolution. Our findings suggest that frames of care are correlated with a supportive stance, whereas freedom and oppression signify an attitude against stay-at-home directives. There was widespread support for stay-at-home orders in the early weeks of lockdowns, followed by increased resistance toward the end of May and the beginning of June 2020. The resistance was associated with moral judgment that mapped to political divisions.

Submitted 13 September, 2022; originally announced September 2022.

Comments: This paper has been accepted at The IEEE International Conference on Data Science and Advanced Analytics (DSAA)

arXiv:2208.12210 [pdf, other]

Learning Relational Causal Models with Cycles through Relational Acyclification

Authors: Ragib Ahsan, David Arbour, Elena Zheleva

Abstract: In real-world phenomena which involve mutual influence or causal effects between interconnected units, equilibrium states are typically represented with cycles in graphical models. An expressive class of graphical models, relational causal models, can represent and reason about complex dynamic systems exhibiting such cycles or feedback loops. Existing cyclic causal discovery algorithms for learning causal models from observational data assume that the data instances are independent and identically distributed which makes them unsuitable for relational causal models. At the same time, causal discovery algorithms for relational causal models assume acyclicity. In this work, we examine the necessary and sufficient conditions under which a constraint-based relational causal discovery algorithm is sound and complete for cyclic relational causal models. We introduce relational acyclification, an operation specifically designed for relational models that enables reasoning about the identifiability of cyclic relational causal models. We show that under the assumptions of relational acyclification and σ-faithfulness, the relational causal discovery algorithm RCD (Maier et al. 2013) is sound and complete for cyclic models. We present experimental results to support our claim.

Submitted 17 March, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: Published in the 37th AAAI Conference on Artificial Intelligence (AAAI 2023)

Journal ref: AAAI 2023

arXiv:2208.05624 [pdf]

Determining Causality in Travel Mode Choice

Authors: Rishabh Singh Chauhan, Christoffer Riis, Shishir Adhikari, Sybil Derrible, Elena Zheleva, Charisma F. Choudhury, Francisco Camara Pereira

Abstract: This article presents one of the pioneering studies on causal modeling in travel mode choice decision-making using causal discovery algorithms. These models are a major advancement from conventional correlation-based techniques. We propose a novel methodology that combines causal discovery with structural equation modeling (SEM). This modeling approach overcomes some of the limitations of SEM by combining the strengths of both causal discovery and SEM. Causal discovery algorithms determine causal graphs from observational data and domain knowledge, and SEMs estimate direct causal effects and test the performance of causal discovery algorithms. In this study, we test four causal discovery algorithms: Peter-Clark (PC), Fast Causal Inference (FCI), Fast Greedy Equivalence Search (FGES), and Direct Linear Non-Gaussian Acyclic Models (DirectLiNGAM). The results show that the DirectLiNGAM-based SEM model best captures causality in mode choice behavior. It passes several goodness-of-fit tests, including Root Mean Square Error of Approximation (RMSEA) and Goodness-of-Fit Index (GFI), and it achieves the lowest Bayesian Information Criterion (BIC) value. The analyses are conducted on data collected from the 2017 National Household Travel Survey in the New York Metropolitan area.

Submitted 24 April, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2207.00163 [pdf, ps, other]

Non-Parametric Inference of Relational Dependence

Authors: Ragib Ahsan, Zahra Fatemi, David Arbour, Elena Zheleva

Abstract: Independence testing plays a central role in statistical and causal inference from observational data. Standard independence tests assume that the data samples are independent and identically distributed (i.i.d.) but that assumption is violated in many real-world datasets and applications centered on relational systems. This work examines the problem of estimating independence in data drawn from relational systems by defining sufficient representations for the sets of observations influencing individual instances. Specifically, we define marginal and conditional independence tests for relational data by considering the kernel mean embedding as a flexible aggregation function for relational variables. We propose a consistent, non-parametric, scalable kernel test to operationalize the relational independence test for non-i.i.d. observational data under a set of structural assumptions. We empirically evaluate our proposed method on a variety of synthetic and semi-synthetic networks and demonstrate its effectiveness compared to state-of-the-art kernel-based independence tests.

Submitted 29 June, 2022; originally announced July 2022.

Comments: To appear in UAI 2022

arXiv:2206.12689 [pdf, other]

Improving Data-driven Heterogeneous Treatment Effect Estimation Under Structure Uncertainty

Authors: Christopher Tran, Elena Zheleva

Abstract: Estimating how a treatment affects units individually, known as heterogeneous treatment effect (HTE) estimation, is an essential part of decision-making and policy implementation. The accumulation of large amounts of data in many domains, such as healthcare and e-commerce, has led to increased interest in developing data-driven algorithms for estimating heterogeneous effects from observational and experimental data. However, these methods often make strong assumptions about the observed features and ignore the underlying causal model structure, which can lead to biased HTE estimation. At the same time, accounting for the causal structure of real-world data is rarely trivial since the causal mechanisms that gave rise to the data are typically unknown. To address this problem, we develop a feature selection method that considers each feature's value for HTE estimation and learns the relevant parts of the causal structure from data. We provide strong empirical evidence that our method improves existing data-driven HTE estimation methods under arbitrary underlying causal structures. Our results on synthetic, semi-synthetic, and real-world datasets show that our feature selection algorithm leads to lower HTE estimation error.

Submitted 25 June, 2022; originally announced June 2022.

Comments: 11 Pages, Accepted to KDD22

arXiv:2202.10706 [pdf, other]

Relational Causal Models with Cycles: Representation and Reasoning

Authors: Ragib Ahsan, David Arbour, Elena Zheleva

Abstract: Causal reasoning in relational domains is fundamental to studying real-world social phenomena in which individual units can influence each other's traits and behavior. Dynamics between interconnected units can be represented as an instantiation of a relational causal model; however, causal reasoning over such instantiation requires additional templating assumptions that capture feedback loops of influence. Previous research has developed lifted representations to address the relational nature of such dynamics but has strictly required that the representation has no cycles. To facilitate cycles in relational representation and learning, we introduce relational σ-separation, a new criterion for understanding relational systems with feedback loops. We also introduce a new lifted representation, σ-abstract ground graph which helps with abstracting statistical independence relations in all possible instantiations of the cyclic relational model. We show the necessary and sufficient conditions for the completeness of σ-AGG and that relational σ-separation is sound and complete in the presence of one or more cycles with arbitrary length. To the best of our knowledge, this is the first work on representation of and reasoning with cyclic relational causal models.

Submitted 6 May, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Published in the 1st Conference on Causal Learning and Reasoning (2022)

arXiv:2201.11242 [pdf, other]

Heterogeneous Peer Effects in the Linear Threshold Model

Authors: Christopher Tran, Elena Zheleva

Abstract: The Linear Threshold Model is a widely used model that describes how information diffuses through a social network. According to this model, an individual adopts an idea or product after the proportion of their neighbors who have adopted it reaches a certain threshold. Typical applications of the Linear Threshold Model assume that thresholds are either the same for all network nodes or randomly distributed, even though some people may be more susceptible to peer pressure than others. To address individual-level differences, we propose causal inference methods for estimating individual thresholds that can more accurately predict whether and when individuals will be affected by their peers. We introduce the concept of heterogeneous peer effects and develop a Structural Causal Model which corresponds to the Linear Threshold Model and supports heterogeneous peer effect identification and estimation. We develop two algorithms for individual threshold estimation, one based on causal trees and one based on causal meta-learners. Our experimental results on synthetic and real-world datasets show that our proposed models can better predict individual-level thresholds in the Linear Threshold Model and thus more precisely predict which nodes will get activated over time.

Submitted 26 January, 2022; originally announced January 2022.

Comments: To be published in the 36th AAAI Conference on Artificial Intelligence (2022)
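
The adoption rule stated in this abstract (a node adopts once the proportion of its adopting neighbors reaches its threshold) can be sketched as a small simulation. This is an illustrative sketch of the unweighted, proportion-based rule only, with per-node thresholds supplied as input; the function name `linear_threshold_cascade` and the toy graph are hypothetical, and the paper's threshold estimation methods are not reproduced here.

```python
from typing import Dict, List, Set


def linear_threshold_cascade(
    neighbors: Dict[str, List[str]],
    thresholds: Dict[str, float],
    seeds: Set[str],
) -> List[Set[str]]:
    """Simulate synchronous adoption rounds.

    Returns a list whose first element is the seed set and whose
    following elements are the nodes newly activated in each round.
    """
    active = set(seeds)
    history = [set(seeds)]
    while True:
        newly_active = set()
        for node, nbrs in neighbors.items():
            if node in active or not nbrs:
                continue
            # Fraction of this node's neighbors that were active
            # at the start of the current round.
            active_fraction = sum(n in active for n in nbrs) / len(nbrs)
            if active_fraction >= thresholds[node]:
                newly_active.add(node)
        if not newly_active:
            return history
        active |= newly_active
        history.append(newly_active)


# Hypothetical path graph a - b - c - d with heterogeneous thresholds.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
easy = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 1.0}
rounds = linear_threshold_cascade(path, easy, {"a"})

# Raising one node's threshold stops the cascade at that node.
hard = dict(easy, c=0.6)
blocked = linear_threshold_cascade(path, hard, {"a"})
```

In the second run, node `c` has only half of its neighbors active, which falls short of its 0.6 threshold, so the cascade never reaches `c` or `d`. This is the kind of individual-level difference that uniform or random thresholds cannot express.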

arXiv:2201.04399 [pdf, other]

doi 10.1145/3488560.3502192

RGRecSys: A Toolkit for Robustness Evaluation of Recommender Systems

Authors: Zohreh Ovaisi, Shelby Heinecke, Jia Li, Yongfeng Zhang, Elena Zheleva, Caiming Xiong

Abstract: Robust machine learning is an increasingly important topic that focuses on developing models resilient to various forms of imperfect data. Due to the pervasiveness of recommender systems in online technologies, researchers have carried out several robustness studies focusing on data sparsity and profile injection attacks. Instead, we propose a more holistic view of robustness for recommender systems that encompasses multiple dimensions - robustness with respect to sub-populations, transformations, distributional disparity, attack, and data sparsity. While there are several libraries that allow users to compare different recommender system models, there is no software library for comprehensive robustness evaluation of recommender system models under different scenarios. As our main contribution, we present a robustness evaluation toolkit, Robustness Gym for RecSys (RGRecSys -- https://www.github.com/salesforce/RGRecSys), that allows us to quickly and uniformly evaluate the robustness of recommender system models.

Submitted 12 January, 2022; originally announced January 2022.

Journal ref: In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM 22), February 2022, ACM, 4 pages

arXiv:2110.14632 [pdf, other]

doi 10.1145/3472538.3472550

Heterogeneous Effects of Software Patches in a Multiplayer Online Battle Arena Game

Authors: Yuzi He, Christopher Tran, Julie Jiang, Keith Burghardt, Emilio Ferrara, Elena Zheleva, Kristina Lerman

Abstract: The popularity of online gaming has grown dramatically, driven in part by streaming and the billion-dollar e-sports industry. Online games regularly update their software to fix bugs, add functionality that improves the game's look and feel, and change the game mechanics to keep the games fun and challenging. An open question, however, is the impact of these changes on player performance and game balance, as well as how players adapt to these sudden changes. To address these questions, we use causal inference to measure the impact of software patches to League of Legends, a popular team-based multiplayer online game. We show that game patches have substantially different impacts on players depending on their skill level and whether they take breaks between games. We find that the gap between good and bad players increases after a patch, despite efforts to make gameplay more equal. Moreover, longer between-game breaks tend to improve player performance after patches. Overall, our results highlight the utility of causal inference, and specifically heterogeneous treatment effect estimation, as a tool to quantify the complex mechanisms of game balance and its interplay with players' performance.

Submitted 27 October, 2021; originally announced October 2021.

Comments: 9 pages, 11 figures

Journal ref: Proceedings of The 16th International Conference on the Foundations of Digital Games (FDG) 2021

arXiv:2106.11029 [pdf, other]

Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions

Authors: Shishir Adhikari, Akshay Uppal, Robin Mermelstein, Tanya Berger-Wolf, Elena Zheleva

Abstract: Cannabis legalization has been welcomed by many U.S. states but its role in escalation from tobacco e-cigarette use to cannabis vaping is unclear. Meanwhile, cannabis vaping has been associated with new lung diseases and rising adolescent use. To understand the impact of cannabis legalization on escalation, we design an observational study to estimate the causal effect of recreational cannabis legalization on the development of pro-cannabis attitude for e-cigarette users. We collect and analyze Twitter data which contains opinions about cannabis and JUUL, a very popular e-cigarette brand. We use weakly supervised learning for personal tweet filtering and classification for stance detection. We discover that recreational cannabis legalization policy has an effect on increased development of pro-cannabis attitudes for users already in favor of e-cigarettes.

Submitted 4 June, 2021; originally announced June 2021.

Comments: Published at ICWSM 2021

arXiv:2004.07225 [pdf, other]

Minimizing Interference and Selection Bias in Network Experiment Design

Authors: Zahra Fatemi, Elena Zheleva

Abstract: Current approaches to A/B testing in networks focus on limiting interference, the concern that treatment effects can "spill over" from treatment nodes to control nodes and lead to biased causal effect estimation. Prominent methods for network experiment design rely on two-stage randomization, in which sparsely-connected clusters are identified and cluster randomization dictates the node assignment to treatment and control. Here, we show that cluster randomization does not ensure sufficient node randomization and it can lead to selection bias in which treatment and control nodes represent different populations of users. To address this problem, we propose a principled framework for network experiment design which jointly minimizes interference and selection bias. We introduce the concepts of edge spillover probability and cluster matching and demonstrate their importance for designing network A/B testing. Our experiments on a number of real-world datasets show that our proposed framework leads to significantly lower error in causal effect estimation than existing solutions.

Submitted 15 April, 2020; originally announced April 2020.

Comments: This paper has been accepted at the International AAAI Conference on Web and Social Media (ICWSM 2020)

arXiv:2002.00208 [pdf, other]

doi 10.1145/3441452

Variable-lag Granger Causality and Transfer Entropy for Time Series Analysis

Authors: Chainarong Amornbunchornvej, Elena Zheleva, Tanya Berger-Wolf

Abstract: Granger causality is a fundamental technique for causal inference in time series data, commonly used in the social and biological sciences. Typical operationalizations of Granger causality make a strong assumption that every time point of the effect time series is influenced by a combination of other time series with a fixed time delay. The assumption of fixed time delay also exists in Transfer Entropy, which is considered to be a non-linear version of Granger causality. However, the assumption of the fixed time delay does not hold in many applications, such as collective behavior, financial markets, and many natural phenomena. To address this issue, we develop Variable-lag Granger causality and Variable-lag Transfer Entropy, generalizations of both Granger causality and Transfer Entropy that relax the assumption of the fixed time delay and allow causes to influence effects with arbitrary time delays. In addition, we propose methods for inferring both variable-lag Granger causality and Transfer Entropy relations. In our approaches, we utilize an optimal warping path of Dynamic Time Warping (DTW) to infer variable-lag causal relations. We demonstrate our approaches on an application for studying coordinated collective behavior and other real-world causal-inference datasets and show that our proposed approaches perform better than several existing methods in both simulated and real-world datasets. Our approaches can be applied in any domain of time series analysis. The software of this work is available in the R-CRAN package: VLTimeCausality.

Submitted 1 June, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Comments: This preprint is the extension of the work [arXiv:1912.10829] entitled "Variable-lag Granger Causality for Time Series Analysis" by the same authors. The revision was made based on reviewers' suggestions. The R package is available at https://github.com/DarkEyes/VLTimeSeriesCausality

MSC Class: 91-08; 68T05; 62-07 ACM Class: G.3; I.2.3; I.2.6; J.4

Journal ref: ACM Transactions on Knowledge Discovery from Data (TKDD), 15(4), 67 (2021)

arXiv:2001.11358 [pdf, ps, other]

doi 10.1145/3366423.3380255

Correcting for Selection Bias in Learning-to-rank Systems

Authors: Zohreh Ovaisi, Ragib Ahsan, Yifan Zhang, Kathryn Vasilaky, Elena Zheleva

Abstract: Click data collected by modern recommendation systems are an important source of observational data that can be utilized to train learning-to-rank (LTR) systems. However, these data suffer from a number of biases that can result in poor performance for LTR systems. Recent methods for bias correction in such systems mostly focus on position bias, the fact that higher ranked results (e.g., top search engine results) are more likely to be clicked even if they are not the most relevant results given a user's query. Less attention has been paid to correcting for selection bias, which occurs because clicked documents are reflective of what documents have been shown to the user in the first place. Here, we propose new counterfactual approaches which adapt Heckman's two-stage method and account for selection and position bias in LTR systems. Our empirical evaluation shows that our proposed methods are much more robust to noise and have better accuracy compared to existing unbiased LTR algorithms, especially when there is moderate to no position bias.

Submitted 12 May, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: This paper appeared in The Web Conference (WWW'20), April 20-24, 2020, Taipei, Taiwan

arXiv:1912.10829 [pdf, other]

doi 10.1109/DSAA.2019.00016

Variable-lag Granger Causality for Time Series Analysis

Authors: Chainarong Amornbunchornvej, Elena Zheleva, Tanya Y. Berger-Wolf

Abstract: Granger causality is a fundamental technique for causal inference in time series data, commonly used in the social and biological sciences. Typical operationalizations of Granger causality make a strong assumption that every time point of the effect time series is influenced by a combination of other time series with a fixed time delay. However, the assumption of the fixed time delay does not hold in many applications, such as collective behavior, financial markets, and many natural phenomena. To address this issue, we develop variable-lag Granger causality, a generalization of Granger causality that relaxes the assumption of the fixed time delay and allows causes to influence effects with arbitrary time delays. In addition, we propose a method for inferring variable-lag Granger causality relations. We demonstrate our approach on an application for studying coordinated collective behavior and show that it performs better than several existing methods in both simulated and real-world datasets. Our approach can be applied in any domain of time series analysis.

Submitted 18 December, 2019; originally announced December 2019.

Comments: This paper will appear in the proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA). The R package is available at https://github.com/DarkEyes/VLTimeSeriesCausality

MSC Class: 91-08; 68T05; 62-07 ACM Class: G.3; I.2.3; I.2.6; J.4

Journal ref: Proceedings of 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

arXiv:1902.00087 [pdf, other]

doi 10.1609/aaai.v33i01.33015183

Learning Triggers for Heterogeneous Treatment Effects

Authors: Christopher Tran, Elena Zheleva

Abstract: The causal effect of a treatment can vary from person to person based on their individual characteristics and predispositions. Mining for patterns of individual-level effect differences, a problem known as heterogeneous treatment effect estimation, has many important applications, from precision medicine to recommender systems. In this paper we define and study a variant of this problem in which an individual-level threshold in treatment needs to be reached, in order to trigger an effect. One of the main contributions of our work is that we do not only estimate heterogeneous treatment effects with fixed treatments but can also prescribe individualized treatments. We propose a tree-based learning method to find the heterogeneity in the treatment effects. Our experimental results on multiple datasets show that our approach can learn the triggers better than existing approaches.

Submitted 10 May, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

Comments: Accepted at AAAI 2019
