Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference

Authors: Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva

Abstract: The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering, and cluster randomization dictates the node assignment to treatment and control. However, cluster-based randomization approaches perform poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When we have knowledge of the cascade seed nodes, we can leverage this interference structure to mitigate the resulting causal effect estimation bias. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment from the cascade seed node and propagates the assignment to their multi-hop neighbors to limit interference during cascade growth and thereby reduce the overall causal effect estimation error. Our extensive experiments on real-world and synthetic datasets demonstrate that our proposed framework outperforms the existing state-of-the-art approaches in estimating causal effects in network data.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2403.14696 [pdf, other]

MOTIV: Visual Exploration of Moral Framing in Social Media

Authors: Andrew Wentzel, Lauren Levine, Vipul Dhariwal, Zarah Fatemi, Abarai Bhattacharya, Barbara Di Eugenio, Andrew Rojecki, Elena Zheleva, G. Elisabeta Marai

Abstract: We present a visual computing framework for analyzing moral rhetoric on social media around controversial topics. Using Moral Foundation Theory, we propose a methodology for deconstructing and visualizing the when, where, and who behind each of these moral dimensions as expressed in microblog data. We characterize the design of this framework, developed in collaboration with experts from language processing, communications, and causal inference. Our approach integrates microblog data with multiple sources of geospatial and temporal data, and leverages unsupervised machine learning (generalized additive models) to support collaborative hypothesis discovery and testing. We implement this approach in a system named MOTIV. We illustrate this approach on two problems, one related to Stay-at-home policies during the COVID-19 pandemic, and the other related to the Black Lives Matter movement. Through detailed case studies and discussions with collaborators, we identify several insights discovered regarding the different drivers of moral sentiment in social media. Our results indicate that this visual approach supports rapid, collaborative hypothesis testing, and can help give insights into the underlying moral values behind controversial political issues. Supplemental Material: https://osf.io/ygkzn/?view_only=6310c0886938415391d977b8aae8b749

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2308.13552 [pdf, other]

A Lens to Pandemic Stay at Home Attitudes

Authors: Andrew Wentzel, Lauren Levine, Vipul Dhariwal, Zahra Fatemi, Barbara Di Eugenio, Andrew Rojecki, Elena Zheleva, G. Elisabeta Marai

Abstract: We describe the design process and the challenges we met during a rapid multi-disciplinary pandemic project related to stay-at-home orders and social media moral frames. Unlike our typical design experience, we had to handle a steeper learning curve, emerging and continually changing datasets, as well as under-specified design requirements, persistent low visual literacy, and an extremely fast turnaround for new data ingestion, prototyping, testing and deployment. We describe the lessons learned through this experience.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2306.09261 [pdf, other]

Mitigating Cold-start Forecasting using Cold Causal Demand Forecasting Model

Authors: Zahra Fatemi, Minh Huynh, Elena Zheleva, Zamir Syed, Xiaojun Di

Abstract: Forecasting multivariate time series data, which involves predicting future values of variables over time using historical data, has significant practical applications. Although deep learning-based models have shown promise in this field, they often fail to capture the causal relationship between dependent variables, leading to less accurate forecasts. Additionally, these models cannot handle the cold-start problem in time series data, where certain variables lack historical data, posing challenges in identifying dependencies among variables. To address these limitations, we introduce the Cold Causal Demand Forecasting (CDF-cold) framework that integrates causal inference with deep learning-based models to enhance the forecasting accuracy of multivariate time series data affected by the cold-start problem. To validate the effectiveness of the proposed approach, we collect 15 multivariate time-series datasets containing the network traffic of different Google data centers. Our experiments demonstrate that the CDF-cold framework outperforms state-of-the-art forecasting models in predicting future values of multivariate time series data.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.02479 [pdf, other]

Contagion Effect Estimation Using Proximal Embeddings

Authors: Zahra Fatemi, Elena Zheleva

Abstract: Contagion effect refers to the causal effect of peers' behavior on the outcome of an individual in social networks. Contagion can be confounded due to latent homophily which makes contagion effect estimation very hard: nodes in a homophilic network tend to have ties to peers with similar attributes and can behave similarly without influencing one another. One way to account for latent homophily is by considering proxies for the unobserved confounders. However, as we demonstrate in this paper, existing proxy-based methods for contagion effect estimation have a very high variance when the proxies are high-dimensional. To address this issue, we introduce a novel framework, Proximal Embeddings (ProEmb), that integrates variational autoencoders with adversarial networks to create low-dimensional representations of high-dimensional proxies and help with identifying contagion effects. While VAEs have been used previously for representation learning in causal inference, a novel aspect of our approach is the additional component of adversarial networks to balance the representations of different treatment groups, which is essential in causal inference from observational data where these groups typically come from different distributions. We empirically show that our method significantly increases the accuracy and reduces the variance of contagion effect estimation in observational network data compared to state-of-the-art methods.

Submitted 17 October, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

arXiv:2209.05729 [pdf, other]

Understanding Stay-at-home Attitudes through Framing Analysis of Tweets

Authors: Zahra Fatemi, Abari Bhattacharya, Andrew Wentzel, Vipul Dhariwal, Lauren Levine, Andrew Rojecki, G. Elisabeta Marai, Barbara Di Eugenio, Elena Zheleva

Abstract: With the onset of the COVID-19 pandemic, a number of public policy measures have been developed to curb the spread of the virus. However, little is known about the attitudes towards stay-at-home orders expressed on social media despite the fact that social media are central platforms for expressing and debating personal attitudes. To address this gap, we analyze the prevalence and framing of attitudes towards stay-at-home policies, as expressed on Twitter in the early months of the pandemic. We focus on three aspects of tweets: whether they contain an attitude towards stay-at-home measures, whether the attitude was for or against, and the moral justification for the attitude, if any. We collect and annotate a dataset of stay-at-home tweets and create classifiers that enable large-scale analysis of the relationship between moral frames and stay-at-home attitudes and their temporal evolution. Our findings suggest that frames of care are correlated with a supportive stance, whereas freedom and oppression signify an attitude against stay-at-home directives. There was widespread support for stay-at-home orders in the early weeks of lockdowns, followed by increased resistance toward the end of May and the beginning of June 2020. The resistance was associated with moral judgment that mapped to political divisions.

Submitted 13 September, 2022; originally announced September 2022.

Comments: This paper has been accepted at The IEEE International Conference on Data Science and Advanced Analytics (DSAA)

arXiv:2207.00163 [pdf, ps, other]

Non-Parametric Inference of Relational Dependence

Authors: Ragib Ahsan, Zahra Fatemi, David Arbour, Elena Zheleva

Abstract: Independence testing plays a central role in statistical and causal inference from observational data. Standard independence tests assume that the data samples are independent and identically distributed (i.i.d.) but that assumption is violated in many real-world datasets and applications centered on relational systems. This work examines the problem of estimating independence in data drawn from relational systems by defining sufficient representations for the sets of observations influencing individual instances. Specifically, we define marginal and conditional independence tests for relational data by considering the kernel mean embedding as a flexible aggregation function for relational variables. We propose a consistent, non-parametric, scalable kernel test to operationalize the relational independence test for non-i.i.d. observational data under a set of structural assumptions. We empirically evaluate our proposed method on a variety of synthetic and semi-synthetic networks and demonstrate its effectiveness compared to state-of-the-art kernel-based independence tests.

Submitted 29 June, 2022; originally announced July 2022.

Comments: To appear in UAI 2022

arXiv:2110.05367 [pdf, other]

Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting

Authors: Zahra Fatemi, Chen Xing, Wenhao Liu, Caiming Xiong

Abstract: Existing studies addressing gender bias of pre-trained language models usually build a small gender-neutral data set and conduct a second-phase pre-training on the model with such data. However, given the limited size and concentrated focus of the gender-neutral data, catastrophic forgetting would occur during second-phase pre-training. Forgetting information in the original training data may damage the model's downstream performance by a large margin. In this work, we empirically show that catastrophic forgetting occurs in such methods by evaluating them with general NLP tasks in GLUE. Then, we propose a new method, GEnder Equality Prompt (GEEP), to improve gender fairness of pre-trained models with less forgetting. GEEP freezes the pre-trained model and learns gender-related prompts with gender-neutral data. Empirical results show that GEEP not only achieves SOTA performance on gender fairness tasks, but also forgets less and performs better on GLUE by a large margin.

Submitted 30 June, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: This paper has been accepted at the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

arXiv:2004.07225 [pdf, other]

Minimizing Interference and Selection Bias in Network Experiment Design

Authors: Zahra Fatemi, Elena Zheleva

Abstract: Current approaches to A/B testing in networks focus on limiting interference, the concern that treatment effects can "spill over" from treatment nodes to control nodes and lead to biased causal effect estimation. Prominent methods for network experiment design rely on two-stage randomization, in which sparsely-connected clusters are identified and cluster randomization dictates the node assignment to treatment and control. Here, we show that cluster randomization does not ensure sufficient node randomization and it can lead to selection bias in which treatment and control nodes represent different populations of users. To address this problem, we propose a principled framework for network experiment design which jointly minimizes interference and selection bias. We introduce the concepts of edge spillover probability and cluster matching and demonstrate their importance for designing network A/B testing. Our experiments on a number of real-world datasets show that our proposed framework leads to significantly lower error in causal effect estimation than existing solutions.

Submitted 15 April, 2020; originally announced April 2020.

Comments: This paper has been accepted at the International AAAI Conference on Web and Social Media (ICWSM 2020)

arXiv:1607.03914 [pdf, other]

A simple multiforce layout for multiplex networks

Authors: Zahra Fatemi, Mostafa Salehi, Matteo Magnani

Abstract: We introduce multiforce, a force-directed layout for multiplex networks, where the nodes of the network are organized into multiple layers and both in-layer and inter-layer relationships among nodes are used to compute node coordinates. The proposed approach generalizes existing work, providing a range of intermediate layouts in-between the ones produced by known methods. Our experiments on real data show that multiforce can keep nodes well aligned across different layers without significantly affecting their internal layouts when the layers have similar or compatible topologies. As a consequence, multiforce enriches the benefits of force-directed layouts by also supporting the identification of topological correspondences between layers.

Submitted 30 December, 2016; v1 submitted 13 July, 2016; originally announced July 2016.

Comments: 9 pages

Showing 1–10 of 10 results for author: Fatemi, Z