-
Optimally Improving Cooperative Learning in a Social Setting
Authors:
Shahrzad Haddadan,
Cheng Xin,
Jie Gao
Abstract:
We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions, for the same classification task, through communication or observations of each other's predictions. Clearly, if highly influential vertices use erroneous classifiers, there will be a negative effect on the accuracy of all the agents in the network. We ask the following question: how can we optimally fix the predictions of a few classifiers so as to maximize the overall accuracy in the entire network? To this end we consider an aggregate and an egalitarian objective function. We show a polynomial time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard. Furthermore, we develop approximation algorithms for the egalitarian improvement. The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.
Submitted 31 May, 2024;
originally announced May 2024.
-
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
Authors:
Fred Philippy,
Shohreh Haddadan,
Siwen Guo
Abstract:
In NLP, zero-shot classification (ZSC) is the task of assigning labels to textual data without any labeled examples for the target classes. A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels. However, this approach faces certain challenges, particularly for languages with limited resources. In this paper, we propose an alternative solution that leverages dictionaries as a source of data for ZSC. We focus on Luxembourgish, a low-resource language spoken in Luxembourg, and construct two new topic relevance classification datasets based on a dictionary that provides various synonyms, word translations and example sentences. We evaluate the usability of our dataset and compare it with the NLI-based approach on two topic classification tasks in a zero-shot manner. Our results show that by using the dictionary-based dataset, the trained models outperform the ones following the NLI-based approach for ZSC. While we focus on a single low-resource language in this study, we believe that the efficacy of our approach can also transfer to other languages where such a dictionary is available.
Submitted 5 April, 2024;
originally announced April 2024.
-
Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More
Authors:
Fred Philippy,
Siwen Guo,
Shohreh Haddadan,
Cedric Lothritz,
Jacques Klein,
Tegawendé F. Bissyandé
Abstract:
Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. Not only does this reduce the computational cost and storage overhead of full-model fine-tuning, but we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as its length or its reparameterization, affect cross-lingual transfer performance.
Submitted 6 February, 2024;
originally announced February 2024.
-
Towards a Common Understanding of Contributing Factors for Cross-Lingual Transfer in Multilingual Language Models: A Review
Authors:
Fred Philippy,
Siwen Guo,
Shohreh Haddadan
Abstract:
In recent years, pre-trained Multilingual Language Models (MLLMs) have shown a strong ability to transfer knowledge across different languages. However, given that the aspiration for such an ability has not been explicitly incorporated in the design of the majority of MLLMs, it is challenging to obtain a unique and straightforward explanation for its emergence. In this review paper, we survey literature that investigates different factors contributing to the capacity of MLLMs to perform zero-shot cross-lingual transfer and subsequently outline and discuss these factors in detail. To enhance the structure of this review and to facilitate consolidation with future studies, we identify five categories of such factors. In addition to providing a summary of empirical evidence from past studies, we identify consensuses among studies with consistent findings and resolve conflicts among contradictory ones. Our work contextualizes and unifies existing research streams which aim at explaining the cross-lingual potential of MLLMs. This review provides, first, an aligned reference point for future research and, second, guidance for a better-informed and more efficient way of leveraging the cross-lingual capacity of MLLMs.
Submitted 26 May, 2023;
originally announced May 2023.
-
Identifying the Correlation Between Language Distance and Cross-Lingual Transfer in a Multilingual Representation Space
Authors:
Fred Philippy,
Siwen Guo,
Shohreh Haddadan
Abstract:
Prior research has investigated the impact of various linguistic features on cross-lingual transfer performance. In this study, we investigate the manner in which this effect can be mapped onto the representation space. While past studies have focused on the impact on cross-lingual alignment in multilingual language models during fine-tuning, this study examines the absolute evolution of the respective language representation spaces produced by MLLMs. We place a specific emphasis on the role of linguistic characteristics and investigate their inter-correlation with the impact on representation spaces and cross-lingual transfer performance. Additionally, this paper provides preliminary evidence of how these findings can be leveraged to enhance transfer to linguistically distant languages.
Submitted 27 March, 2024; v1 submitted 3 May, 2023;
originally announced May 2023.
-
DeMEtRIS: Counting (near)-Cliques by Crawling
Authors:
Suman K. Bera,
Jayesh Choudhari,
Shahrzad Haddadan,
Sara Ahmadian
Abstract:
We study the problem of approximately counting cliques and near cliques in a graph, where the access to the graph is only available through crawling its vertices, so that typically only a small portion of it is seen. This model, known as the random walk model or the neighborhood query model, has been introduced recently and captures real-life scenarios in which the entire graph is too massive to be stored as a whole or scanned entirely, and in which sampling vertices independently is non-trivial. We introduce DeMEtRIS: Dense Motif Estimation through Random Incident Sampling. This method provides a scalable algorithm for clique and near clique counting in the random walk model. We establish the correctness of our algorithm through rigorous mathematical analysis and support it with extensive experiments. Both our theoretical results and our experiments show that DeMEtRIS obtains a high-precision estimate by crawling only a sub-linear portion of the vertices, a significant improvement over previously known results.
Submitted 7 December, 2022;
originally announced December 2022.
-
The Drift of #MyBodyMyChoice Discourse on Twitter
Authors:
Cristina Menghini,
Justin Uhr,
Shahrzad Haddadan,
Ashley Champagne,
Bjorn Sandstede,
Sohini Ramachandran
Abstract:
#MyBodyMyChoice is a well-known hashtag originally created to advocate for women's rights, often used in discourse about abortion and bodily autonomy. The Covid-19 outbreak prompted governments to take containment measures such as vaccination campaigns and mask mandates. Population groups opposed to such measures started to use the slogan "My Body My Choice" to claim their bodily autonomy. In this paper, we investigate whether the usage of the hashtag #MyBodyMyChoice on Twitter changed after the Covid-19 outbreak. We observe that the conversation around the hashtag changed in two ways. First, semantically, the hashtag #MyBodyMyChoice drifted towards conversations around Covid-19, especially in messages opposed to containment measures. Second, while before the pandemic users shared content produced by experts and authorities, after Covid-19 the users' attention shifted towards individuals.
Submitted 10 May, 2022;
originally announced May 2022.
-
Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds
Authors:
Shahrzad Haddadan,
Yue Zhuang,
Cyrus Cousins,
Eli Upfal
Abstract:
We present a novel method for reducing the computational complexity of rigorously estimating the partition functions (normalizing constants) of Gibbs (Boltzmann) distributions, which arise ubiquitously in probabilistic graphical models. A major obstacle to practical applications of Gibbs distributions is the need to estimate their partition functions. The state of the art in addressing this problem is multi-stage algorithms, which consist of a cooling schedule and a mean estimator in each step of the schedule. While the cooling schedule in these algorithms is adaptive, the mean estimation computations use MCMC as a black box to draw approximate samples. We develop a doubly adaptive approach, combining the adaptive cooling schedule with an adaptive MCMC mean estimator, whose number of Markov chain steps adapts dynamically to the underlying chain. Through rigorous theoretical analysis, we prove that our method outperforms the state-of-the-art algorithms in several respects: (1) the computational complexity of our method is smaller; (2) our method is less sensitive to loose bounds on mixing times, an inherent component in these algorithms; and (3) the improvement obtained by our method is particularly significant in the most challenging regime of high-precision estimation. We demonstrate the advantage of our method in experiments run on classic factor graphs, such as voting models and Ising models.
Submitted 14 November, 2021;
originally announced November 2021.
-
RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions
Authors:
Shahrzad Haddadan,
Cristina Menghini,
Matteo Riondato,
Eli Upfal
Abstract:
The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The structural bias of a graph is the sum of the radii of highly-polarized bubbles. We study the problem of decreasing the structural bias through edge insertions. Healing all nodes with high polarized bubble radius is hard to approximate within a logarithmic factor, so we focus on finding the best $k$ edges to insert to maximally reduce the structural bias. We present RePBubLik, an algorithm that leverages a variant of the random walk closeness centrality to select the edges to insert. RePBubLik obtains, under mild conditions, a constant-factor approximation. It reduces the structural bias faster than existing edge-recommendation methods, including some designed to reduce the polarization of a graph.
Submitted 12 January, 2021;
originally announced January 2021.
-
The Wedge Picking Model: A dynamic graph model based on triadic closure
Authors:
Sara Ahmadian,
Shahrzad Haddadan
Abstract:
Social networks have become an inseparable part of human life and processing them in an efficient manner is a top priority in the study of networks. These networks are highly dynamic and they are growing incessantly. Inspired by the concept of triadic closure, we propose a probabilistic mechanism to model the evolution of these dynamic graphs. Although triadic closure is ubiquitous in social networks and its presence helps form communities, probabilistic models encapsulating it have not been studied adequately.
We theoretically analyze our model and show how to bound the growth rate of some characteristics of the graph, such as degree of vertices. Leveraging our theoretical results, we develop a scheduling subroutine to process modifications of the graph in batches. Our scheduling subroutine is then used to speed up the state-of-the-art algorithms with negligible loss in their approximation guarantees. We demonstrate the applicability of our method by applying it to the densest subgraph and tri-densest subgraph discovery problem.
Submitted 2 December, 2020;
originally announced December 2020.
-
Making mean-estimation more efficient using an MCMC trace variance approach: DynaMITE
Authors:
Cyrus Cousins,
Shahrzad Haddadan,
Eli Upfal
Abstract:
We introduce a novel statistical measure for MCMC-mean estimation, the inter-trace variance ${\rm trv}^{(τ_{rel})}({\cal M},f)$, which depends on a Markov chain ${\cal M}$ and a function $f:S\to [a,b]$. The inter-trace variance can be efficiently estimated from observed data and leads to a more efficient MCMC-mean estimator. Prior MCMC mean-estimators receive, as input, upper-bounds on $τ_{mix}$ or $τ_{rel}$, and often also the stationary variance, and their performance is highly dependent on the sharpness of these bounds. In contrast, we introduce DynaMITE, which dynamically adjusts the sample size, is less sensitive to the looseness of input upper-bounds on $τ_{rel}$, and requires no bound on $v_π$.
Receiving only an upper-bound ${\cal T}_{rel}$ on $τ_{rel}$, DynaMITE estimates the mean of $f$ in $\tilde{\cal{O}}\bigl(\smash{\frac{{\cal T}_{rel} R}{\varepsilon}}+\frac{τ_{rel}\cdot {\rm trv}^{(τ_{rel})}}{\varepsilon^{2}}\bigr)$ steps, without a priori bounds on the stationary variance $v_π$ or the inter-trace variance ${\rm trv}^{(τ_{rel})}$. Thus we depend minimally on the tightness of ${\cal T}_{mix}$, as the complexity is dominated by $τ_{rel}\cdot{\rm trv}^{(τ_{rel})}$ as $\varepsilon \to 0$. Note that bounding $τ_{rel}$ is known to be prohibitively difficult; however, DynaMITE is able to reduce its principal dependence on ${\cal T}_{rel}$ to $τ_{rel}$, simply by exploiting properties of the inter-trace variance. To compare our method to known variance-aware bounds, we show ${\rm trv}^{(τ_{rel})}({\cal M},f) \leq v_π$. Furthermore, we show that when $f$'s image is distributed (semi)symmetrically on ${\cal M}$'s traces, we have ${\rm trv}^{(τ_{rel})}({\cal M},f)=o(v_π(f))$; thus DynaMITE outperforms prior methods in these cases.
Submitted 4 August, 2021; v1 submitted 22 November, 2020;
originally announced November 2020.
-
On the Complexity of Sampling Nodes Uniformly from a Graph
Authors:
Flavio Chierichetti,
Shahrzad Haddadan
Abstract:
We study a number of graph exploration problems in the following natural scenario: an algorithm starts exploring an undirected graph from some seed node; the algorithm, for an arbitrary node $v$ that it is aware of, can ask an oracle to return the set of the neighbors of $v$. (In social network analysis, a call to this oracle corresponds to downloading the profile page of user $v$ in a social network.) The goal of the algorithm is to either learn something (e.g., average degree) about the graph, or to return some random function of the graph (e.g., a uniform-at-random node), while accessing/downloading as few nodes of the graph as possible. Motivated by practical applications, we study the complexities of a variety of problems in terms of the graph's mixing time and average degree -- two measures that are believed to be quite small in real-world social networks, and that have often been used in the applied literature to bound the performance of online exploration algorithms. Our main result is that the algorithm has to access $Ω\left(t_{\rm mix} d_{\rm avg} ε^{-2} \ln δ^{-1}\right)$ nodes to obtain, with probability at least $1-δ$, an $ε$-additive approximation of the average of a bounded function on the nodes of a graph -- this lower bound matches the performance of an algorithm that was proposed in the literature. We also give tight bounds for the problem of returning a close-to-uniform-at-random node from the graph. Finally, we give lower bounds for the problems of estimating the average degree of the graph, and the number of nodes of the graph.
Submitted 24 October, 2017;
originally announced October 2017.
-
Mixing Time for Some Adjacent Transposition Markov Chains
Authors:
Shahrzad Haddadan,
Peter Winkler
Abstract:
We prove rapid mixing for certain Markov chains on the set $S_n$ of permutations on $1,2,\dots,n$ in which adjacent transpositions are made with probabilities that depend on the items being transposed. Typically, when in state $σ$, a position $i<n$ is chosen uniformly at random, and $σ(i)$ and $σ(i{+}1)$ are swapped with probability depending on $σ(i)$ and $σ(i{+}1)$. The stationary distributions of such chains appear in various fields of theoretical computer science, and rapid mixing has been established in the uniform case.
Recently, there has been progress in cases with biased stationary distributions, but there are wide classes of such chains whose mixing time is unknown. One case of particular interest is what we call the "gladiator chain," in which each number $g$ is assigned a "strength" $s_g$ and when $g$ and $g'$ are adjacent and chosen for possible swapping, $g$ comes out on top with probability $s_g/(s_g + s_{g'})$. We obtain a polynomial-time upper bound on mixing time when the gladiators fall into only three strength classes.
A preliminary version of this paper appeared as "Mixing of Permutations by Biased Transposition" in STACS 2017.
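As an illustration only, one step of the gladiator chain described in this abstract can be sketched in Python. The function names `gladiator_step` and `simulate`, the dictionary `strength`, and the reading of "comes out on top" as "ends up in the earlier of the two positions" are assumptions made for this sketch; they are not taken from the paper.

```python
import random

def gladiator_step(perm, strength, rng):
    """Apply one step of the chain to the list `perm`, in place.

    `strength` maps each item g to its strength s_g (non-negative, and the
    two items compared must not both have strength zero).
    """
    i = rng.randrange(len(perm) - 1)      # uniform choice of adjacent pair (i, i+1)
    g, h = perm[i], perm[i + 1]
    # Assumption: "g comes out on top" means g occupies position i afterwards.
    p_g_first = strength[g] / (strength[g] + strength[h])
    if rng.random() < p_g_first:
        perm[i], perm[i + 1] = g, h
    else:
        perm[i], perm[i + 1] = h, g
    return perm

def simulate(n_steps, strength, seed=0):
    """Run the chain for n_steps from the sorted arrangement of the items."""
    rng = random.Random(seed)
    perm = sorted(strength)
    for _ in range(n_steps):
        gladiator_step(perm, strength, rng)
    return perm
```

A call such as `simulate(1000, {1: 1.0, 2: 2.0, 3: 3.0})` returns some arrangement of the three items; the sketch says nothing about how many steps are needed to approach stationarity, which is the question the paper addresses.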
Submitted 11 January, 2018; v1 submitted 4 April, 2016;
originally announced April 2016.
-
The expected jaggedness of order ideals
Authors:
Melody Chan,
Shahrzad Haddadan,
Sam Hopkins,
Luca Moci
Abstract:
The jaggedness of an order ideal I in a poset P is the number of maximal elements in I plus the number of minimal elements of P not in I. A probability distribution on the set of order ideals of P is toggle-symmetric if for every p in P, the probability that p is maximal in I equals the probability that p is minimal not in I. In this paper, we prove a formula for the expected jaggedness of an order ideal of P under any toggle-symmetric probability distribution when P is the poset of boxes in a skew Young diagram. Our result extends the main combinatorial theorem of Chan-López-Pflueger-Teixidor, who used an expected jaggedness computation as a key ingredient to prove an algebro-geometric formula; and it has applications to homomesies, in the sense of Propp-Roby, of the antichain cardinality statistic for order ideals in partially ordered sets.
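The definition of jaggedness above can be checked by brute force on a very small poset. The following Python sketch is illustrative only: the helper names `order_ideals`, `jaggedness`, and `below`, and the choice of the 2 x 2 box poset with the uniform distribution on its ideals, are assumptions made here and are not drawn from the paper.

```python
from fractions import Fraction
from itertools import combinations

def order_ideals(elements, less_than):
    """List all down-closed subsets by brute force (tiny posets only)."""
    elements = list(elements)
    ideals = []
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            chosen = set(subset)
            # Down-closed: everything below a chosen element is also chosen.
            if all(q in chosen for p in chosen for q in elements if less_than(q, p)):
                ideals.append(frozenset(chosen))
    return ideals

def jaggedness(elements, less_than, ideal):
    """Maximal elements of I plus minimal elements of P that are not in I."""
    ideal = set(ideal)
    rest = [p for p in elements if p not in ideal]
    maximal_in_ideal = sum(1 for p in ideal if not any(less_than(p, q) for q in ideal))
    minimal_outside = sum(1 for p in rest if not any(less_than(q, p) for q in rest))
    return maximal_in_ideal + minimal_outside

# Hypothetical example: boxes of a 2 x 2 square, ordered componentwise.
boxes = [(i, j) for i in range(2) for j in range(2)]

def below(a, b):
    return a != b and a[0] <= b[0] and a[1] <= b[1]

ideals = order_ideals(boxes, below)
expected = Fraction(sum(jaggedness(boxes, below, I) for I in ideals), len(ideals))
```

For this instance there are six order ideals, and `expected`, the average jaggedness under the uniform distribution, evaluates to 2.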
Submitted 1 July, 2015;
originally announced July 2015.
-
Some Instances of Homomesy Among Ideals of Posets
Authors:
Shahrzad Haddadan
Abstract:
Given a permutation $τ$ defined on a set of combinatorial objects $S$, together with some statistic $f:S\rightarrow \mathbb{R}$, we say that the triple $\langle S, τ,f \rangle$ exhibits homomesy if $f$ has the same average along all orbits of $τ$ in $S$. This phenomenon was noticed by Panyushev (2007) and later studied, named and extended by Propp and Roby (2013). After Propp and Roby's paper, homomesy has received a lot of attention and a number of mathematicians are intrigued by it. While seeming ubiquitous, homomesy is often surprisingly non-trivial to prove. Propp and Roby studied homomesy in the set of ideals in the product of two chains, with two well-known permutations, rowmotion and promotion, the statistic being the size of the ideal. In this paper we extend their results to generalized rowmotion and promotion together with a wider class of statistics in the product of two chains.
Moreover, we derive some homomesy results in posets of type A and B. We believe that the framework that we set up can be used to prove similar results in wider classes of posets.
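The definition of homomesy given in this abstract can be tested directly on a small finite set. The sketch below is illustrative only: the function names and the toy instance (binary words of length 4 with two ones, cyclic rotation as $τ$, and the inversion count as $f$) are assumptions chosen here, not the rowmotion or promotion setting studied in the paper.

```python
from fractions import Fraction
from itertools import combinations

def orbits(states, tau):
    """Split `states` into orbits of tau (tau must permute `states`)."""
    seen, result = set(), []
    for s in states:
        if s in seen:
            continue
        orbit, x = [], s
        while x not in seen:
            seen.add(x)
            orbit.append(x)
            x = tau(x)
        result.append(orbit)
    return result

def orbit_averages(states, tau, f):
    """Exact average of f over each orbit of tau."""
    return [Fraction(sum(f(x) for x in o), len(o)) for o in orbits(states, tau)]

def is_homomesic(states, tau, f):
    """True when f has the same average on every orbit of tau."""
    return len(set(orbit_averages(states, tau, f))) == 1

# Toy instance: binary words of length 4 with exactly two ones.
words = [tuple(1 if i in ones else 0 for i in range(4))
         for ones in combinations(range(4), 2)]

def rotate(w):
    return w[-1:] + w[:-1]          # cyclic shift by one position

def inversions(w):
    # Number of pairs i < j with a 1 at position i and a 0 at position j.
    return sum(1 for i in range(len(w)) for j in range(i + 1, len(w))
               if w[i] == 1 and w[j] == 0)
```

On this instance rotation has two orbits, and `is_homomesic(words, rotate, inversions)` returns `True`, with both orbit averages equal to 2. Replacing the statistic by `lambda w: w[0] * w[1]` gives orbit averages that differ, so that triple is not homomesic.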
Submitted 4 April, 2016; v1 submitted 17 October, 2014;
originally announced October 2014.