Encoding Hierarchical Schema via Concept Flow for
Multifaceted Ideology Detection

Songtao Liu1, Bang Wang1,11footnotemark: 1, Wei Xiang1, Han Xu2 and Minghua Xu2,thanks:  Corresponding author: B. Wang and M. Xu
1School of Electronic Information and Communications,
Huazhong University of Science and Technology, Wuhan, China
2School of Journalism and Information Communication,
Huazhong University of Science and Technology, Wuhan, China
{liusongtao, wangbang, xiangwei, xuh, xuminghua}@hust.edu.cn
Abstract

Multifaceted ideology detection (MID) aims to detect the ideological leanings of texts towards multiple facets. Previous studies on ideology detection mainly focus on one generic facet and ignore label semantics and explanatory descriptions of ideologies, which are a kind of instructive information and reveal the specific concepts of ideologies. In this paper, we develop a novel concept semantics-enhanced framework for the MID task. Specifically, we propose a bidirectional iterative concept flow (BICo) method to encode multifaceted ideologies. BICo enables the concepts to flow across levels of the schema tree and enriches concept representations with multi-granularity semantics. Furthermore, we explore concept attentive matching and concept-guided contrastive learning strategies to guide the model to capture ideology features with the learned concept semantics. Extensive experiments on the benchmark dataset show that our approach achieves state-of-the-art performance in MID, including in the cross-topic scenario.111 The source code is available at https://github.com/LST1836/BICo

Encoding Hierarchical Schema via Concept Flow for
Multifaceted Ideology Detection


Songtao Liu1, Bang Wang1,11footnotemark: 1, Wei Xiang1, Han Xu2 and Minghua Xu2,thanks:  Corresponding author: B. Wang and M. Xu 1School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China 2School of Journalism and Information Communication, Huazhong University of Science and Technology, Wuhan, China {liusongtao, wangbang, xiangwei, xuh, xuminghua}@hust.edu.cn


1 Introduction

Multifaceted ideology detection (MID) aims to identify the ideological leanings (e.g., Left, Center, Right, etc.) expressed in texts towards multiple facets, as shown in Figure 1. It is crucial for understanding public opinion and detecting potential extremism (Kannangara, 2018; Grover and Mark, 2019; Demszky et al., 2019), which is helpful for governments and cybersecurity organizations (Stefanov et al., 2020; Aldera et al., 2021). It can also facilitate downstream research and applications in social sciences (Kabir and Madria, 2022).

In most of related work, researches generally focus on modeling the text content with diversified cues, such as sentiment polarities (Bhatia and P, 2018; Kabir and Madria, 2022), named entities (Liu et al., 2022), and discourse structure (Devatine et al., 2023; Hong et al., 2023), or jointly learning with other related tasks (Baly et al., 2019). There are also approaches that incorporate information sources beyond text to facilitate ideology mining. Hyperlink structure (Kulkarni et al., 2018), social networks (Stefanov et al., 2020; Li and Goldwasser, 2021), external knowledge from knowledge graphs (Zhang et al., 2022) as well as information from other modalities (Qiu et al., 2022), are introduced in the task of ideology detection.

Refer to caption
Figure 1: Upper: the multifaceted ideology schema and concepts of facets and ideologies (Liu et al., 2023). Lower left: the tree-like hierarchical structure of the schema. Lower right: an example of MID. “L” denotes Left, “R” denotes Right.

Although achieving promising performance, those methods limit the ideology prediction to a generic facet. In other words, they only label a text as ideologically left- or right-leaning as a whole, regardless whether the text containing one or more different facets. Furthermore, they ignore a crucial clue, label semantics, that is, what exactly does an ideology mean? In this case, ideological categories are represented as one-hot vectors without any semantic information, and models can only rely on the training data distribution to analyze latent ideology features, which could be unfavorable for the generalization ability of models (Wang et al., 2021; Wen and Hauptmann, 2023).

So how can we effectively detect multifaceted ideologies? And what exactly does an ideology mean? Liu et al. (2023) propose the multifaceted ideology detection task for the first time and design a multifaceted ideology schema which contains 12 facets covering 5 domains in a tree-like hierarchical structure (see Figure 1, details in Appendix A). Each facet, as well as the ideological attributes under each facet, are defined using natural texts, which can be regarded as concepts. These concepts describe the meaning of facets and ideologies, thus making it natural to represent the label semantics. In addition, in the hierarchical schema, higher-level concepts (like Domain and Facet) have general semantics shared by their child concepts, while lower-level concepts (like Ideology and Facet) describe their parents from various views, which can be seen as the semantic divisions of higher-level concepts. This meaningful hierarchical structure can be utilized to enrich the concept semantics.

Based on the motivation above, to incorporate the concept semantics and leverage the hierarchical structure of the schema in MID, we propose a novel Bidirectional Iterative Concept Flow (BICo) method to encode the hierarchical schema. Specifically, BICo allows concepts to flow in two directions on the schema tree, enabling them to perceive both high-level general semantics and low-level specific perspectives. On the one hand, inspired by the relation rotation in complex space (Sun et al., 2018), we design Concept Metapath Diffusion to perform message passing from root to leaf. On the other hand, in the direction of leaf to root, we propose Concept Hierarchy Aggregation to aggregate concept semantics in lower levels to the ones in higher levels based on the parent-child relation. Concept flow in the two directions is iterated multiple times and the final concept representations are enriched by multi-granularity semantics. For example, the Facet representations capture the meanings of different ideologies in the corresponding facet, while the Ideology representations also perceive information about the Facet and Domain they belong to. We match the text and Facet representations based on the attention mechanism to recognize text-related facets. Furthermore, we explore a Concept-Guided Contrastive Learning strategy to learn more distinguishable text representations under the guidance of Ideology concepts.

The main contributions of our work are summarized as follows:

(1) We propose a concept semantics-enhanced MID framework. To our best knowledge, this is the first work that incorporates label semantics and explanatory descriptions in the MID task.

(2) We propose a Bidirectional Iterative Concept Flow (BICo) method to encode the hierarchical schema. Concepts flow on the schema tree in two directions iteratively to capture multi-granularity concept semantics.

(3) We design Concept Attentive Matching and Concept-Guided Contrastive Learning strategies to enable the model to extract ideology features with the help of concept semantics.

(4) Extensive experiments on the MITweet benchmark demonstrate the effectiveness of our approach, including in the cross-topic scenario.

Refer to caption
Figure 2: Overview of our concept-enhanced multifaceted ideology detection framework. The blue box in the middle shows the proposed bidirectional iterative concept flow (BICo), which includes root-to-leaf concept metapath diffusion and leaf-to-root concept hierarchy aggregation. The concept representations are enriched gradually by bidirectional iteration, and are then used to enhance the two subtasks of MID through concept attentive matching and concept-guided contrastive learning.

2 Task Description

Given an input text and a set of facets, Multifaceted Ideology Detection (MID) is divided into two sub-tasks: (1) Relevance Recognition aims to recognize the facets that the text is related to; (2) Ideology Analysis predicts which ideology the text holds towards the related facets. Formally, a sample instance can be considered as a triple (x,{yRi}i=1n,{yIi}i=1m)𝑥superscriptsubscriptsuperscriptsubscript𝑦𝑅𝑖𝑖1𝑛superscriptsubscriptsuperscriptsubscript𝑦𝐼𝑖𝑖1𝑚\left(x,\{y_{R}^{i}\}_{i=1}^{n},\{y_{I}^{i}\}_{i=1}^{m}\right)( italic_x , { italic_y start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT , { italic_y start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ), where x𝑥xitalic_x is the input text, yRisuperscriptsubscript𝑦𝑅𝑖absenty_{R}^{i}\initalic_y start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∈ {{\{{Related, Unrelated}}\}} represents the relevance label of i𝑖iitalic_i-th facet, n𝑛nitalic_n is the number of given facets. For each facet that the text is related to, we have an ideology label yIisuperscriptsubscript𝑦𝐼𝑖absenty_{I}^{i}\initalic_y start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∈ {{\{{Left, Center, Right}}\}}, m𝑚mitalic_m is the number of related facets.

3 Approach

In this section, we first introduce the proposed Bidirectional Iterative Concept Flow (BICo) for encoding the hierarchical schema, and then discuss how we augment multifaceted ideology detection based on the learned concept encodings. Figure 2 illustrates the overall structure of our model.

3.1 Bidirectional Iterative Concept Flow

3.1.1 Concept Hierarchy Tree

Liu et al. (2023) define the first hierarchical schema of multifaceted ideology, which contains 12 facets covering 5 domains. We construct a concept hierarchy tree T=(N,E)𝑇𝑁𝐸T=(N,E)italic_T = ( italic_N , italic_E ) based on the schema, as shown in Figure 2. The node set N𝑁Nitalic_N contains four types of nodes, i.e., Root, Domain, Facet and Ideology. The edge set E𝐸Eitalic_E indicates the subordination relation between nodes. The Ideology, or leaf, nodes represent the three ideologies (Left, Center, Right) of each facet.

To initialize node embeddings in the concept hierarchy tree, we leverage the concepts of facets and ideologies in the schema. Specifically, we adopt a pre-trained language model as the concept encoder and feed the concepts of facets and ideologies in the schema into the encoder. We then extract the hidden state of [CLS] token as initial representations of Facet and Ideology nodes, i.e., hFsubscripth𝐹\textbf{h}_{F}h start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT and hIsubscripth𝐼\textbf{h}_{I}h start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT. For Root and Domain nodes, we obtain their initial embeddings (hRsubscripth𝑅\textbf{h}_{R}h start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT and hDsubscripth𝐷\textbf{h}_{D}h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT) by average-pooling their child node embeddings.

3.1.2 Concept Metapath Diffusion

In the concept hierarchy tree, higher-level nodes (like Root and Domain) have general and abstract concepts, which are shared by their child nodes and could be beneficial for enriching the representations of lower-level nodes (like Ideology and Facet). In order to allow lower-level nodes to perceive higher-level abstract semantics, we adopt the relation rotation in complex space (Sun et al., 2018), which is effective for information transfer along edges in a sequential structure.

Specifically, we define concept metapath as a path from root to leaf (RootDomainFacetIdeology)RootDomainFacetIdeology(\texttt{Root}-\texttt{Domain}-\texttt{Facet}-\texttt{Ideology})( Root - Domain - Facet - Ideology ). Given node representations in a metapath (𝐡R,𝐡D,𝐡F,𝐡I)=(𝐡0,𝐡1,𝐡2,𝐡3)subscript𝐡𝑅subscript𝐡𝐷subscript𝐡𝐹subscript𝐡𝐼subscript𝐡0subscript𝐡1subscript𝐡2subscript𝐡3(\mathbf{h}_{R},\mathbf{h}_{D},\mathbf{h}_{F},\mathbf{h}_{I})=(\mathbf{h}_{0},% \mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3})( bold_h start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ) = ( bold_h start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ), let 𝐫isubscript𝐫𝑖\mathbf{r}_{i}bold_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be the representation of edge between node hisubscript𝑖h_{i}italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and hi+1subscript𝑖1h_{i+1}italic_h start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT, the concept metapath diffusion from root to leaf through relation rotation is formulated as:

𝐨0subscript𝐨0\displaystyle\mathbf{o}_{0}bold_o start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT =𝐡0=𝐡0absentsuperscriptsubscript𝐡0subscript𝐡0\displaystyle=\mathbf{h}_{0}^{\prime}=\mathbf{h}_{0}= bold_h start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = bold_h start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (1)
𝐡isuperscriptsubscript𝐡𝑖\displaystyle\mathbf{h}_{i}^{\prime}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT =𝐡i+𝐡i1𝐫i1absentsubscript𝐡𝑖direct-productsuperscriptsubscript𝐡𝑖1subscript𝐫𝑖1\displaystyle=\mathbf{h}_{i}+\mathbf{h}_{i-1}^{\prime}\odot\mathbf{r}_{i-1}= bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + bold_h start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⊙ bold_r start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT (2)
𝐨isubscript𝐨𝑖\displaystyle\mathbf{o}_{i}bold_o start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT =𝐡ii+1absentsuperscriptsubscript𝐡𝑖𝑖1\displaystyle=\frac{\mathbf{h}_{i}^{\prime}}{i+1}= divide start_ARG bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_i + 1 end_ARG (3)

where 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, 𝐫isubscript𝐫𝑖\mathbf{r}_{i}bold_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝐨isubscript𝐨𝑖\mathbf{o}_{i}bold_o start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are all complex vectors, 𝐨isubscript𝐨𝑖\mathbf{o}_{i}bold_o start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the updated embedding, direct-product\odot is the element-wise complex product and performs vector rotation in complex space. Here we can easily interpret a real vector of dimension d𝑑ditalic_d as a complex vector of dimension d/2𝑑2d/2italic_d / 2 by treating the first half of the vector as the real part and the second half as the imaginary part. We perform concept diffusion on all metapaths in the tree and 𝐫isubscript𝐫𝑖\mathbf{r}_{i}bold_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is shared between each two consecutive levels of nodes. Note that in relation rotation, edges represent the rotation angles of vectors in complex space. Therefore, 𝐫isubscript𝐫𝑖\mathbf{r}_{i}bold_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is first randomly initialized in the range of (π,π)𝜋𝜋(-\pi,\pi)( - italic_π , italic_π ), and then its real and imaginary parts are obtained by the Euler’s formula.

3.1.3 Concept Hierarchy Aggregation

In contrast to metapath diffusion, concept hierarchy aggregation enables concept flow from leaf to root. In the concept hierarchy tree, child nodes describe their parent node from different views, and thus can be regarded as more fine-grained concepts. Through concept hierarchy aggregation, we aggregate concept semantics of child nodes to their parent node, so as to enrich the representations of higher-level nodes.

We utilize the graph attention network (GAT), which aggregates features through attention mechanism in a graph. Considering the characteristics of concept hierarchy tree, we modify it to explicitly model the hierarchical structure and quantitatively measure the compatibility between hierarchies in the tree. Specifically, we only establish aggregation between the parent node and its own child nodes, which is different from aggregating over all one-hop neighbors in GAT. Additionally, we use different attention parameters at different levels to distinguish the aggregation features of each hierarchy.

Formally, for a parent node p𝑝pitalic_p with embedding 𝐡psubscript𝐡𝑝\mathbf{h}_{p}bold_h start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT, we compute an aggregation weight for each child node i𝑖iitalic_i and then weighted sum all child nodes’ embeddings:

episubscript𝑒𝑝𝑖\displaystyle e_{pi}italic_e start_POSTSUBSCRIPT italic_p italic_i end_POSTSUBSCRIPT =LeakyReLU(𝐀l(𝐡p𝐡i))absentLeakyReLUsubscript𝐀𝑙conditionalsubscript𝐡𝑝subscript𝐡𝑖\displaystyle=\textrm{LeakyReLU}\left(\mathbf{A}_{l}\left(\mathbf{h}_{p}% \parallel\mathbf{h}_{i}\right)\right)= LeakyReLU ( bold_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∥ bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ) (4)
αpisubscript𝛼𝑝𝑖\displaystyle\alpha_{pi}italic_α start_POSTSUBSCRIPT italic_p italic_i end_POSTSUBSCRIPT =exp(epi)j𝒞p{p}exp(epj)absentexpsubscript𝑒𝑝𝑖subscript𝑗subscript𝒞𝑝𝑝expsubscript𝑒𝑝𝑗\displaystyle=\frac{\textrm{exp}\left(e_{pi}\right)}{\sum_{j\in\mathcal{C}_{p}% \cup\{p\}}\textrm{exp}\left(e_{pj}\right)}= divide start_ARG exp ( italic_e start_POSTSUBSCRIPT italic_p italic_i end_POSTSUBSCRIPT ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_C start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∪ { italic_p } end_POSTSUBSCRIPT exp ( italic_e start_POSTSUBSCRIPT italic_p italic_j end_POSTSUBSCRIPT ) end_ARG (5)
𝐡psuperscriptsubscript𝐡𝑝\displaystyle\mathbf{h}_{p}^{\prime}bold_h start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT =σ(i𝒞p{p}αpi𝐡i)absent𝜎subscript𝑖subscript𝒞𝑝𝑝subscript𝛼𝑝𝑖subscript𝐡𝑖\displaystyle=\sigma\left(\sum_{i\in\mathcal{C}_{p}\cup\{p\}}\alpha_{pi}% \mathbf{h}_{i}\right)= italic_σ ( ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_C start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∪ { italic_p } end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_p italic_i end_POSTSUBSCRIPT bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (6)

where 𝐀lsubscript𝐀𝑙\mathbf{A}_{l}bold_A start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT is the learnable parameter for aggregation of nodes in level l𝑙litalic_l, 𝒞psubscript𝒞𝑝\mathcal{C}_{p}caligraphic_C start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT is the child node set of p𝑝pitalic_p, 𝐡psuperscriptsubscript𝐡𝑝\mathbf{h}_{p}^{\prime}bold_h start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the updated representation for p𝑝pitalic_p.

3.1.4 Bidirectional Iteration

The root-to-leaf metapath diffusion and leaf-to-root hierarchy aggregation are iterated multiple times to update node encodings. Finally, The new generated concept representations can be fully aware of higher-level general semantics and constructed with concepts from different aspects. Next we will enhance the MID task with the enriched Facet and Ideology representations, 𝐜Fsubscript𝐜𝐹\mathbf{c}_{F}bold_c start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT and 𝐜Isubscript𝐜𝐼\mathbf{c}_{I}bold_c start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT.

3.2 Concept-Enhanced MID

3.2.1 Text Encoder

We select a pre-trained language model as the text encoder. In the subtask of Relevance Recognition, the encoder processes input sequence and outputs a hidden representation for each token: 𝐗={𝐱i}i=1L𝐗superscriptsubscriptsubscript𝐱𝑖𝑖1𝐿\mathbf{X}=\{\mathbf{x}_{i}\}_{i=1}^{L}bold_X = { bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT, where L𝐿Litalic_L is the length of text. For Ideology Analysis, we concatenate the text and its related facet concept, and then feed the sequence into text encoder to acquire the hidden state of [CLS] as text representation 𝐭𝐭\mathbf{t}bold_t.

3.2.2 Concept Attentive Matching

In Relevance Recognition subtask, to enable the text to be aware of label semantics (i.e., Facet concepts) and measure the importance of each token in relevance feature extraction, we adopt the cross-attention mechanism (Vaswani et al., 2017) to match the Facet and input token representations:

𝐭i=softmax(𝐜Fi𝐗Td)𝐗superscript𝐭𝑖softmaxsuperscriptsubscript𝐜𝐹𝑖superscript𝐗𝑇𝑑𝐗\displaystyle\mathbf{t}^{i}=\textrm{softmax}\left(\frac{\mathbf{c}_{F}^{i}% \mathbf{X}^{T}}{\sqrt{d}}\right)\mathbf{X}bold_t start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = softmax ( divide start_ARG bold_c start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_X start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_d end_ARG end_ARG ) bold_X (7)

where d𝑑ditalic_d is the dimension of vectors in the equation, the superscript i𝑖iitalic_i represents i𝑖iitalic_i-th facet, 𝐜Fisuperscriptsubscript𝐜𝐹𝑖\mathbf{c}_{F}^{i}bold_c start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT is i𝑖iitalic_i-th facet representation and 𝐭isuperscript𝐭𝑖\mathbf{t}^{i}bold_t start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT is i𝑖iitalic_i-th facet-aware text representation.

3.2.3 Concept-Guided Contrastive Learning

To inject label semantics (i.e., Ideology concepts) into Ideology Analysis subtask, we further explore a Concept-Guided Contrastive Learning strategy (CGCL), which tries to make intra-ideology representations more compact in the feature space and inter-ideology ones more distinguishable with the ideology concepts as anchors. The motivation is that ideology concepts describe the general meaning of ideologies. In the embedding space, this property can be interpreted as clustering, where an ideology concept anchor is the semantic center of samples with that ideological category.

Specifically, given text representations ={𝐭i}i=1Bsuperscriptsubscriptsubscript𝐭𝑖𝑖1𝐵\mathcal{B}=\{\mathbf{t}_{i}\}_{i=1}^{B}caligraphic_B = { bold_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT in a batch (B𝐵Bitalic_B is the batch size), and three Ideology representations 𝒜={𝐜I,L,𝐜I,C,𝐜I,R}𝒜subscript𝐜𝐼𝐿subscript𝐜𝐼𝐶subscript𝐜𝐼𝑅\mathcal{A}=\{\mathbf{c}_{I,L},\mathbf{c}_{I,C},\mathbf{c}_{I,R}\}caligraphic_A = { bold_c start_POSTSUBSCRIPT italic_I , italic_L end_POSTSUBSCRIPT , bold_c start_POSTSUBSCRIPT italic_I , italic_C end_POSTSUBSCRIPT , bold_c start_POSTSUBSCRIPT italic_I , italic_R end_POSTSUBSCRIPT } (corresponding to Left, Center and Right respectively) which will be used as concept anchors in the vector space, the concept-guided contrastive loss is formulated as:

CGCL=13i{L,C,R}isubscript𝐶𝐺𝐶𝐿13subscript𝑖𝐿𝐶𝑅subscript𝑖\displaystyle\mathcal{L}_{CGCL}=\frac{1}{3}\sum_{i\in\{L,C,R\}}\mathcal{L}_{i}caligraphic_L start_POSTSUBSCRIPT italic_C italic_G italic_C italic_L end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 3 end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_L , italic_C , italic_R } end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (8)
i=logj{j|yI,j=i}exp(f(𝐜I,i,𝐭j)/τ)𝐯𝒜\{cI,i}exp(f(𝐜I,i,𝐯)/τ)subscript𝑖subscript𝑗conditional-set𝑗subscript𝑦𝐼𝑗𝑖𝑓subscript𝐜𝐼𝑖subscript𝐭𝑗𝜏subscript𝐯\𝒜subscript𝑐𝐼𝑖𝑓subscript𝐜𝐼𝑖𝐯𝜏\displaystyle\mathcal{L}_{i}=\log\frac{{\textstyle\sum_{j\in\{j|y_{I,j}=i\}}}% \exp\left(f\left(\mathbf{c}_{I,i},\mathbf{t}_{j}\right)/\tau\right)}{\sum_{% \mathbf{v}\in\mathcal{B}\cup\mathcal{A}\backslash\{c_{I,i}\}}\exp\left(f\left(% \mathbf{c}_{I,i},\mathbf{v}\right)/\tau\right)}caligraphic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = roman_log divide start_ARG ∑ start_POSTSUBSCRIPT italic_j ∈ { italic_j | italic_y start_POSTSUBSCRIPT italic_I , italic_j end_POSTSUBSCRIPT = italic_i } end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_c start_POSTSUBSCRIPT italic_I , italic_i end_POSTSUBSCRIPT , bold_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) / italic_τ ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT bold_v ∈ caligraphic_B ∪ caligraphic_A \ { italic_c start_POSTSUBSCRIPT italic_I , italic_i end_POSTSUBSCRIPT } end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_c start_POSTSUBSCRIPT italic_I , italic_i end_POSTSUBSCRIPT , bold_v ) / italic_τ ) end_ARG (9)

where yI,jsubscript𝑦𝐼𝑗y_{I,j}italic_y start_POSTSUBSCRIPT italic_I , italic_j end_POSTSUBSCRIPT is the ideology label of 𝐭jsubscript𝐭𝑗\mathbf{t}_{j}bold_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, f𝑓fitalic_f is the cosine similarity function, τ𝜏\tauitalic_τ is temperature parameter. Note that CGCLsubscript𝐶𝐺𝐶𝐿\mathcal{L}_{CGCL}caligraphic_L start_POSTSUBSCRIPT italic_C italic_G italic_C italic_L end_POSTSUBSCRIPT is computed for each facet, and we omit the facet superscript for clarity.

3.2.4 Classification and Training

Considering the varying ideology features among different facets, we set up a classification head with a softmax function for each facet in both subtasks:

yi=softmax(𝐖i𝐭i+𝐛i)superscript𝑦𝑖softmaxsuperscript𝐖𝑖superscript𝐭𝑖superscript𝐛𝑖\displaystyle y^{i}=\textrm{softmax}\left(\mathbf{W}^{i}\mathbf{t}^{i}+\mathbf% {b}^{i}\right)italic_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = softmax ( bold_W start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT + bold_b start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) (10)

where the superscript i𝑖iitalic_i represents i𝑖iitalic_i-th facet, 𝐖isuperscript𝐖𝑖\mathbf{W}^{i}bold_W start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT and 𝐛isuperscript𝐛𝑖\mathbf{b}^{i}bold_b start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT are trainable parameters.

Note that in Relevance Recognition, we also incorporate contrastive learning (CL), which is similar to the concept-guided CL in Sec. 3.2.3, but the anchors here are text representations themselves:

CL=1Bi=1Blogjiexp(f(𝐭i,𝐭j)/τ)k{k|ik}exp(f(ti,tk)/τ)subscript𝐶𝐿1𝐵superscriptsubscript𝑖1𝐵subscript𝑗subscript𝑖𝑓subscript𝐭𝑖subscript𝐭𝑗𝜏subscript𝑘conditional-set𝑘𝑖𝑘𝑓subscriptt𝑖subscriptt𝑘𝜏\displaystyle\mathcal{L}_{CL}=-\frac{1}{B}\sum_{i=1}^{B}\log\frac{\sum_{j\in% \mathcal{B}_{i}}\exp\left(f\left(\mathbf{t}_{i},\mathbf{t}_{j}\right)/\tau% \right)}{\sum_{k\in\{k|i\neq k\}}\exp\left(f\left(\textbf{t}_{i},\textbf{t}_{k% }\right)/\tau\right)}caligraphic_L start_POSTSUBSCRIPT italic_C italic_L end_POSTSUBSCRIPT = - divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT roman_log divide start_ARG ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_B start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) / italic_τ ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_k ∈ { italic_k | italic_i ≠ italic_k } end_POSTSUBSCRIPT roman_exp ( italic_f ( t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) / italic_τ ) end_ARG (11)

where 𝐭isubscript𝐭𝑖\mathbf{t}_{i}bold_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the facet-aware text representation, i={j|ij,yR,i=yR,j}subscript𝑖conditional-set𝑗formulae-sequence𝑖𝑗subscript𝑦𝑅𝑖subscript𝑦𝑅𝑗\mathcal{B}_{i}=\{j|i\neq j,y_{R,i}=y_{R,j}\}caligraphic_B start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = { italic_j | italic_i ≠ italic_j , italic_y start_POSTSUBSCRIPT italic_R , italic_i end_POSTSUBSCRIPT = italic_y start_POSTSUBSCRIPT italic_R , italic_j end_POSTSUBSCRIPT }, yR,isubscript𝑦𝑅𝑖y_{R,i}italic_y start_POSTSUBSCRIPT italic_R , italic_i end_POSTSUBSCRIPT is the relevance label of 𝐭isubscript𝐭𝑖\mathbf{t}_{i}bold_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, B𝐵Bitalic_B is the batch size, τ𝜏\tauitalic_τ is temperature parameter. Here CLsubscript𝐶𝐿\mathcal{L}_{CL}caligraphic_L start_POSTSUBSCRIPT italic_C italic_L end_POSTSUBSCRIPT is also computed for each facet, and we omit the facet superscript for clarity.

Finally, the training loss of both subtasks is the weighted sum of cross-entropy classification loss and contrastive learning loss across all facets:

=1ni=1n(CEi+λCLi)1𝑛superscriptsubscript𝑖1𝑛superscriptsubscript𝐶𝐸𝑖𝜆superscriptsubscript𝐶𝐿𝑖\displaystyle\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\left(\mathcal{L}_{CE}^{i}+% \lambda\mathcal{L}_{CL}^{i}\right)caligraphic_L = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( caligraphic_L start_POSTSUBSCRIPT italic_C italic_E end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT + italic_λ caligraphic_L start_POSTSUBSCRIPT italic_C italic_L end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) (12)

where CEisuperscriptsubscript𝐶𝐸𝑖\mathcal{L}_{CE}^{i}caligraphic_L start_POSTSUBSCRIPT italic_C italic_E end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT is the cross-entropy loss of i𝑖iitalic_i-th facet, λ𝜆\lambdaitalic_λ is a hyper-parameter controlling the weight of contrastive loss, n𝑛nitalic_n is the total number of facets.

4 Experiments

4.1 Dataset and Evaluation Metrics

We conduct experiments on the MITweet (Liu et al., 2023) dataset, which contains 12,594 English tweets and covers 14 highly controversial topics in recent years. Each instance in MITweet is annotated with a relevance label and an ideology label (if the relevance label is “Related”) for each of the 12 facets in the multifaceted ideology schema. The statistics of MITweet is shown in Table 6.

We follow the original training/validation/test split and use the same evaluation metrics as Liu et al. (2023). First we calculate the Accuracy (Acc) and F1 score for each facet. Then we utilize both Macro and Micro methods to integrate metrics from all facets to obtain overall results of model performance. Macro-F1 and Macro-Acc are calculated by averaging F1 and Acc across all facets. Micro-F1 and Micro-Acc are the aggregated F1 and Acc scores obtained by concatenating the predictions of all facets. Note that, following existing work, we only report F1-related metrics for Relevance Recognition due to the highly imbalanced data distribution in this subtask.

4.2 Implement Details

The pre-trained BERTweet-base (Nguyen et al., 2020) is used as the concept and text encoder, and the two encoders share weights as this gave better results in preliminary experiments. We train the Relevance Recognition model and the Ideology Analysis model independently. Each model includes the BICo module and is trained end-to-end. We use AdamW (Loshchilov and Hutter, 2018) as the optimizer. The learning rate is set to 2e-5. The batch size B𝐵Bitalic_B is set to 64. The iteration number of BICo is set to 4 for relevance recognition and 2 for ideology analysis. For contrastive loss, we set the temperature parameter τ𝜏\tauitalic_τ to 0.5 for relevance recognition and 0.1 for ideology analysis. The contrastive loss weight λ𝜆\lambdaitalic_λ is set to 0.3 for both subtasks. The classification head is a two-layer fully connected network, in which the hidden size is 512. The above parameters are selected based on the validation set. We report the average results of 5 runs with different random seeds.

4.3 Comparison Models

We compare our approach with the latest benchmark in the MID task, BERTweetInd (Liu et al., 2023), which uses BERTweet as the backbone and detects indicator words from training set as the textual descriptions of facets. In addition, we test the zero/few-shot performance of advanced large language models (LLMs) in this task. Specifically, we select two popular LLMs, LLaMA2 (Touvron et al., 2023) and ChatGPT 222https://openai.com/blog/chatgpt, which exhibit superior capacities in communicating with humans, including solving a wide range of complex tasks without further training. We use the Llama-2-13b-chat and gpt-3.5-turbo-1106 versions. The prompts designed for LLMs can be found in Appendix B.

We also provide variants of our proposed approach in the ablation study:

Model Macro-F1 Micro-F1 Macro-Acc Micro-Acc
Subtask 1: Relevance Recognition
BERTweetInd 57.48 70.32 - -
LLaMA2-13B 27.45 32.28 - -
ChatGPT 33.11 40.07 - -
LLaMA2-13B 29.35 38.17 - -
ChatGPT 38.83 44.78 - -
Our approach 59.22superscript59.22\textbf{59.22}^{\dagger}59.22 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 72.14superscript72.14\textbf{72.14}^{\dagger}72.14 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT - -
w/o CL 58.56 71.42 - -
w/o BICo 58.14 70.41 - -
w/o CL&BICo 57.85 70.42 - -
Subtask 2: Ideology Analysis
BERTweetInd 42.68 69.28 65.88 76.38
LLaMA2-13B 35.60 47.33 45.98 49.69
ChatGPT 37.11 53.41 48.57 57.95
LLaMA2-13B 38.51 47.22 46.13 48.90
ChatGPT 42.64 60.54 58.44 68.25
Our approach 47.32superscript47.32\textbf{47.32}^{\dagger}47.32 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 70.90superscript70.90\textbf{70.90}^{\dagger}70.90 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 66.79 78.60superscript78.60\textbf{78.60}^{\dagger}78.60 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT
w/o BICo 46.02 68.58 66.18 76.63
w/o Concept anchors 45.08 68.38 66.04 77.30
w/o CGCL 44.21 67.54 65.15 76.79
Table 1: Overall results of different models and ablation study. \circ and \triangle denote 0-shot and 3-shot, respectively. \dagger denotes the significance test over BERTweetInd at p-value<0.05. Bold values are the best results in the corresponding subtask.

•  Relevance Recognition

(1) “w/o CL” denotes without contrastive learning.

(2) “w/o BICo” denotes without bidirectional iterative concept flow, in which case the facet representations in Concept Attentive Matching are directly from the concept encoder.

(3) “w/o CL&BICo” denotes the combination of the above two cases.

•  Ideology Analysis

(1) “w/o BICo” denotes without bidirectional iterative concept flow. In this case, the concept anchors (i.e., ideology representations) are directly from the concept encoder.

(2) “w/o concept anchors” denotes performing the contrastive learning without the guidance of concept anchors, i.e., the anchors are text representations themselves, which is the case of Eq. (11).

(3) “w/o CGCL” denotes discarding the concept-guided contrastive learning.

Model PoR SS EO EE EP CSR CV DS MF SD JO PeR
Subtask 1: Relevance Recognition
BERTweetInd 46.92 32.71 71.05 63.29 82.26 35.04 19.52 62.73 85.99 44.07 75.55 70.71
LLaMA2-13B 3.33 9.41 31.30 20.48 47.23 5.19 4.33 30.82 56.42 26.32 56.06 61.34
ChatGPT 6.48 10.45 54.27 37.33 53.04 10.27 7.96 47.02 78.87 33.52 63.92 62.79
Our approach 52.63superscript52.63\textbf{52.63}^{\dagger}52.63 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 33.33superscript33.33\textbf{33.33}^{\dagger}33.33 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 71.65superscript71.65\textbf{71.65}^{\dagger}71.65 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 67.98superscript67.98\textbf{67.98}^{\dagger}67.98 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 83.00 31.58 19.35 66.80superscript66.80\textbf{66.80}^{\dagger}66.80 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 86.59superscript86.59\textbf{86.59}^{\dagger}86.59 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 50.55superscript50.55\textbf{50.55}^{\dagger}50.55 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 76.16superscript76.16\textbf{76.16}^{\dagger}76.16 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 70.98superscript70.98\textbf{70.98}^{\dagger}70.98 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT
Subtask 2: Ideology Analysis
BERTweetInd 24.40 27.59 52.26 41.25 52.43 49.93 43.37 57.00 48.39 43.92 36.55 35.04
LLaMA2-13B 37.23 44.47 52.28 36.97 45.56 44.81 29.60 41.60 29.87 39.90 33.47 26.40
ChatGPT 24.44 43.91 50.79 43.73 54.06 33.33 51.59 52.92 33.03 36.03 41.95 45.87
Our approach 33.16 45.18superscript45.18\textbf{45.18}^{\dagger}45.18 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 50.25 61.70superscript61.70\textbf{61.70}^{\dagger}61.70 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 59.65superscript59.65\textbf{59.65}^{\dagger}59.65 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 35.56 49.31 62.35superscript62.35\textbf{62.35}^{\dagger}62.35 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 47.25 45.10superscript45.10\textbf{45.10}^{\dagger}45.10 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 37.10 41.24
Table 2: F1 scores of different models on each facet. \dagger denotes the significance test over BERTweetInd at p-value<0.05. \triangle denotes 3-shot. Bold and underlined values are the best and second-best results in the subtask, respectively. Full names of 12 facets in the first row can be found in Appendix A.

4.4 Main Results

We present the overall results of our approach and other models in Table 1. First, we can observe that our concept-enhanced method performs consistently better than other baseline models, including the advanced large language models, indicating the superiority of our approach for the MID task. Second, compared with BERTweetInd, which is also a BERTweet-based model, our approach achieves significant improvements in both subtasks. This suggests that the application of concept semantics in the hierarchical schema helps the model to capture the correlation between text and labels, thus improving the performance. Third, for the LLMs, although ChatGPT performs better than LLaMA2-13B, and the few in-context demonstrations improve the results, there is still a large gap between LLMs and other task-specific models. This indicates that the MID task remains challenging for current LLMs. One possible reason is that, this task requires not only strong text understanding and semantic reasoning abilities, but also the integration of specialized sociological knowledge and background information on relevant topics, which is difficult for general-purpose LLMs.

In more detail, F1 scores of different models on each facet are shown in Table 2. In the subtask of Relevance Recognition, our approach achieves the best results on 10 out of 12 facets, surpassing the second-place by over 4 points on 4 facets (PoR, EE, DS, SD). This again demonstrates the effectiveness of our concept-enhanced framwork in the MID task. However, on the facets of CSR and CV, our approach is inferior to BERTweetInd, especially on CSR. We think this is likely because there are too few related samples in CSR (as shown in Table 6), and our method uses a separate classification head for each facet, resulting in even more insufficient training for CSR. Although this issue affects the results, it is only an edge case. The two LLMs still perform poorly, especially on PoR, SS and CV. By analyzing the responses generated by LLMs, we find that LLMs are more likely to ignore or generalize the definitions in prompts on these facets. For the Ideology Analysis subtask, the baseline models achieve the best or second-best result on some facets. Nevertheless, our approach ranks in the top two on 10 out of 12 facets and shows overall superior performance.

4.5 Ablation Study

We conduct ablation studies to inspect the importance of major components in our model and the results are reported in Table 1. It is clear that the removal of either one of the modules causes a drop in performance. The Micro-F1 decreases by 1.73 and 2.32 points on the two subtasks, respectively, when BICo is removed, which validates that it is important to further model the schema hierarchy and concept interactions on top of the concept encoder. BICo iteratively performs concept diffusion and aggregation on the hierarchy tree, and the updated concept representations are enriched by higher-level general semantics and lower-level concrete perspectives, which are helpful for the model to understand the deep meaning of facet and ideology labels.

In Ideology Analysis, the removal of concept anchors leads to noticeable performance degradation. This suggests that relying solely on text content to identify ideology is insufficient, and injecting label semantics can guide the model to capture ideology features and distinguish among different ideologies more accurately, so as to improve the performance of MID. Moreover, the results of “w/o CL” in Relevance Recognition and “w/o CGCL” in Ideology Analysis verify the effectiveness of contrastive learning strategies in two subtasks.

We also conduct ablation study for the modules of Concept Metapath Diffusion and Concept Hierarchy Aggregation in BICo. The results are presented in Appendix C.

Test Topics Model Relevance Recognition Ideology Analysis
Micro-F1 Micro-Acc Micro-F1
CHR&GF BERTweetInd 59.60 70.20 52.41
LLaMA2-13B 28.29 56.79 44.22
ChatGPT 36.20 69.70 51.87
Our approach 61.00superscript61.00\textbf{61.00}^{\dagger}61.00 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 72.43superscript72.43\textbf{72.43}^{\dagger}72.43 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 54.76superscript54.76\textbf{54.76}^{\dagger}54.76 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT
BLM&Dm BERTweetInd 54.69 80.64 58.89
LLaMA2-13B 31.90 58.93 46.27
ChatGPT 39.54 73.45 54.09
Our approach 62.88superscript62.88\textbf{62.88}^{\dagger}62.88 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 83.31superscript83.31\textbf{83.31}^{\dagger}83.31 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 61.04superscript61.04\textbf{61.04}^{\dagger}61.04 start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT
Table 3: Cross-topic results of different models. CHR means Capitol Hill Riot, GF means George Floyd, BLM means Black Lives Matter, Dm means Democracy. \dagger denotes the significance test over BERTweetInd at p-value<0.05. Bold values are the best results in the corresponding test topics.
Refer to caption
(a) w/o CGCL
Refer to caption
(b) w/ CL
Refer to caption
(c) w/ CGCL
Figure 3: T-SNE visualization of text representations learned by different model variants in the Ideology Analysis subtask. CGCL denotes our Concept-Guided Contrastive Learning. CL denotes the Contrastive Learning without concept anchors. Red, green and blue dots represent Left, Center and Right samples, respectively.
Refer to caption
Refer to caption
Figure 4: Results of different numbers of iterations.

4.6 Cross-Topic Generalization

In our approach, label concepts are incorporated to enhance the model and they are enriched by multi-granularity concepts from different levels in the hierarchical schema through BICo. Intuitively, concepts provide a general description of a label. Therefore, our model should have better generalization to new topics with the help of concept semantics. To validate this viewpoint, we test and compare the cross-topic generalization ability of different models.

In the cross-topic scenario, the models are trained on some topics and then tested on the rest topics. To reduce randomness, we conduct experiments on two sets of test topics and the results are shown in Table 3. It can be observed that our approach consistently outperforms other models in both subtasks, which verifies that our approach can better generalize the learning ability to deal with cross-topic scenarios. LLMs lag behind other models by a significant margin. This shows that task-specific models still have advantages even in cross-topic scenarios. However, for the test topics of CHR&GF, ChatGPT performs closely to task-specific models in the Ideology Analysis subtask, indicating that ChatGPT may have practical value in specific cross-topic scenarios.

4.7 Effect of Number of Iterations

To analyze the effect of using different numbers of iterations in BICo, we conduct experiments on both subtasks and present the results in Figure 4. We can observe a clear upward and then downward trend in model performance as the number of iterations increases. The optimal number of iterations for Relevance Recognition is 4 and for Ideology Analysis is 2. One possible reason for this trend is that, when the number of iterations is too small, the concept diffusion and aggregation are insufficient, and the concept representations do not fully perceive the semantics of different granularities in the hierarchical structure. In contrast, when the number of iterations is too large, there will be redundancy in information transfer, and the semantic features of the concept itself will be lost.

4.8 Visualization

To qualitatively examine the role of label semantics (concept anchors) in the concept-guided contrastive learning, we randomly select a facet (Diplomatic Strategy) and show the t-SNE projections of text representations from test set in Figure 3. As observed, for the case of “w/o CGCL”, all samples are almost scattered without separations. There is a similar but better distribution for the model trained with CL. While for our CGCL (i.e., the full model), instances are well clustered by labels with only a slight overlap and the concept anchors are approximately cluster centers. This confirms that concept representations learned from BICo guide the model to better distinguish among different ideologies in the embedding space, which is helpful for subsequent classification.

5 Related Work

Ideology Detection

This task detects the ideology of texts in a generic facet. Many studies rely on text analysis techniques and try to leverage various textual cues (Bhatia and P, 2018; Baly et al., 2019, 2020; Chen et al., 2020; Kabir and Madria, 2022; Liu et al., 2022; Kim and Johnson, 2022; Devatine et al., 2023; Hong et al., 2023; Chen et al., 2023). In addition to text content, social networks (Li and Goldwasser, 2019; Stefanov et al., 2020; Xiao et al., 2020; Li and Goldwasser, 2021), external knowledge (Kulkarni et al., 2018; Zhang et al., 2022) and multimodal information (Dinkov et al., 2019; Qiu et al., 2022) are utilized to identify the ideology of online texts.

Multifaceted Ideology Detection

Considering that some texts may contain descriptions of different issues and reflect the author’s ideology from various aspects, some recent work study ideology detection on multiple facets. Sinno et al. (2022) investigate the political ideology of news articles from three facets, social, economic and foreign. Liu et al. (2023) first propose the MID task and design the first multifaceted ideology schema which defines 5 domains and 12 facets in a hierarchical structure. They also manually annotate a high-quality MITweet dataset and build baselines for MID. We follow Liu et al. (2023) and introduce label semantics into models through encoding the hierarchical schema.

6 Conclusion

In this paper, we have proposed a concept semantics-enhanced framework for the MID task. We have also designed a novel bidirectional iterative concept flow method to capture multi-granularity concept semantics. Moreover, we have explored concept attentive matching and concept-guided contrastive learning strategies to enable the model to extract ideology features with the help of concept semantics. Experiment results have validated the superiority of our approach.

Acknowledgement

This work is supported in part by Major Project of National Social Science Foundation of China: “AI and Precise International Communication” (Grant No. 22&ZD317) and National Natural Science Foundation of China (Grant No. 62172167). The computation is supported by the HPC Platform of Huazhong University of Science and Technology.

Limitations

  • Following Liu et al. (2023), we divide multifaceted ideology detection into two subtasks in a pipeline manner. However, this modeling approach increases the computational cost in both training and inference stages. In addition, error propagation in this pipeline mode is also a problem that cannot be ignored. We will investigate how to solve this task in an end-to-end manner in future work.

  • While we attempt to tune the concepts defined in the schema to better fit our approach, we are constrained by computational resources and time, so we directly adopt the concepts in the schema. Although these concepts are representative, there may be better ones that could lead to better performance.

Ethical Considerations

We carry out this work and conduct the experiments in accordance with the general ethics in social science research. The proposed concept-enhanced framework could automatically detect the multifaceted ideology of given texts, which is helpful for policy-makers and social statisticians. However, the algorithm is not perfect and may make incorrect predictions. Therefore, researches should realize the potential harm from the misuse of the ideology detection system, and cannot rely solely on the system to make judgments.

References

Domain Facet Left Right
Politics Political Regime (PoR) Socialism Capitalism
State Structure (SS) Centralism Federalism
Economy Economic Orientation (EO) Command Economy Market Economy
Economic Equality (EE) Outcome Equality Opportunity Equality
Culture Ethical Pursuit (EP) Ethical Liberalism Ethical Conservatism
Church-State Relations (CSR) Secularism Caesaropapism
Cultural Value (CV) Collectivism Individualism
Diplomacy Diplomatic Strategy (DS) Globalism Isolationism
Military Force (MF) Militarism Pacifism
Society Social Development (SD) Revolutionism Reformism
Justice Orientation (JO) Result Justice Procedural Justice
Personal Right (PeR) Social Responsibility Individual Right
Table 4: Multifaceted ideology schema (Liu et al., 2023).

Appendix A Multifaceted Ideology Schema

We present the multifaceted ideology schema in Table 4. Concepts of facets and ideologies defined in the schema can be found in Liu et al. (2023). Note that the original schema does not give the concepts of “Center”, so we define them based on the concepts of “Left” and “Right”, as follows:

A.1 Domain 1: Politics

  • Political Regime (PoR)

    Center: A moderate stance advocating for a mix of public and private ownership, seeking a balanced approach to property control and means of production.

  • State Structure (SS)

    Center: A moderate stance advocating for a balanced power structure, combining elements of central authority and power distribution.

A.2 Domain 2: Economy

  • Economic Orientation (EO)

    Center: A moderate stance advocating for combining government intervention in important economic decisions with the role of individuals, organizations, and market interactions.

  • Economic Equality (EE)

    Center: A moderate position advocating for an economic system that balances equal treatment and access to resources with considerations for distribution outcomes among different groups.

A.3 Domain 3: Culture

  • Ethical Pursuit (EP)

    Center: The mainstream culture should consider individual freedoms and cultural norms while promoting inclusivity dialogue on controversial issues.

  • Church-State Relations (CSR)

    Center: A moderate position advocating for a balanced and cooperative relationship between the church and state, respecting both religious autonomy and the principles of secular governance.

  • Cultural Value (CV)

    Center: A moderate stance that recognizes the importance of both social collectives and individual autonomy in sha** and preserving a diverse and inclusive society.

A.4 Domain 4: Diplomacy

  • Diplomatic Strategy (DS)

    Center: A moderate position that balances international cooperation and national interests, recognizing the value of engagement while cautiously managing political and economic entanglements with other countries.

  • Military Force (MF)

    Center: A moderate stance that recognizes the need for armed defense and security while prioritizing non-violent resolution for conflicts.

A.5 Domain 5: Society

  • Social Development (SD)

    Center: A moderate position that advocates combining direct action when necessary with a recognition of the value of gradual and sustainable change to achieve social goals.

  • Justice Orientation (JO)

    Center: A moderate stance that seeks a balance between fair distribution and fair decision-making, considering both the outcomes and procedure of justice.

  • Personal Right (PeR)

    Center: A moderate position that recognizes the importance of both fulfilling individual responsibilities and protecting individual rights in an equitable manner.

Appendix B Prompts for LLMs

The prompt templates designed for LLMs in two subtasks are as follows. We fill the templates with the facet names and definitions in the multifaceted ideology schema. In few-shot experiments, we provide LLMs with a few in-context demonstrations, which are manually selected for each facet to ensure diversity. We also provide a brief analysis as chain-of-thought for each demonstration. In zero-shot experiments, the demonstrations in the prompts will be removed.

B.1 Relevance Recognition

  • System prompt

    You will be provided with a piece of text. Determine if the text is related to "{facet}".

    {facet} is defined as: {facet_def}

    First give your analysis briefly and then select your answer from ["Related", "Unrelated"].

    Here are some demonstrations:
    {demonstrations}

  • User prompt

    Text: """{text}"""

B.2 Ideology Analysis

  • System prompt

    You will be provided with a piece of text. Determine the orientation of the text towards "{facet}".

    The orientation towards "{facet}" can be divided into ["Left", "Right", "Center"]. The definitions are as follows:

    -Left: {left_def}
    -Right: {right_def}
    -Center: {center_def}

    First give your analysis briefly and then select your answer from ["Left", "Right", "Center"].

    Here are some demonstrations:
    {demonstrations}

  • User prompt

    Text: """{text}"""

Appendix C Additional Ablation Study

As shown in Table 5, the removal of Concept Metapath Diffusion or Concept Hierarchy Aggregation causes a drop in performance. And removing both of them (w/o BICo) leads to a more significant performance degradation. The concept diffusion from root to leaf enables the high-level general semantics to propagate to lower-level nodes, while the concept aggregation from leaf to root allows the high-level nodes to perceive multifaceted concepts from lower levels. Both contribute to enriching label representations. The results further validate the effectiveness of both modules.

Model Macro-F1 Micro-F1 Macro-Acc Micro-Acc
Subtask 1: Relevance Recognition
Our Approach 59.22 72.14 - -
w/o CMD 57.92 70.81 - -
w/o CHA 58.73 71.03 - -
w/o BICo 58.14 70.41 - -
Subtask 2: Ideology Analysis
Our Approach 47.32 70.90 66.79 78.60
w/o CMD 46.38 68.93 65.90 77.71
w/o CHA 46.13 69.48 66.75 77.85
w/o BICo 46.02 68.58 66.18 76.63
Table 5: Results of ablation study for the modules of Concept Metapath Diffusion (CMD) and Concept Hierarchy Aggregation (CHA) in BICo. Note that “w/o BICo” is equivalent to “w/o CMD&CHA”
Domain Facet Relevance Ideology
#Related #Left #Center #Right
Politcs PoR 112 39 14 59
SS 291 67 88 136
Economy EO 759 294 297 168
EE 672 520 119 33
Culture EP 2935 1976 465 494
CSR 68 33 17 18
CV 154 95 11 48
Diplomacy DS 1572 711 421 440
MF 1837 132 575 1130
Society SD 1737 1236 287 214
JO 3452 3058 281 113
PeR 3516 171 241 3104
Table 6: Statistics of the MITweet dataset.