Search | arXiv e-print repository

arXiv:2404.19007 [pdf, other]

How Did We Get Here? Summarizing Conversation Dynamics

Authors: Yilun Hua, Nicholas Chernogor, Yuzhe Gu, Seoyeon Julie Jeong, Miranda Luo, Cristian Danescu-Niculescu-Mizil

Abstract: Throughout a conversation, the way participants interact with each other is in constant flux: their tones may change, they may resort to different strategies to convey their points, or they might alter their interaction patterns. An understanding of these dynamics can complement that of the actual facts and opinions discussed, offering a more holistic view of the trajectory of the conversation: ho… ▽ More Throughout a conversation, the way participants interact with each other is in constant flux: their tones may change, they may resort to different strategies to convey their points, or they might alter their interaction patterns. An understanding of these dynamics can complement that of the actual facts and opinions discussed, offering a more holistic view of the trajectory of the conversation: how it arrived at its current state and where it is likely heading. In this work, we introduce the task of summarizing the dynamics of conversations, by constructing a dataset of human-written summaries, and exploring several automated baselines. We evaluate whether such summaries can capture the trajectory of conversations via an established downstream task: forecasting whether an ongoing conversation will eventually derail into toxic behavior. We show that they help both humans and automated systems with this forecasting task. Humans make predictions three times faster, and with greater confidence, when reading the summaries than when reading the transcripts. Furthermore, automated forecasting systems are more accurate when constructing, and then predicting based on, summaries of conversation dynamics, compared to directly predicting on the transcripts. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: To appear in the Proceedings of NAACL 2024. Data available in ConvoKit https://convokit.cornell.edu/

arXiv:2212.01401 [pdf, other]

doi 10.1145/3555603

Thread With Caution: Proactively Hel** Users Assess and Deescalate Tension in Their Online Discussions

Authors: Jonathan P. Chang, Charlotte Schluger, Cristian Danescu-Niculescu-Mizil

Abstract: Incivility remains a major challenge for online discussion platforms, to such an extent that even conversations between well-intentioned users can often derail into uncivil behavior. Traditionally, platforms have relied on moderators to -- with or without algorithmic assistance -- take corrective actions such as removing comments or banning users. In this work we propose a complementary paradigm t… ▽ More Incivility remains a major challenge for online discussion platforms, to such an extent that even conversations between well-intentioned users can often derail into uncivil behavior. Traditionally, platforms have relied on moderators to -- with or without algorithmic assistance -- take corrective actions such as removing comments or banning users. In this work we propose a complementary paradigm that directly empowers users by proactively enhancing their awareness about existing tension in the conversation they are engaging in and actively guides them as they are drafting their replies to avoid further escalation. As a proof of concept for this paradigm, we design an algorithmic tool that provides such proactive information directly to users, and conduct a user study in a popular discussion platform. Through a mixed methods approach combining surveys with a randomized controlled experiment, we uncover qualitative and quantitative insights regarding how the participants utilize and react to this information. Most participants report finding this proactive paradigm valuable, noting that it helps them to identify tension that they may have otherwise missed and prompts them to further reflect on their own replies and to revise them. These effects are corroborated by a comparison of how the participants draft their reply when our tool warns them that their conversation is at risk of derailing into uncivil behavior versus in a control condition where the tool is disabled. These preliminary findings highlight the potential of this user-centered paradigm and point to concrete directions for future implementations. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 37 pages, 2 figures. More information at https://www.cs.cornell.edu/~cristian/Thread_With_Caution.html

Journal ref: Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2 (2022), Article 545 pp 1-37

arXiv:2211.16525 [pdf, other]

doi 10.1145/3555095

Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support

Authors: Charlotte Schluger, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, Karen Levy

Abstract: To address the widespread problem of uncivil behavior, many online discussion platforms employ human moderators to take action against objectionable content, such as removing it or placing sanctions on its authors. This reactive paradigm of taking action against already-posted antisocial content is currently the most common form of moderation, and has accordingly underpinned many recent efforts at… ▽ More To address the widespread problem of uncivil behavior, many online discussion platforms employ human moderators to take action against objectionable content, such as removing it or placing sanctions on its authors. This reactive paradigm of taking action against already-posted antisocial content is currently the most common form of moderation, and has accordingly underpinned many recent efforts at introducing automation into the moderation process. Comparatively less work has been done to understand other moderation paradigms -- such as proactively discouraging the emergence of antisocial behavior rather than reacting to it -- and the role algorithmic support can play in these paradigms. In this work, we investigate such a proactive framework for moderation in a case study of a collaborative setting: Wikipedia Talk Pages. We employ a mixed methods approach, combining qualitative and design components for a holistic analysis. Through interviews with moderators, we find that despite a lack of technical and social support, moderators already engage in a number of proactive moderation behaviors, such as preemptively intervening in conversations to keep them on track. Further, we explore how automation could assist with this existing proactive moderation workflow by building a prototype tool, presenting it to moderators, and examining how the assistance it provides might fit into their workflow. The resulting feedback uncovers both strengths and drawbacks of the prototype tool and suggests concrete steps towards further develo** such assisting technology so it can most effectively support moderators in their existing proactive moderation workflow. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: 27 pages, 3 figures. More info at https://www.cs.cornell.edu/~cristian/Proactive_Moderation.html

Journal ref: Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2 (2022), Article 370 pp 1-27

arXiv:2012.00012 [pdf, other]

Facilitating the Communication of Politeness through Fine-Grained Paraphrasing

Authors: Liye Fu, Susan R. Fussell, Cristian Danescu-Niculescu-Mizil

Abstract: Aided by technology, people are increasingly able to communicate across geographical, cultural, and language barriers. This ability also results in new challenges, as interlocutors need to adapt their communication approaches to increasingly diverse circumstances. In this work, we take the first steps towards automatically assisting people in adjusting their language to a specific communication ci… ▽ More Aided by technology, people are increasingly able to communicate across geographical, cultural, and language barriers. This ability also results in new challenges, as interlocutors need to adapt their communication approaches to increasingly diverse circumstances. In this work, we take the first steps towards automatically assisting people in adjusting their language to a specific communication circumstance. As a case study, we focus on facilitating the accurate transmission of pragmatic intentions and introduce a methodology for suggesting paraphrases that achieve the intended level of politeness under a given communication circumstance. We demonstrate the feasibility of this approach by evaluating our method in two realistic communication scenarios and show that it can reduce the potential for misalignment between the speaker's intentions and the listener's perceptions in both cases. △ Less

Submitted 30 November, 2020; originally announced December 2020.

Comments: Proceedings of EMNLP 2020, 14 pages. Data and code at https://convokit.cornell.edu/ and https://github.com/CornellNLP/politeness-paraphrase

arXiv:2009.03897 [pdf, other]

doi 10.1145/3415202

Quantifying the Causal Effects of Conversational Tendencies

Authors: Justine Zhang, Sendhil Mullainathan, Cristian Danescu-Niculescu-Mizil

Abstract: Understanding what leads to effective conversations can aid the design of better computer-mediated communication platforms. In particular, prior observational work has sought to identify behaviors of individuals that correlate to their conversational efficiency. However, translating such correlations to causal interpretations is a necessary step in using them in a prescriptive fashion to guide bet… ▽ More Understanding what leads to effective conversations can aid the design of better computer-mediated communication platforms. In particular, prior observational work has sought to identify behaviors of individuals that correlate to their conversational efficiency. However, translating such correlations to causal interpretations is a necessary step in using them in a prescriptive fashion to guide better designs and policies. In this work, we formally describe the problem of drawing causal links between conversational behaviors and outcomes. We focus on the task of determining a particular type of policy for a text-based crisis counseling platform: how best to allocate counselors based on their behavioral tendencies exhibited in their past conversations. We apply arguments derived from causal inference to underline key challenges that arise in conversational settings where randomized trials are hard to implement. Finally, we show how to circumvent these inference challenges in our particular domain, and illustrate the potential benefits of an allocation policy informed by the resulting prescriptive information. △ Less

Submitted 8 September, 2020; originally announced September 2020.

Comments: 24 pages, 6 figures. In Proceedings of CSCW, 2020

arXiv:2005.04246 [pdf, other]

ConvoKit: A Toolkit for the Analysis of Conversations

Authors: Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Z. Wang, Justine Zhang, Cristian Danescu-Niculescu-Mizil

Abstract: This paper describes the design and functionality of ConvoKit, an open-source toolkit for analyzing conversations and the social interactions embedded within. ConvoKit provides an unified framework for representing and manipulating conversational data, as well as a large and diverse collection of conversational datasets. By providing an intuitive interface for exploring and interacting with conver… ▽ More This paper describes the design and functionality of ConvoKit, an open-source toolkit for analyzing conversations and the social interactions embedded within. ConvoKit provides an unified framework for representing and manipulating conversational data, as well as a large and diverse collection of conversational datasets. By providing an intuitive interface for exploring and interacting with conversational data, this toolkit lowers the technical barriers for the broad adoption of computational methods for conversational analysis. △ Less

Submitted 8 May, 2020; originally announced May 2020.

Comments: Proceedings of SIGDIAL 2020 (System Demos)

arXiv:2005.04245 [pdf, other]

Balancing Objectives in Counseling Conversations: Advancing Forwards or Looking Backwards

Authors: Justine Zhang, Cristian Danescu-Niculescu-Mizil

Abstract: Throughout a conversation, participants make choices that can orient the flow of the interaction. Such choices are particularly salient in the consequential domain of crisis counseling, where a difficulty for counselors is balancing between two key objectives: advancing the conversation towards a resolution, and empathetically addressing the crisis situation. In this work, we develop an unsuperv… ▽ More Throughout a conversation, participants make choices that can orient the flow of the interaction. Such choices are particularly salient in the consequential domain of crisis counseling, where a difficulty for counselors is balancing between two key objectives: advancing the conversation towards a resolution, and empathetically addressing the crisis situation. In this work, we develop an unsupervised methodology to quantify how counselors manage this balance. Our main intuition is that if an utterance can only receive a narrow range of appropriate replies, then its likely aim is to advance the conversation forwards, towards a target within that range. Likewise, an utterance that can only appropriately follow a narrow range of possible utterances is likely aimed backwards at addressing a specific situation within that range. By applying this intuition, we can map each utterance to a continuous orientation axis that captures the degree to which it is intended to direct the flow of the conversation forwards or backwards. This unsupervised method allows us to characterize counselor behaviors in a large dataset of crisis counseling conversations, where we show that known counseling strategies intuitively align with this axis. We also illustrate how our measure can be indicative of a conversation's progress, as well as its effectiveness. △ Less

Submitted 8 May, 2020; originally announced May 2020.

Comments: 14 pages, 6 figures, code available through the Cornell Conversational Analysis Toolkit (https://convokit.cornell.edu/). in Proceedings of ACL, 2020

arXiv:2004.13609 [pdf, other]

doi 10.1145/3366423.3380273

Don't Let Me Be Misunderstood: Comparing Intentions and Perceptions in Online Discussions

Authors: Jonathan P. Chang, Justin Cheng, Cristian Danescu-Niculescu-Mizil

Abstract: Discourse involves two perspectives: a person's intention in making an utterance and others' perception of that utterance. The misalignment between these perspectives can lead to undesirable outcomes, such as misunderstandings, low productivity and even overt strife. In this work, we present a computational framework for exploring and comparing both perspectives in online public discussions. We… ▽ More Discourse involves two perspectives: a person's intention in making an utterance and others' perception of that utterance. The misalignment between these perspectives can lead to undesirable outcomes, such as misunderstandings, low productivity and even overt strife. In this work, we present a computational framework for exploring and comparing both perspectives in online public discussions. We combine logged data about public comments on Facebook with a survey of over 16,000 people about their intentions in writing these comments or about their perceptions of comments that others had written. Unlike previous studies of online discussions that have largely relied on third-party labels to quantify properties such as sentiment and subjectivity, our approach also directly captures what the speakers actually intended when writing their comments. In particular, our analysis focuses on judgments of whether a comment is stating a fact or an opinion, since these concepts were shown to be often confused. We show that intentions and perceptions diverge in consequential ways. People are more likely to perceive opinions than to intend them, and linguistic cues that signal how an utterance is intended can differ from those that signal how it will be perceived. Further, this misalignment between intentions and perceptions can be linked to the future health of a conversation: when a comment whose author intended to share a fact is misperceived as sharing an opinion, the subsequent conversation is more likely to derail into uncivil behavior than when the comment is perceived as intended. Altogether, these findings may inform the design of discussion platforms that better promote positive interactions. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: Proceedings of The Web Conference (WWW) 2020

arXiv:1910.09563 [pdf, other]

doi 10.1145/3359265

Content Removal as a Moderation Strategy: Compliance and Other Outcomes in the ChangeMyView Community

Authors: Kumar Bhargav Srinivasan, Cristian Danescu-Niculescu-Mizil, Lillian Lee, Chenhao Tan

Abstract: Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subreddit. The standard analytic approach of interrup… ▽ More Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subreddit. The standard analytic approach of interrupted time-series analysis unfortunately cannot answer this question of causality because it fails to distinguish the effect of having made a non-compliant comment from the effect of being subjected to moderator removal of that comment. We therefore leverage a "delayed feedback" approach based on the observation that some users may remain active between the time when they posted the non-compliant comment and the time when that comment is deleted. Applying this approach to such users, we reveal the causal role of comment deletion in reducing immediate noncompliance rates, although we do not find evidence of it having a causal role in inducing other behavior improvements. Our work thus empirically demonstrates both the promise and some potential limits of content removal as a positive moderation strategy, and points to future directions for identifying causal effects from observational data. △ Less

Submitted 21 October, 2019; originally announced October 2019.

Comments: 21 pages, 8 figures, accepted at CSCW 2019, the dataset is available at https://chenhaot.com/papers/content-removal.html

arXiv:1909.01362 [pdf, other]

Trouble on the Horizon: Forecasting the Derailment of Online Conversations as they Develop

Authors: Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil

Abstract: Online discussions often derail into toxic exchanges between participants. Recent efforts mostly focused on detecting antisocial behavior after the fact, by analyzing single comments in isolation. To provide more timely notice to human moderators, a system needs to preemptively detect that a conversation is heading towards derailment before it actually turns toxic. This means modeling derailment a… ▽ More Online discussions often derail into toxic exchanges between participants. Recent efforts mostly focused on detecting antisocial behavior after the fact, by analyzing single comments in isolation. To provide more timely notice to human moderators, a system needs to preemptively detect that a conversation is heading towards derailment before it actually turns toxic. This means modeling derailment as an emerging property of a conversation rather than as an isolated utterance-level event. Forecasting emerging conversational properties, however, poses several inherent modeling challenges. First, since conversations are dynamic, a forecasting model needs to capture the flow of the discussion, rather than properties of individual comments. Second, real conversations have an unknown horizon: they can end or derail at any time; thus a practical forecasting model needs to assess the risk in an online fashion, as the conversation develops. In this work we introduce a conversational forecasting model that learns an unsupervised representation of conversational dynamics and exploits it to predict future derailment as the conversation develops. By applying this model to two new diverse datasets of online conversations with labels for antisocial events, we show that it outperforms state-of-the-art systems at forecasting derailment. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: To appear in Proceedings of EMNLP 2019. Data and code to be released as part of the Cornell Conversational Analysis Toolkit (convokit.cornell.edu)

arXiv:1906.07194 [pdf, other]

Finding Your Voice: The Linguistic Development of Mental Health Counselors

Authors: Justine Zhang, Robert Filbin, Christine Morrison, Jaclyn Weiser, Cristian Danescu-Niculescu-Mizil

Abstract: Mental health counseling is an enterprise with profound societal importance where conversations play a primary role. In order to acquire the conversational skills needed to face a challenging range of situations, mental health counselors must rely on training and on continued experience with actual clients. However, in the absence of large scale longitudinal studies, the nature and significance of… ▽ More Mental health counseling is an enterprise with profound societal importance where conversations play a primary role. In order to acquire the conversational skills needed to face a challenging range of situations, mental health counselors must rely on training and on continued experience with actual clients. However, in the absence of large scale longitudinal studies, the nature and significance of this developmental process remain unclear. For example, prior literature suggests that experience might not translate into consequential changes in counselor behavior. This has led some to even argue that counseling is a profession without expertise. In this work, we develop a computational framework to quantify the extent to which individuals change their linguistic behavior with experience and to study the nature of this evolution. We use our framework to conduct a large longitudinal study of mental health counseling conversations, tracking over 3,400 counselors across their tenure. We reveal that overall, counselors do indeed change their conversational behavior to become more diverse across interactions, develo** an individual voice that distinguishes them from other counselors. Furthermore, a finer-grained investigation shows that the rate and nature of this diversification vary across functionally different conversational components. △ Less

Submitted 17 June, 2019; originally announced June 2019.

Comments: To appear at ACL 2019, 12 pages, 2 figures; code available through the Cornell Conversational Analysis Toolkit (https://convokit.cornell.edu)

arXiv:1904.01587 [pdf, other]

Asking the Right Question: Inferring Advice-Seeking Intentions from Personal Narratives

Authors: Liye Fu, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil

Abstract: People often share personal narratives in order to seek advice from others. To properly infer the narrator's intention, one needs to apply a certain degree of common sense and social intuition. To test the capabilities of NLP systems to recover such intuition, we introduce the new task of inferring what is the advice-seeking goal behind a personal narrative. We formulate this as a cloze test, wher… ▽ More People often share personal narratives in order to seek advice from others. To properly infer the narrator's intention, one needs to apply a certain degree of common sense and social intuition. To test the capabilities of NLP systems to recover such intuition, we introduce the new task of inferring what is the advice-seeking goal behind a personal narrative. We formulate this as a cloze test, where the goal is to identify which of two advice-seeking questions was removed from a given narrative. The main challenge in constructing this task is finding pairs of semantically plausible advice-seeking questions for given narratives. To address this challenge, we devise a method that exploits commonalities in experiences people share online to automatically extract pairs of questions that are appropriate candidates for the cloze task. This results in a dataset of over 20,000 personal narratives, each matched with a pair of related advice-seeking questions: one actually intended by the narrator, and the other one not. The dataset covers a very broad array of human experiences, from dating, to career options, to stolen iPads. We use human annotation to determine the degree to which the task relies on common sense and social intuition in addition to a semantic understanding of the narrative. By introducing several baselines for this new task we demonstrate its feasibility and identify avenues for better modeling the intention of the narrator. △ Less

Submitted 2 April, 2019; originally announced April 2019.

Comments: To appear in the Proceedings of NAACL 2019, 14 pages. Data, code and additional information at https://github.com/CornellNLP/ASQ

arXiv:1902.08628 [pdf, other]

doi 10.1145/3308558.3313638

Trajectories of Blocked Community Members: Redemption, Recidivism and Departure

Authors: Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil

Abstract: Community norm violations can impair constructive communication and collaboration online. As a defense mechanism, community moderators often address such transgressions by temporarily blocking the perpetrator. Such actions, however, come with the cost of potentially alienating community members. Given this tradeoff, it is essential to understand to what extent, and in which situations, this common… ▽ More Community norm violations can impair constructive communication and collaboration online. As a defense mechanism, community moderators often address such transgressions by temporarily blocking the perpetrator. Such actions, however, come with the cost of potentially alienating community members. Given this tradeoff, it is essential to understand to what extent, and in which situations, this common moderation practice is effective in reinforcing community rules. In this work, we introduce a computational framework for studying the future behavior of blocked users on Wikipedia. After their block expires, they can take several distinct paths: they can reform and adhere to the rules, but they can also recidivate, or straight-out abandon the community. We reveal that these trajectories are tied to factors rooted both in the characteristics of the blocked individual and in whether they perceived the block to be fair and justified. Based on these insights, we formulate a series of prediction tasks aiming to determine which of these paths a user is likely to take after being blocked for their first offense, and demonstrate the feasibility of these new tasks. Overall, this work builds towards a more nuanced approach to moderation by highlighting the tradeoffs that are in play. △ Less

Submitted 22 February, 2019; originally announced February 2019.

Comments: To appear in Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13-17, 2019, San Francisco, CA, USA. Code and data available as part of ConvoKit: convokit.cornell.edu

arXiv:1810.13181 [pdf, other]

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Authors: Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

Abstract: We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations---including not only comments and replies, but also their modifications, deletions and restorations---this data offers an unprecedented view of online conversation. This level of deta… ▽ More We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations---including not only comments and replies, but also their modifications, deletions and restorations---this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus' potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion's venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated. Finally the reconstruction framework is designed to be language agnostic, and we show that it can extract high quality conversational data in both Chinese and English. △ Less

Submitted 31 October, 2018; originally announced October 2018.

Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

arXiv:1805.05345 [pdf, other]

Conversations Gone Awry: Detecting Early Signs of Conversational Failure

Authors: Justine Zhang, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Nithum Thain, Dario Taraborelli

Abstract: One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversa… ▽ More One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged. To this end, we develop a framework for capturing pragmatic devices---such as politeness strategies and rhetorical prompts---used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions. △ Less

Submitted 14 May, 2018; originally announced May 2018.

Comments: To appear in the Proceedings of ACL 2018, 15 pages, 1 figure. Data, quiz, code and additional information at http://www.cs.cornell.edu/~cristian/Conversations_gone_awry.html

arXiv:1708.02254 [pdf, other]

Asking Too Much? The Rhetorical Role of Questions in Political Discourse

Authors: Justine Zhang, Arthur Spirling, Cristian Danescu-Niculescu-Mizil

Abstract: Questions play a prominent role in social interactions, performing rhetorical functions that go beyond that of simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of questio… ▽ More Questions play a prominent role in social interactions, performing rhetorical functions that go beyond that of simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of question-answering applications, their rhetorical aspects have been largely understudied. In this work we introduce an unsupervised methodology for extracting surface motifs that recur in questions, and for grou** them according to their latent rhetorical role. By applying this framework to the setting of question sessions in the UK parliament, we show that the resulting typology encodes key aspects of the political discourse---such as the bifurcation in questioning behavior between government and opposition parties---and reveals new insights into the effects of a legislator's tenure and political career ambitions. △ Less

Submitted 7 August, 2017; originally announced August 2017.

Comments: To appear at EMNLP 2017; 15 pages including appendix; 3 figures; parliament data and code available at http://www.cs.cornell.edu/~cristian/Asking_too_much.html

arXiv:1705.09665 [pdf, other]

Community Identity and User Engagement in a Multi-Community Landscape

Authors: Justine Zhang, William L. Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec

Abstract: A community's identity defines and shapes its internal dynamics. Our current understanding of this interplay is mostly limited to glimpses gathered from isolated studies of individual communities. In this work we provide a systematic exploration of the nature of this relation across a wide variety of online communities. To this end we introduce a quantitative, language-based typology reflecting tw… ▽ More A community's identity defines and shapes its internal dynamics. Our current understanding of this interplay is mostly limited to glimpses gathered from isolated studies of individual communities. In this work we provide a systematic exploration of the nature of this relation across a wide variety of online communities. To this end we introduce a quantitative, language-based typology reflecting two key aspects of a community's identity: how distinctive, and how temporally dynamic it is. By map** almost 300 Reddit communities into the landscape induced by this typology, we reveal regularities in how patterns of user engagement vary with the characteristics of a community. Our results suggest that the way new and existing users engage with a community depends strongly and systematically on the nature of the collective identity it fosters, in ways that are highly consequential to community maintainers. For example, communities with distinctive and highly dynamic identities are more likely to retain their users. However, such niche communities also exhibit much larger acculturation gaps between existing users and newcomers, which potentially hinder the integration of the latter. More generally, our methodology reveals differences in how various social phenomena manifest across communities, and shows that structuring the multi-community landscape can lead to a better understanding of the systematic nature of this diversity. △ Less

Submitted 26 May, 2017; originally announced May 2017.

Comments: 10 page, 3 figures, To appear in the Proceedings of the 11th International Conference On Web And Social Media, ICWSM 2017; this version has subtle differences with the proceedings version, including an introductory quote

arXiv:1705.02522 [pdf, other]

People on Drugs: Credibility of User Statements in Health Communities

Authors: Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil

Abstract: Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision fro… ▽ More Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information. △ Less

Submitted 6 May, 2017; originally announced May 2017.

arXiv:1703.09315 [pdf, other]

Tracing the Use of Practices through Networks of Collaboration

Authors: Rahmtin Rotabi, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg

Abstract: An active line of research has used on-line data to study the ways in which discrete units of information---including messages, photos, product recommendations, group invitations---spread through social networks. There is relatively little understanding, however, of how on-line data might help in studying the diffusion of more complex {\em practices}---roughly, routines or styles of work that are… ▽ More An active line of research has used on-line data to study the ways in which discrete units of information---including messages, photos, product recommendations, group invitations---spread through social networks. There is relatively little understanding, however, of how on-line data might help in studying the diffusion of more complex {\em practices}---roughly, routines or styles of work that are generally handed down from one person to another through collaboration or mentorship. In this work, we propose a framework together with a novel type of data analysis that seeks to study the spread of such practices by tracking their syntactic signatures in large document collections. Central to this framework is the notion of an "inheritance graph" that represents how people pass the practice on to others through collaboration. Our analysis of these inheritance graphs demonstrates that we can trace a significant number of practices over long time-spans, and we show that the structure of these graphs can help in predicting the longevity of collaborations within a field, as well as the fitness of the practices themselves. △ Less

Submitted 27 March, 2017; originally announced March 2017.

Comments: To Appear in Proceedings of ICWSM 2017, data at https://github.com/CornellNLP/Macros

ACM Class: H.2.8

arXiv:1703.03386 [pdf, other]

Loyalty in Online Communities

Authors: William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec

Abstract: Loyalty is an essential component of multi-community engagement. When users have the choice to engage with a variety of different communities, they often become loyal to just one, focusing on that community at the expense of others. However, it is unclear how loyalty is manifested in user behavior, or whether loyalty is encouraged by certain community characteristics. In this paper we operationa… ▽ More Loyalty is an essential component of multi-community engagement. When users have the choice to engage with a variety of different communities, they often become loyal to just one, focusing on that community at the expense of others. However, it is unclear how loyalty is manifested in user behavior, or whether loyalty is encouraged by certain community characteristics. In this paper we operationalize loyalty as a user-community relation: users loyal to a community consistently prefer it over all others; loyal communities retain their loyal users over time. By exploring this relation using a large dataset of discussion communities from Reddit, we reveal that loyalty is manifested in remarkably consistent behaviors across a wide spectrum of communities. Loyal users employ language that signals collective identity and engage with more esoteric, less popular content, indicating they may play a curational role in surfacing new material. Loyal communities have denser user-user interaction networks and lower rates of triadic closure, suggesting that community-level loyalty is associated with more cohesive interactions and less fragmentation into subgroups. We exploit these general patterns to predict future rates of loyalty. Our results show that a user's propensity to become loyal is apparent from their first interactions with a community, suggesting that some users are intrinsically loyal from the very beginning. △ Less

Submitted 24 May, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

Comments: Extended version of a paper appearing in the Proceedings of ICWSM 2017 (with the same title); please cite the official ICWSM version

arXiv:1702.07717 [pdf, other]

When confidence and competence collide: Effects on online decision-making discussions

Authors: Liye Fu, Lillian Lee, Cristian Danescu-Niculescu-Mizil

Abstract: Group discussions are a way for individuals to exchange ideas and arguments in order to reach better decisions than they could on their own. One of the premises of productive discussions is that better solutions will prevail, and that the idea selection process is mediated by the (relative) competence of the individuals involved. However, since people may not know their actual competence on a new… ▽ More Group discussions are a way for individuals to exchange ideas and arguments in order to reach better decisions than they could on their own. One of the premises of productive discussions is that better solutions will prevail, and that the idea selection process is mediated by the (relative) competence of the individuals involved. However, since people may not know their actual competence on a new task, their behavior is influenced by their self-estimated competence --- that is, their confidence --- which can be misaligned with their actual competence. Our goal in this work is to understand the effects of confidence-competence misalignment on the dynamics and outcomes of discussions. To this end, we design a large-scale natural setting, in the form of an online team-based geography game, that allows us to disentangle confidence from competence and thus separate their effects. We find that in task-oriented discussions, the more-confident individuals have a larger impact on the group's decisions even when these individuals are at the same level of competence as their teammates. Furthermore, this unjustified role of confidence in the decision-making process often leads teams to under-perform. We explore this phenomenon by investigating the effects of confidence on conversational dynamics. △ Less

Submitted 5 March, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

Comments: To appear in Proceedings of WWW 2017. Online multiplayer game available at http://streetcrowd.us/start

arXiv:1702.06527 [pdf, other]

doi 10.1145/3038912.3052652

Competition and Selection Among Conventions

Authors: Rahmtin Rotabi, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg

Abstract: In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grai… ▽ More In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level. In analyzing the dynamics of conventions over time, however, even with detailed on-line data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well; and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in the convention's evolution. Any analysis must take place in the presence of these two issues. In this work we study a setting in which we can cleanly track the competition among conventions. Our analysis is based on the spread of low-level authoring conventions in the eprint arXiv over 24 years: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in the selection of them; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes. Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time--and at the individual level about the outcome of "fights" between people over convention choices. △ Less

Submitted 26 March, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

Comments: To appear in Proceedings of WWW 2017, data at https://github.com/CornellNLP/Macros

arXiv:1702.01119 [pdf, other]

doi 10.1145/2998181.2998213

Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions

Authors: Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

Abstract: In online communities, antisocial behavior such as trolling disrupts constructive discussion. While prior work suggests that trolling behavior is confined to a vocal and antisocial minority, we demonstrate that ordinary people can engage in such behavior as well. We propose two primary trigger mechanisms: the individual's mood, and the surrounding context of a discussion (e.g., exposure to prior t… ▽ More In online communities, antisocial behavior such as trolling disrupts constructive discussion. While prior work suggests that trolling behavior is confined to a vocal and antisocial minority, we demonstrate that ordinary people can engage in such behavior as well. We propose two primary trigger mechanisms: the individual's mood, and the surrounding context of a discussion (e.g., exposure to prior trolling behavior). Through an experiment simulating an online discussion, we find that both negative mood and seeing troll posts by others significantly increases the probability of a user trolling, and together double this probability. To support and extend these results, we study how these same mechanisms play out in the wild via a data-driven, longitudinal analysis of a large online news discussion community. This analysis reveals temporal mood effects, and explores long range patterns of repeated exposure to trolling. A predictive model of trolling behavior shows that mood and discussion context together can explain trolling behavior better than an individual's history of trolling. These results combine to suggest that ordinary people can, under the right circumstances, behave like trolls. △ Less

Submitted 3 February, 2017; originally announced February 2017.

Comments: Best Paper Award at CSCW 2017

ACM Class: H.2.8; J.4

arXiv:1607.03895 [pdf, other]

Tie-breaker: Using language models to quantify gender bias in sports journalism

Authors: Liye Fu, Cristian Danescu-Niculescu-Mizil, Lillian Lee

Abstract: Gender bias is an increasingly important issue in sports journalism. In this work, we propose a language-model-based approach to quantify differences in questions posed to female vs. male athletes, and apply it to tennis post-match interviews. We find that journalists ask male players questions that are generally more focused on the game when compared with the questions they ask their female count… ▽ More Gender bias is an increasingly important issue in sports journalism. In this work, we propose a language-model-based approach to quantify differences in questions posed to female vs. male athletes, and apply it to tennis post-match interviews. We find that journalists ask male players questions that are generally more focused on the game when compared with the questions they ask their female counterparts. We also provide a fine-grained analysis of the extent to which the salience of this bias depends on various factors, such as question type, game outcome or player rank. △ Less

Submitted 13 July, 2016; originally announced July 2016.

Comments: Best paper award at the IJCAI workshop on NLP Meets Journalism; 5 pages, 2 figures; data and other info available at http://www.cs.cornell.edu/~liye/tennis.html

arXiv:1604.07407 [pdf, other]

Conversational Markers of Constructive Discussions

Authors: Vlad Niculae, Cristian Danescu-Niculescu-Mizil

Abstract: Group discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discus… ▽ More Group discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discussion, the sum becomes greater than the parts. However, this assumption is not irrefutable: anecdotal evidence of wasteful discussions abounds, and in our own experiments we find that over 30% of discussions are unproductive. We propose a framework for analyzing conversational dynamics in order to determine whether a given task-oriented discussion is worth having or not. We exploit conversational patterns reflecting the flow of ideas and the balance between the participants, as well as their linguistic choices. We apply this framework to conversations naturally occurring in an online collaborative world exploration game developed and deployed to support this research. Using this setting, we show that linguistic cues and conversational patterns extracted from the first 20 seconds of a team discussion are predictive of whether it will be a wasteful or a productive one. △ Less

Submitted 25 April, 2016; originally announced April 2016.

Comments: To appear at NAACL-HLT 2016. 11pp, 5 fig. Data and other info available at http://vene.ro/constructive/

arXiv:1604.03114 [pdf, other]

Conversational flow in Oxford-style debates

Authors: Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil

Abstract: Public debates are a common platform for presenting and juxtaposing diverging views on important issues. In this work we propose a methodology for tracking how ideas flow between participants throughout a debate. We use this approach in a case study of Oxford-style debates---a competitive format where the winner is determined by audience votes---and show how the outcome of a debate depends on aspe… ▽ More Public debates are a common platform for presenting and juxtaposing diverging views on important issues. In this work we propose a methodology for tracking how ideas flow between participants throughout a debate. We use this approach in a case study of Oxford-style debates---a competitive format where the winner is determined by audience votes---and show how the outcome of a debate depends on aspects of conversational flow. In particular, we find that winners tend to make better use of a debate's interactive component than losers, by actively pursuing their opponents' points rather than promoting their own ideas over the course of the conversation. △ Less

Submitted 11 April, 2016; originally announced April 2016.

Comments: To appear at NAACL 2016. 5 pp, 1 fig. Data and other info available at http://www.cs.cornell.edu/~cristian/debates

arXiv:1602.01103 [pdf, other]

doi 10.1145/2872427.2883081

Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-faith Online Discussions

Authors: Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, Lillian Lee

Abstract: Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their own opinions and reasoning, invite others to contes… ▽ More Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their own opinions and reasoning, invite others to contest them, and acknowledge when the ensuing discussions change their original views. In this work, we study these interactions to understand the mechanisms behind persuasion. We find that persuasive arguments are characterized by interesting patterns of interaction dynamics, such as participant entry-order and degree of back-and-forth exchange. Furthermore, by comparing similar counterarguments to the same opinion, we show that language factors play an essential role. In particular, the interplay between the language of the opinion holder and that of the counterargument provides highly predictive cues of persuasiveness. Finally, since even in this favorable setting people may not be persuaded, we investigate the problem of determining whether someone's opinion is susceptible to being changed at all. For this more difficult task, we show that stylistic choices in how the opinion is expressed carry predictive power. △ Less

Submitted 6 February, 2016; v1 submitted 2 February, 2016; originally announced February 2016.

Comments: 12 pages, 10 figures, to appear in Proceedings of WWW 2016, data and more at https://chenhaot.com/pages/changemyview.html (v2 made a minor correction on submission rules in ChangeMyView.)

arXiv:1506.04744 [pdf, other]

Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game

Authors: Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian Danescu-Niculescu-Mizil

Abstract: Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal.… ▽ More Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes---such as positive sentiment, politeness, or focus on future planning---signal impending betrayal. △ Less

Submitted 15 June, 2015; originally announced June 2015.

Comments: To appear at ACL 2015. 10pp, 4 fig. Data and other info available at http://vene.ro/betrayal/

arXiv:1504.01383 [pdf, other]

QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns

Authors: Vlad Niculae, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

Abstract: Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent… ▽ More Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent of systematicity remains widely disputed. In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. We apply this framework to a massive dataset of news articles spanning the six years of Obama's presidency and all of his speeches, and reveal that a systematic pattern does indeed emerge from the outlet's quoting behavior. Moreover, we show that this pattern can be successfully exploited in an unsupervised prediction setting, to determine which new quotes an outlet will select to broadcast. By encoding bias patterns in a low-rank space we provide an analysis of the structure of political media coverage. This reveals a latent media bias space that aligns surprisingly well with political ideology and outlet type. A linguistic analysis exposes striking differences across these latent dimensions, showing how the different types of media outlets portray different realities even when reporting on the same events. For example, outlets mapped to the mainstream conservative side of the latent space focus on quotes that portray a presidential persona disproportionately characterized by negativity. △ Less

Submitted 6 April, 2015; originally announced April 2015.

Comments: To appear in the Proceedings of WWW 2015. 11pp, 10 fig. Interactive visualization, data, and other info available at http://snap.stanford.edu/quotus/

arXiv:1504.00680 [pdf, other]

Antisocial Behavior in Online Discussion Communities

Authors: Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

Abstract: User contributions in the form of posts, comments, and votes are essential to the success of online communities. However, allowing user participation also invites undesirable behavior such as trolling. In this paper, we characterize antisocial behavior in three large online discussion communities by analyzing users who were banned from these communities. We find that such users tend to concentrate… ▽ More User contributions in the form of posts, comments, and votes are essential to the success of online communities. However, allowing user participation also invites undesirable behavior such as trolling. In this paper, we characterize antisocial behavior in three large online discussion communities by analyzing users who were banned from these communities. We find that such users tend to concentrate their efforts in a small number of threads, are more likely to post irrelevantly, and are more successful at garnering responses from other users. Studying the evolution of these users from the moment they join a community up to when they get banned, we find that not only do they write worse than other users over time, but they also become increasingly less tolerated by the community. Further, we discover that antisocial behavior is exacerbated when community feedback is overly harsh. Our analysis also reveals distinct groups of users with different levels of antisocial behavior that can change over time. We use these insights to identify antisocial users early on, a task of high practical importance to community maintainers. △ Less

Submitted 16 May, 2016; v1 submitted 2 April, 2015; originally announced April 2015.

Comments: ICWSM 2015

arXiv:1405.3282 [pdf, other]

How to Ask for a Favor: A Case Study on the Success of Altruistic Requests

Authors: Tim Althoff, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky

Abstract: Requests are at the core of many social media systems such as question & answer sites and online philanthropy communities. While the success of such requests is critical to the success of the community, the factors that lead community members to satisfy a request are largely unknown. Success of a request depends on factors like who is asking, how they are asking, when are they asking, and most cri… ▽ More Requests are at the core of many social media systems such as question & answer sites and online philanthropy communities. While the success of such requests is critical to the success of the community, the factors that lead community members to satisfy a request are largely unknown. Success of a request depends on factors like who is asking, how they are asking, when are they asking, and most critically what is being requested, ranging from small favors to substantial monetary donations. We present a case study of altruistic requests in an online community where all requests ask for the very same contribution and do not offer anything tangible in return, allowing us to disentangle what is requested from textual and social factors. Drawing from social psychology literature, we extract high-level social features from text that operationalize social relations between recipient and donor and demonstrate that these extracted relations are predictive of success. More specifically, we find that clearly communicating need through the narrative is essential and that that linguistic indications of gratitude, evidentiality, and generalized reciprocity, as well as high status of the asker further increase the likelihood of success. Building on this understanding, we develop a model that can predict the success of unseen requests, significantly improving over several baselines. We link these findings to research in psychology on hel** behavior, providing a basis for further analysis of success in social media systems. △ Less

Submitted 13 May, 2014; originally announced May 2014.

Comments: To appear at ICWSM 2014. 10pp, 3 fig. Data and other info available at http://www.mpi-sws.org/~cristian/How_to_Ask_for_a_Favor.html

ACM Class: I.2.7; J.4

arXiv:1405.1429 [pdf, other]

How Community Feedback Shapes User Behavior

Authors: Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

Abstract: Social media systems rely on user feedback and rating mechanisms for personalization, ranking, and content filtering. However, when users evaluate content contributed by fellow users (e.g., by liking a post or voting on a comment), these evaluations create complex social feedback effects. This paper investigates how ratings on a piece of content affect its author's future behavior. By studying fou… ▽ More Social media systems rely on user feedback and rating mechanisms for personalization, ranking, and content filtering. However, when users evaluate content contributed by fellow users (e.g., by liking a post or voting on a comment), these evaluations create complex social feedback effects. This paper investigates how ratings on a piece of content affect its author's future behavior. By studying four large comment-based news communities, we find that negative feedback leads to significant behavioral changes that are detrimental to the community. Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such. Moreover, these authors are more likely to subsequently evaluate their fellow users negatively, percolating these effects through the community. In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts. Interestingly, the authors that receive no feedback are most likely to leave a community. Furthermore, a structural analysis of the voter network reveals that evaluations polarize the community the most when positive and negative votes are equally split. △ Less

Submitted 6 May, 2014; originally announced May 2014.

Comments: ICWSM 2014

ACM Class: H.2.8

arXiv:1306.6078 [pdf, other]

A Computational Approach to Politeness with Application to Social Factors

Authors: Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts

Abstract: We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features opera… ▽ More We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. Our classifier achieves close to human performance and is effective across domains. We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community. △ Less

Submitted 25 June, 2013; originally announced June 2013.

Comments: To appear at ACL 2013. 10pp, 3 fig. Data and other info available at http://www.mpi-sws.org/~cristian/Politeness.html

ACM Class: I.2.7

arXiv:1304.4602 [pdf, other]

Characterizing and curating conversation threads: Expansion, focus, volume, re-entry

Authors: Lars Backstrom, Jon Kleinberg, Lillian Lee, Cristian Danescu-Niculescu-Mizil

Abstract: Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations --- the dev… ▽ More Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations --- the development of automated methods to select a subset of discussions to present to a user. Here we consider two key sub-problems inherent in conversational curation: length prediction --- predicting the number of comments a discussion thread will receive --- and the novel task of re-entry prediction --- predicting whether a user who has participated in a thread will later contribute another comment to it. The first of these sub-problems arises in estimating how interesting a thread is, in the sense of generating a lot of conversation; the second can help determine whether users should be kept notified of the progress of a thread to which they have already contributed. We develop and evaluate a range of approaches for these tasks, based on an analysis of the network structure and arrival pattern among the participants, as well as a novel dichotomy in the structure of long threads. We find that for both tasks, learning-based approaches using these sources of information yield improvements for all the performance metrics we used. △ Less

Submitted 16 April, 2013; originally announced April 2013.

ACM Class: H.2.8

Journal ref: Proceedings of WSDM 2013, pp. 13-22

arXiv:1206.1066 [pdf, ps, other]

Hedge detection as a lens on framing in the GMO debates: A position paper

Authors: Eunsol Choi, Chenhao Tan, Lillian Lee, Cristian Danescu-Niculescu-Mizil, Jennifer Spindel

Abstract: Understanding the ways in which participants in public discussions frame their arguments is important in understanding how public opinion is formed. In this paper, we adopt the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, we propose the following specific, interesting and, we believe, relatively accessi… ▽ More Understanding the ways in which participants in public discussions frame their arguments is important in understanding how public opinion is formed. In this paper, we adopt the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, we propose the following specific, interesting and, we believe, relatively accessible question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a "scientific" tone? Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. We propose a detailed approach to studying whether hedge detection can be used to understanding scientific framing in the GMO debates, and provide corpora to facilitate this study. Some of our preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature. We hope that our initial work and data will encourage others to pursue this promising line of inquiry. △ Less

Submitted 5 June, 2012; originally announced June 2012.

Comments: 10 pp; to appear in Proceedings of the ACL Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, 2012. Data available at https://confluence.cornell.edu/display/llresearch/HedgingFramingGMOs

arXiv:1203.6360 [pdf, other]

You had me at hello: How phrasing affects memorability

Authors: Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, Lillian Lee

Abstract: Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information… ▽ More Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns. Another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts --- that is, more portable. We also show how the concept of "memorable language" can be extended across domains. △ Less

Submitted 30 April, 2012; v1 submitted 28 March, 2012; originally announced March 2012.

Comments: Final version of paper to appear at ACL 2012. 10pp, 1 fig. Data, demo memorability test and other info available at http://www.cs.cornell.edu/~cristian/memorability.html

ACM Class: I.2.7; J.4

arXiv:1112.3670 [pdf, other]

Echoes of power: Language effects and power differences in social interaction

Authors: Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, Jon Kleinberg

Abstract: Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language --- either spoken or written --- and one could reasonably suppose that signals manifested in… ▽ More Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language --- either spoken or written --- and one could reasonably suppose that signals manifested in language might also provide information about roles, status, and other aspects of the group's dynamics. To date, however, finding such domain-independent language-based signals has been a challenge. Here, we show that in group discussions power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. Starting from this observation, we propose an analysis framework based on linguistic coordination that can be used to shed light on power relationships and that works consistently across multiple types of power --- including a more "static" form of power based on status differences, and a more "situational" form of power in which one individual experiences a type of dependence on another. Using this framework, we study how conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court. △ Less

Submitted 12 April, 2012; v1 submitted 15 December, 2011; originally announced December 2011.

Comments: v3 is the camera-ready for the Proceedings of WWW 2012. Changes from v2 include additional technical analysis. See http://www.cs.cornell.edu/~cristian/www2012 for data and more info

arXiv:1106.3077 [pdf, other]

Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs

Authors: Cristian Danescu-Niculescu-Mizil, Lillian Lee

Abstract: Conversational participants tend to immediately and unconsciously adapt to each other's language styles: a speaker will even adjust the number of articles and other function words in their next utterance in response to the number in their partner's immediately preceding utterance. This striking level of coordination is thought to have arisen as a way to achieve social goals, such as gaining approv… ▽ More Conversational participants tend to immediately and unconsciously adapt to each other's language styles: a speaker will even adjust the number of articles and other function words in their next utterance in response to the number in their partner's immediately preceding utterance. This striking level of coordination is thought to have arisen as a way to achieve social goals, such as gaining approval or emphasizing difference in status. But has the adaptation mechanism become so deeply embedded in the language-generation process as to become a reflex? We argue that fictional dialogs offer a way to study this question, since authors create the conversations but don't receive the social benefits (rather, the imagined characters do). Indeed, we find significant coordination across many families of function words in our large movie-script corpus. We also report suggestive preliminary findings on the effects of gender and other features; e.g., surprisingly, for articles, on average, characters adapt more to females than to males. △ Less

Submitted 15 June, 2011; originally announced June 2011.

Comments: data available at http://www.cs.cornell.edu/~cristian/movies

ACM Class: I.2.7; J.4

Journal ref: Proceedings of the ACL workshop on Cognitive Modeling and Computational Linguistics, pp 76-87, 2011

arXiv:1105.0673 [pdf, other]

doi 10.1145/1963405.1963509

Mark My Words! Linguistic Style Accommodation in Social Media

Authors: Cristian Danescu-Niculescu-Mizil, Michael Gamon, Susan Dumais

Abstract: The psycholinguistic theory of communication accommodation accounts for the general observation that participants in conversations tend to converge to one another's communicative behavior: they coordinate in a variety of dimensions including choice of words, syntax, utterance length, pitch and gestures. In its almost forty years of existence, this theory has been empirically supported exclusively… ▽ More The psycholinguistic theory of communication accommodation accounts for the general observation that participants in conversations tend to converge to one another's communicative behavior: they coordinate in a variety of dimensions including choice of words, syntax, utterance length, pitch and gestures. In its almost forty years of existence, this theory has been empirically supported exclusively through small-scale or controlled laboratory studies. Here we address this phenomenon in the context of Twitter conversations. Undoubtedly, this setting is unlike any other in which accommodation was observed and, thus, challenging to the theory. Its novelty comes not only from its size, but also from the non real-time nature of conversations, from the 140 character length restriction, from the wide variety of social relation types, and from a design that was initially not geared towards conversation at all. Given such constraints, it is not clear a priori whether accommodation is robust enough to occur given the constraints of this new environment. To investigate this, we develop a probabilistic framework that can model accommodation and measure its effects. We apply it to a large Twitter conversational dataset specifically developed for this task. This is the first time the hypothesis of linguistic style accommodation has been examined (and verified) in a large scale, real world setting. Furthermore, when investigating concepts such as stylistic influence and symmetry of accommodation, we discover a complexity of the phenomenon which was never observed before. We also explore the potential relation between stylistic influence and network features commonly associated with social status. △ Less

Submitted 3 May, 2011; originally announced May 2011.

Comments: Talk slides available at http://www.cs.cornell.edu/~cristian/www2011

Journal ref: Proceedings of WWW, pp. 141--150, 2009

arXiv:1008.3169 [pdf, other]

Don't 'have a clue'? Unsupervised co-learning of downward-entailing operators

Authors: Cristian Danescu-Niculescu-Mizil, Lillian Lee

Abstract: Researchers in textual entailment have begun to consider inferences involving 'downward-entailing operators', an interesting and important class of lexical items that change the way inferences are made. Recent work proposed a method for learning English downward-entailing operators that requires access to a high-quality collection of 'negative polarity items' (NPIs). However, English is one of the… ▽ More Researchers in textual entailment have begun to consider inferences involving 'downward-entailing operators', an interesting and important class of lexical items that change the way inferences are made. Recent work proposed a method for learning English downward-entailing operators that requires access to a high-quality collection of 'negative polarity items' (NPIs). However, English is one of the very few languages for which such a list exists. We propose the first approach that can be applied to the many languages for which there is no pre-existing high-precision database of NPIs. As a case study, we apply our method to Romanian and show that our method yields good results. Also, we perform a cross-linguistic analysis that suggests interesting connections to some findings in linguistic typology. △ Less

Submitted 26 November, 2010; v1 submitted 18 August, 2010; originally announced August 2010.

Comments: pp 1-6 are identical to the ACL 2010 published version; pp. 7-8 are the "externally-available appendices". Revision contains an additional appendix correcting the origin of the term "pseudo-polarity item"

ACM Class: I.2.7

Journal ref: Proceedings of the ACL Short Papers, pp. 247-252, 2010

arXiv:1008.1986 [pdf, ps, other]

For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia

Authors: Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, Lillian Lee

Abstract: We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to… ▽ More We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently-created manually prepared list. △ Less

Submitted 11 August, 2010; originally announced August 2010.

Comments: 4 pp; data available at http://www.cs.cornell.edu/home/llee/data/simple/

ACM Class: I.2.7

Journal ref: Proceedings of the NAACL, pp. 365-368, 2010. Short paper

arXiv:0906.3741 [pdf, other]

How opinions are received by online communities: A case study on Amazon.com helpfulness votes

Authors: Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg, Lillian Lee

Abstract: There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like "26 of 32 people found the following review helpful." Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Re… ▽ More There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like "26 of 32 people found the following review helpful." Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, "What did Y think of X?", we are asking, "What did Z think of Y's opinion of X?" Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also but also in subtle ways on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review "plagiarism" to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from different countries. △ Less

Submitted 20 June, 2009; originally announced June 2009.

Journal ref: Proceedings of WWW, pp. 141--150, 2009

arXiv:0906.2415 [pdf, ps, other]

Without a 'doubt'? Unsupervised discovery of downward-entailing operators

Authors: Cristian Danescu-Niculescu-Mizil, Lillian Lee, Richard Ducott

Abstract: An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement 'We know the epidemic spread quickly' does not entail 'We know the epidemic spread quickly via fleas', but 'We doubt the epidemic spread quickly' entails 'We doubt the epidemic spread… ▽ More An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement 'We know the epidemic spread quickly' does not entail 'We know the epidemic spread quickly via fleas', but 'We doubt the epidemic spread quickly' entails 'We doubt the epidemic spread quickly via fleas'. Here, we present the first algorithm for the challenging lexical-semantics problem of learning linguistic constructions that, like 'doubt', are downward entailing (DE). Our algorithm is unsupervised, resource-lean, and effective, accurately recovering many DE operators that are missing from the hand-constructed lists that textual-inference systems currently use. △ Less

Submitted 12 June, 2009; originally announced June 2009.

Comments: System output available at http://www.cs.cornell.edu/~cristian/Without_a_doubt_-_Data.html

Journal ref: Proceedings of NAACL HLT, pp. 137--145, 2009

Showing 1–43 of 43 results for author: Danescu-Niculescu-Mizil, C