Examining the Feasibility of Off-the-Shelf Algorithms for Masking Directly Identifiable Information in Social Media Data
Authors:
Rachel Dorn,
Alicia L. Nobles,
Masoud Rouhizadeh,
Mark Dredze
Abstract:
The identification and removal/replacement of protected information from social media data is an understudied problem, despite being desirable from an ethical and legal perspective. This paper identifies types of potentially directly identifiable information (inspired by protected health information in clinical texts) contained in tweets that may be readily removed using off-the-shelf algorithms, introduces an English dataset of tweets annotated for identifiable information, and compiles these off-the-shelf algorithms into a tool (Nightjar), which is then used to evaluate the feasibility of removing directly identifiable information from the tweets. Nightjar and the annotated data can be retrieved from https://bitbucket.org/mdredze/nightjar.
Submitted 16 November, 2020;
originally announced November 2020.
Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement
Authors:
Aaron Mueller,
Zach Wood-Doughty,
Silvio Amir,
Mark Dredze,
Alicia L. Nobles
Abstract:
The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements. Through an analysis of over 600,000 tweets from over 256,000 unique users, we examine online #MeToo conversations across gender and racial/ethnic identities and the topics that each demographic emphasized. We found that tweets authored by white women were overrepresented in the movement compared to other demographics, aligning with criticism of unequal representation. We found that intersected identities contributed differing narratives to frame the movement, co-opted the movement to raise visibility in parallel ongoing movements, employed the same hashtags both critically and supportively, and revived and created new hashtags in response to pivotal moments. Notably, tweets authored by black women often expressed emotional support and were critical of differential treatment in the justice system and by police. In comparison, tweets authored by white women and men often highlighted sexual harassment and violence by public figures and wove in more general political discussions. We discuss the implications of this work for digital activism research and design, including suggestions to raise the visibility of those who were underrepresented in this hashtag activism movement. Content warning: this article discusses issues of sexual harassment and violence.
Submitted 13 October, 2020;
originally announced October 2020.