
Showing 1–16 of 16 results for author: Giorgi, S

Searching in archive cs.
  1. arXiv:2406.14462  [pdf, other]

    cs.CL

    Explicit and Implicit Large Language Model Personas Generate Opinions but Fail to Replicate Deeper Perceptions and Biases

    Authors: Salvatore Giorgi, Tingting Liu, Ankit Aich, Kelsey Isman, Garrick Sherman, Zachary Fried, João Sedoc, Lyle H. Ungar, Brenda Curtis

    Abstract: Large language models (LLMs) are increasingly being used in human-centered social scientific tasks, such as data annotation, synthetic data creation, and engaging in dialog. However, these tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences. Thus, employing LLMs (which do not have such human factors) in these tasks may re…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.12679  [pdf, other]

    cs.CL

    Vernacular? I Barely Know Her: Challenges with Style Control and Stereotyping

    Authors: Ankit Aich, Tingting Liu, Salvatore Giorgi, Kelsey Isman, Lyle Ungar, Brenda Curtis

    Abstract: Large Language Models (LLMs) are increasingly being used in educational and learning applications. Research has demonstrated that controlling for style, to fit the needs of the learner, fosters increased understanding, promotes inclusion, and helps with knowledge distillation. To understand the capabilities and limitations of contemporary LLMs in style control, we evaluated five state-of-the-art m…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.11622  [pdf, other]

    cs.CL

    Building Knowledge-Guided Lexica to Model Cultural Variation

    Authors: Shreya Havaldar, Salvatore Giorgi, Sunny Rai, Thomas Talhelm, Sharath Chandra Guntuku, Lyle Ungar

    Abstract: Cultural variation exists between nations (e.g., the United States vs. China), but also within regions (e.g., California vs. Texas, Los Angeles vs. San Francisco). Measuring this regional cultural variation can illuminate how and why people think and behave differently. Historically, it has been difficult to computationally model cultural variation due to a lack of training data and scalability co…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at NAACL 2024

  4. arXiv:2402.01980  [pdf, other]

    cs.CL

    SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks

    Authors: Gourab Dey, Adithya V Ganesan, Yash Kumar Lal, Manal Shah, Shreyashee Sinha, Matthew Matero, Salvatore Giorgi, Vivek Kulkarni, H. Andrew Schwartz

    Abstract: Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about…

    Submitted 14 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Short paper accepted to EACL 2024. 4 pgs, 2 tables

  5. arXiv:2308.15352  [pdf]

    cs.CL cs.SI physics.soc-ph

    Historical patterns of rice farming explain modern-day language use in China and Japan more than modernization and urbanization

    Authors: Sharath Chandra Guntuku, Thomas Talhelm, Garrick Sherman, Angel Fan, Salvatore Giorgi, Liuqing Wei, Lyle H. Ungar

    Abstract: We used natural language processing to analyze a billion words to study cultural differences on Weibo, one of China's largest social media platforms. We compared predictions from two common explanations about cultural differences in China (economic development and urban-rural differences) against the less-obvious legacy of rice versus wheat farming. Rice farmers had to coordinate shared irrigation…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Includes Supplemental Materials

  6. arXiv:2305.14757  [pdf, other]

    cs.CL

    Psychological Metrics for Dialog System Evaluation

    Authors: Salvatore Giorgi, Shreya Havaldar, Farhan Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Lyle H. Ungar, H. Andrew Schwartz, Joao Sedoc

    Abstract: We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens in which conversational agents express a diversity of both states (e.g., emotion) and traits (e.g., personality), just as people do. We present five interpretable metrics from established psychology that are fundamental to human communication and relationships: emotional entropy, linguistic style and e…

    Submitted 15 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  7. arXiv:2302.12952  [pdf]

    cs.CL

    Robust language-based mental health assessments in time and space through social media

    Authors: Siddharth Mangalik, Johannes C. Eichstaedt, Salvatore Giorgi, Jihu Mun, Farhan Ahmed, Gilvir Gill, Adithya V. Ganesan, Shashanka Subrahmanya, Nikita Soni, Sean A. P. Clouston, H. Andrew Schwartz

    Abstract: Compared to physical health, population mental health measurement in the U.S. is very coarse-grained. Currently, in the largest population surveys, such as those carried out by the Centers for Disease Control or Gallup, mental health is only broadly captured through "mentally unhealthy days" or "sadness", and limited to relatively infrequent state or metropolitan estimates. Through the large scale…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 9 pages, 7 figures, pre-print

    ACM Class: J.4; I.2.7

  8. arXiv:2302.02064  [pdf, other]

    cs.CL

    Lived Experience Matters: Automatic Detection of Stigma on Social Media Toward People Who Use Substances

    Authors: Salvatore Giorgi, Douglas Bellew, Daniel Roy Sadek Habib, Garrick Sherman, Joao Sedoc, Chase Smitterberg, Amanda Devoto, McKenzie Himelein-Wachowiak, Brenda Curtis

    Abstract: Stigma toward people who use substances (PWUS) is a leading barrier to seeking treatment. Further, those in treatment are more likely to drop out if they experience higher levels of stigmatization. While related concepts of hate speech and toxicity, including those targeted toward vulnerable populations, have been the focus of automatic content moderation research, stigma and, in particular, people…

    Submitted 16 July, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Accepted for publication in the 2024 International AAAI Conference on Web and Social Media (ICWSM)

  9. Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community

    Authors: Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin

    Abstract: In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and,…

    Submitted 15 March, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

    Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2023, 17(1), 233-244

  10. arXiv:2202.01802  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Different Affordances on Facebook and SMS Text Messaging Do Not Impede Generalization of Language-Based Predictive Models

    Authors: Tingting Liu, Salvatore Giorgi, Xiangyu Tao, Sharath Chandra Guntuku, Douglas Bellew, Brenda Curtis, Lyle Ungar

    Abstract: Adaptive mobile device-based health interventions often use machine learning models trained on non-mobile device data, such as social media text, due to the difficulty and high expense of collecting large text message (SMS) data. Therefore, understanding the differences and generalization of models between these platforms is crucial for proper deployment. We examined the psycho-linguistic differen…

    Submitted 23 May, 2023; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

  11. arXiv:2201.08451  [pdf, other]

    cs.CL cs.AI cs.LG

    Regional Negative Bias in Word Embeddings Predicts Racial Animus--but only via Name Frequency

    Authors: Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt

    Abstract: The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-black WEAT estimates from ge…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 5 pages, 1 figure

  12. arXiv:2009.00596  [pdf, other]

    cs.SI cs.CL

    Twitter Corpus of the #BlackLivesMatter Movement And Counter Protests: 2013 to 2021

    Authors: Salvatore Giorgi, Sharath Chandra Guntuku, McKenzie Himelein-Wachowiak, Amy Kwarteng, Sy Hwang, Muhammad Rahman, Brenda Curtis

    Abstract: Black Lives Matter (BLM) is a decentralized social movement protesting violence against Black individuals and communities, with a focus on police brutality. The movement gained significant attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd in 2020. The #BlackLivesMatter social media hashtag has come to represent the grassroots movement, with similar hashtags counte…

    Submitted 7 June, 2022; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: Published at the 16th International AAAI Conference on Web and Social Media (ICWSM) 2022

  13. arXiv:2004.06303  [pdf, other]

    cs.CL cs.CY cs.SI

    Quantifying Community Characteristics of Maternal Mortality Using Social Media

    Authors: Rediet Abebe, Salvatore Giorgi, Anna Tedijanto, Anneke Buffone, H. Andrew Schwartz

    Abstract: While most mortality rates have decreased in the US, maternal mortality has increased and is among the highest of any OECD nation. Extensive public health research is ongoing to better understand the characteristics of communities with relatively high or low rates. In this work, we explore the role that social media language can play in providing insights into such community characteristics. Analy…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: In Proceedings of The Web Conference 2020 (WWW '20)

  14. arXiv:1911.03855  [pdf, other]

    cs.SI cs.CL cs.CY

    Correcting Sociodemographic Selection Biases for Population Prediction from Social Media

    Authors: Salvatore Giorgi, Veronica Lynn, Keshav Gupta, Farhan Ahmed, Sandra Matz, Lyle Ungar, H. Andrew Schwartz

    Abstract: Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population -- a "selection bias". Within the social sciences, such a bias is typically addressed with restratification techniques, where observations are reweighted according to how under- or…

    Submitted 7 June, 2022; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: Published at the 16th International AAAI Conference on Web and Social Media (ICWSM) 2022

  15. arXiv:1808.09600  [pdf, ps, other]

    cs.SI cs.CY

    The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions

    Authors: Salvatore Giorgi, Daniel Preotiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle H. Ungar, H. Andrew Schwartz

    Abstract: Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes are typically regarding people, but the data is often aggregated without regard to users in the Twitter populations of each community. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: To appear in the proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)

  16. arXiv:1808.09479  [pdf, other]

    cs.CL

    Residualized Factor Adaptation for Community Social Media Prediction Tasks

    Authors: Mohammadzaman Zamani, H. Andrew Schwartz, Veronica E. Lynn, Salvatore Giorgi, Niranjan Balasubramanian

    Abstract: Predictive models over social media language have shown promise in capturing community outcomes, but approaches thus far largely neglect the socio-demographic context (e.g. age, education rates, race) of the community from which the language originates. For example, it may be inaccurate to assume people in Mobile, Alabama, where the population is relatively older, will use words the same way as th…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3560-3569, 2018