
Showing 1–26 of 26 results for author: Foulds, J

Searching in archive cs.
  1. arXiv:2406.13925  [pdf, other]

    cs.CL cs.AI

    GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models

    Authors: Tao Zhang, Ziqian Zeng, Yuxiang Xiao, Huiping Zhuang, Cen Chen, James Foulds, Shimei Pan

    Abstract: Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicl…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2403.01193  [pdf, other]

    cs.CL cs.AI

    RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

    Authors: Philip Feldman, James R. Foulds, Shimei Pan

    Abstract: Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation…

    Submitted 12 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: 7 Pages, 1 Figure, 1 Table

    ACM Class: H.3.3; I.2.7

  3. arXiv:2402.01663  [pdf, other]

    cs.CY cs.CR cs.LG

    Killer Apps: Low-Speed, Large-Scale AI Weapons

    Authors: Philip Feldman, Aaron Dant, James R. Foulds

    Abstract: The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid de…

    Submitted 17 June, 2024; v1 submitted 14 January, 2024; originally announced February 2024.

    Comments: 10 pages with 10 pages of appendices. 3 Figures, 2 code listings

    ACM Class: I.2.7; H.4.3; J.4

    Journal ref: Workshops at the International Conference on Intelligent User Interfaces (IUI) 2024

  4. arXiv:2311.07014  [pdf, other]

    cs.CL cs.SD eess.AS

    Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings

    Authors: Fatema Hasan, Yulong Li, James Foulds, Shimei Pan, Bishwaranjan Bhattacharjee

    Abstract: Speech data has rich acoustic and paralinguistic information with important cues for understanding a speaker's tone, emotion, and intent, yet traditional large language models such as BERT do not incorporate this information. There has been an increased interest in multi-modal language models leveraging audio and/or visual information and text. However, current multi-modal language models require…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 11 pages

  5. arXiv:2306.06085  [pdf, other]

    cs.CL cs.AI

    Trapping LLM Hallucinations Using Tagged Context Prompts

    Authors: Philip Feldman, James R. Foulds, Shimei Pan

    Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have led to highly sophisticated conversation agents. However, these models suffer from "hallucinations," where the model generates false or fabricated information. Addressing this challenge is crucial, particularly with AI-driven platforms being adopted across various sectors. In this paper, we propose a novel method to recognize a…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 13 pages, 3 Figures, 2 Tables

    ACM Class: I.2.7; K.4.2

  6. arXiv:2301.05198  [pdf, other]

    cs.HC

    The Keyword Explorer Suite: A Toolkit for Understanding Online Populations

    Authors: Philip Feldman, Shimei Pan, James R. Foulds

    Abstract: We have developed a set of Python applications that use large language models to identify and analyze data from social media platforms relevant to a population of interest. Our pipeline begins with using OpenAI's GPT-3 to generate potential keywords for identifying relevant text content from the target population. The keywords are then validated, and the content downloaded and analyzed using GPT-3…

    Submitted 13 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: 6 pages, 4 figures

    ACM Class: H.5.2; H.1.2; I.2.7

  7. arXiv:2209.07044  [pdf, other]

    cs.LG cs.CY

    Fair Inference for Discrete Latent Variable Models

    Authors: Rashidul Islam, Shimei Pan, James R. Foulds

    Abstract: It is now well understood that machine learning models, trained on data without due care, often exhibit unfair and discriminatory behavior against certain populations. Traditional algorithmic fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair…

    Submitted 15 September, 2022; originally announced September 2022.

  8. arXiv:2205.02987  [pdf, ps, other]

    cs.HC cs.AI cs.CY

    Tell Me Something That Will Help Me Trust You: A Survey of Trust Calibration in Human-Agent Interaction

    Authors: George J. Cancro, Shimei Pan, James Foulds

    Abstract: When a human receives a prediction or recommended course of action from an intelligent agent, what additional information, beyond the prediction or recommendation itself, does the human require from the agent to decide whether to trust or reject the prediction or recommendation? In this paper we survey literature in the area of trust between a single human supervisor and a single agent subordinate…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 8 pages, 0 figures

    ACM Class: H.5.2; H.1.2; I.2.9

  9. arXiv:2204.07483  [pdf, other]

    cs.CL cs.CY

    Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models

    Authors: Philip Feldman, Aaron Dant, James R. Foulds, Shimei Pan

    Abstract: Text analysis of social media for sentiment, topic analysis, and other analysis depends initially on the selection of keywords and phrases that will be used to create the research corpora. However, keywords that researchers choose may occur infrequently, leading to errors that arise from using small samples. In this paper, we use the capacity for memorization, interpolation, and extrapolation of T…

    Submitted 19 April, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: 10 pages, 9 figures, 7 tables

    ACM Class: K.4.m

  10. arXiv:2106.07112  [pdf, other]

    cs.CY cs.AI cs.LG

    User Acceptance of Gender Stereotypes in Automated Career Recommendations

    Authors: Clarice Wang, Kathryn Wang, Andrew Bian, Rashidul Islam, Kamrun Naher Keya, James Foulds, Shimei Pan

    Abstract: Currently, there is a surge of interest in fair Artificial Intelligence (AI) and Machine Learning (ML) research which aims to mitigate discriminatory bias in AI algorithms, e.g. along lines of gender, age, and race. While most research in this domain focuses on developing fair AI algorithms, in this work, we show that a fair AI algorithm on its own may be insufficient to achieve its intended resul…

    Submitted 28 July, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

  11. arXiv:2105.07996  [pdf, ps, other]

    cs.AI

    Learning User Embeddings from Temporal Social Media Data: A Survey

    Authors: Fatema Hasan, Kevin S. Xu, James R. Foulds, Shimei Pan

    Abstract: User-generated data on social media contain rich information about who we are, what we like and how we make decisions. In this paper, we survey representative work on learning a concise latent user representation (a.k.a. user embedding) that can capture the main characteristics of a social media user. The learned user embeddings can later be used to support different downstream user analysis tasks…

    Submitted 17 May, 2021; originally announced May 2021.

  12. arXiv:2104.10259  [pdf, other]

    cs.CL cs.CY

    Analyzing COVID-19 Tweets with Transformer-based Language Models

    Authors: Philip Feldman, Sim Tiwari, Charissa S. L. Cheah, James R. Foulds, Shimei Pan

    Abstract: This paper describes a method for using Transformer-based Language Models (TLMs) to understand public opinion from social media posts. In this approach, we train a set of GPT models on several COVID-19 tweet corpora that reflect populations of users with distinctive views. We then use prompt-based queries to probe these models to reveal insights into the biases and opinions of the users. We demons…

    Submitted 5 May, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Six pages, six tables, four figures

    ACM Class: J.4; I.2.7

  13. arXiv:2104.08769  [pdf, other]

    cs.AI

    Fair Representation Learning for Heterogeneous Information Networks

    Authors: Ziqian Zeng, Rashidul Islam, Kamrun Naher Keya, James Foulds, Yangqiu Song, Shimei Pan

    Abstract: Recently, much attention has been paid to the societal impact of AI, especially concerns regarding its fairness. A growing body of research has identified unfair AI systems and proposed methods to debias them, yet many challenges remain. Representation learning for Heterogeneous Information Networks (HINs), a fundamental building block used in complex network mining, has socially consequential app…

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted at ICWSM 2021

  14. arXiv:2010.06820  [pdf, other]

    cs.LG cs.AI cs.CY

    Equitable Allocation of Healthcare Resources with Fair Cox Models

    Authors: Kamrun Naher Keya, Rashidul Islam, Shimei Pan, Ian Stockwell, James R. Foulds

    Abstract: Healthcare programs such as Medicaid provide crucial services to vulnerable populations, but due to limited resources, many of the individuals who need these services the most languish on waiting lists. Survival models, e.g. the Cox proportional hazards model, can potentially improve this situation by predicting individuals' levels of need, which can then be used to prioritize the waiting lists. P…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: AAAI Fall Symposium on AI in Government and Public Sector (AAAI FSS-20), 2020

  15. arXiv:2010.04609  [pdf, other]

    cs.LG cs.CL cs.IR

    Causal Feature Selection with Dimension Reduction for Interpretable Text Classification

    Authors: Guohou Shan, James Foulds, Shimei Pan

    Abstract: Text features that are correlated with class labels, but do not directly cause them, are sometimes useful for prediction, but they may not be insightful. As an alternative to traditional correlation-based feature selection, causal inference could reveal more principled, meaningful relationships between text features and labels. To help researchers gain insight into text data, e.g. for social science a…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 11 pages, 3 pages

    ACM Class: I.2.7

  16. arXiv:2009.08955  [pdf, other]

    cs.IR cs.LG stat.ML

    Neural Fair Collaborative Filtering

    Authors: Rashidul Islam, Kamrun Naher Keya, Ziqian Zeng, Shimei Pan, James Foulds

    Abstract: A growing proportion of human interactions are digitized on social media platforms and subjected to algorithmic decision-making, and it has become increasingly important to ensure fair treatment from these algorithms. In this work, we investigate gender bias in collaborative-filtering recommender systems trained on social media data. We develop neural fair collaborative filtering (NFCF), a practic…

    Submitted 2 September, 2020; originally announced September 2020.

  17. arXiv:1909.04702  [pdf, other]

    cs.CL cs.IR cs.LG

    Neural Embedding Allocation: Distributed Representations of Topic Models

    Authors: Kamrun Naher Keya, Yannis Papanikolaou, James R. Foulds

    Abstract: Word embedding models such as the skip-gram learn vector representations of words' semantic relationships, and document embedding models learn similar representations for documents. On the other hand, topic models provide latent representations of the documents' topical themes. To get the benefits of these representations simultaneously, we propose a unifying algorithm, called neural embedding all…

    Submitted 10 September, 2019; originally announced September 2019.

  18. arXiv:1901.07469  [pdf, other]

    cs.LG cs.AI stat.ML

    Estimating Buildings' Parameters over Time Including Prior Knowledge

    Authors: Nilavra Pathak, James Foulds, Nirmalya Roy, Nilanjan Banerjee, Ryan Robucci

    Abstract: Modeling buildings' heat dynamics is a complex process which depends on various factors including weather, building thermal capacity, insulation preservation, and residents' behavior. Gray-box models offer a causal inference of those dynamics expressed in few parameters specific to built environments. These parameters can provide compelling insights into the characteristics of building artifacts a…

    Submitted 18 February, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: 11 pages, including references

  19. arXiv:1811.07255  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Bayesian Modeling of Intersectional Fairness: The Variance of Bias

    Authors: James Foulds, Rashidul Islam, Kamrun Keya, Shimei Pan

    Abstract: Intersectionality is a framework that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including race, gender, sexual orientation, class, and disability. Intersectionality theory therefore implies it is important that fairness in artificial intelligence systems be protected with regard to multi-dimensional protected attributes. However, the…

    Submitted 10 September, 2019; v1 submitted 17 November, 2018; originally announced November 2018.

  20. arXiv:1807.08362  [pdf, other]

    cs.LG cs.CY stat.ML

    An Intersectional Definition of Fairness

    Authors: James Foulds, Rashidul Islam, Kamrun Naher Keya, Shimei Pan

    Abstract: We propose definitions of fairness in machine learning and artificial intelligence systems that are informed by the framework of intersectionality, a critical lens arising from the Humanities literature which analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including gender, race, sexual orientation, class, and disability. We show that our c…

    Submitted 10 September, 2019; v1 submitted 22 July, 2018; originally announced July 2018.

  21. arXiv:1705.07368  [pdf, other]

    cs.CL cs.AI cs.LG

    Mixed Membership Word Embeddings for Computational Social Science

    Authors: James Foulds

    Abstract: Word embeddings improve the performance of NLP systems by revealing the hidden structural relationships between words. Despite their success in many applications, word embeddings have seen very little use in computational social science NLP tasks, presumably due to their reliance on big data, and to a lack of interpretability. I propose a probabilistic model-based word embedding method which can r…

    Submitted 19 February, 2018; v1 submitted 20 May, 2017; originally announced May 2017.

  22. arXiv:1611.00340  [pdf, other]

    stat.ML cs.CR

    Variational Bayes In Private Settings (VIPS)

    Authors: Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling

    Abstract: Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilist…

    Submitted 3 December, 2018; v1 submitted 1 November, 2016; originally announced November 2016.

    Comments: The previous version of this paper had an error in the composition method we used. This version fixes that error.

  23. arXiv:1609.04120  [pdf, other]

    stat.ML cs.CR

    Private Topic Modeling

    Authors: Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling

    Abstract: We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA). The iterative nature of stochastic variational inference presents challenges: multiple iterations are required to obtain accurate posterior distributions, yet each iteration increases the amount of noise that must be added to achieve a reasonable degree of privacy. We propose a practical algorith…

    Submitted 3 December, 2018; v1 submitted 13 September, 2016; originally announced September 2016.

  24. arXiv:1605.06995  [pdf, other]

    cs.LG cs.AI cs.CR stat.ME stat.ML

    DP-EM: Differentially Private Expectation Maximization

    Authors: Mijung Park, Jimmy Foulds, Kamalika Chaudhuri, Max Welling

    Abstract: The iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed. We propose a practical private EM algorithm that overcomes this challenge using two innovations: (1) a novel moment perturbation formulation for differentially private EM (DP-EM), and (2) the use of two recently develope…

    Submitted 31 October, 2016; v1 submitted 23 May, 2016; originally announced May 2016.

  25. arXiv:1603.07294  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis

    Authors: James Foulds, Joseph Geumlek, Max Welling, Kamalika Chaudhuri

    Abstract: Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015). While this one posterior sample (OPS) approach elegantly provides privacy "for free," it is data inefficient in the sense of asy…

    Submitted 8 June, 2016; v1 submitted 23 March, 2016; originally announced March 2016.

    Comments: Updated to match the accepted UAI version. Generalized the ARE result and included a more detailed proof. Improved some figures, etc.

    Journal ref: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), 2016

  26. arXiv:1305.2452  [pdf, ps, other]

    cs.LG

    Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation

    Authors: James Foulds, Levi Boyles, Christopher Dubois, Padhraic Smyth, Max Welling

    Abstract: In the internet era there has been an explosion in the amount of digital text information available, leading to difficulties of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on large-scale corpora, but these methods do not currently take fu…

    Submitted 10 May, 2013; originally announced May 2013.