
Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models

Authors: Carlos Aguirre, Kuleen Sasse, Isabel Cachola, Mark Dredze

Abstract: Recently, work in NLP has shifted to few-shot (in-context) learning, with large language models (LLMs) performing well across a range of tasks. However, while fairness evaluations have become a standard for supervised methods, little is known about the fairness of LLMs as prediction systems. Further, common standard methods for fairness involve access to models weights or are applied during finetuning, which are not applicable in few-shot learning. Do LLMs exhibit prediction biases when used for standard NLP tasks? In this work, we explore the effect of shots, which directly affect the performance of models, on the fairness of LLMs as NLP classification systems. We consider how different shot selection strategies, both existing and new demographically sensitive methods, affect model fairness across three standard fairness datasets. We discuss how future work can include LLM fairness evaluations.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2305.12671 [pdf, other]

Transferring Fairness using Multi-Task Learning with Limited Demographic Information

Authors: Carlos Aguirre, Mark Dredze

Abstract: Training supervised machine learning systems with a fairness loss can improve prediction fairness across different demographic groups. However, doing so requires demographic annotations for training data, without which we cannot produce debiased classifiers for most tasks. Drawing inspiration from transfer learning methods, we investigate whether we can utilize demographic data from a related task to improve the fairness of a target task. We adapt a single-task fairness loss to a multi-task setting to exploit demographic labels from a related task in debiasing a target task and demonstrate that demographic fairness objectives transfer fairness within a multi-task framework. Additionally, we show that this approach enables intersectional fairness by transferring between two datasets with different single-axis demographics. We explore different data domains to show how our loss can improve fairness domains and tasks.

Submitted 15 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2211.07932 [pdf, other]

Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics

Authors: Carlos Aguirre, Mark Dredze, Philip Resnik

Abstract: Stressors are related to depression, but this relationship is complex. We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. First, we use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups. We train language models using self-reported stressors to predict depressive symptoms, finding a relationship between stressors and depression. Finally, we find that differences in stressors translate to downstream performance differences across demographic groups.

Submitted 15 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

arXiv:2103.10550 [pdf, other]

Gender and Racial Fairness in Depression Research using Social Media

Authors: Carlos Aguirre, Keith Harrigian, Mark Dredze

Abstract: Multiple studies have demonstrated that behavior on internet-based social media platforms can be indicative of an individual's mental health status. The widespread availability of such data has spurred interest in mental health research from a computational lens. While previous research has raised concerns about possible biases in models produced from this data, no study has quantified how these biases actually manifest themselves with respect to different demographic groups, such as gender and racial/ethnic groups. Here, we analyze the fairness of depression classifiers trained on Twitter data with respect to gender and racial demographic groups. We find that model performance systematically differs for underrepresented groups and that these discrepancies cannot be fully explained by trivial data representation issues. Our study concludes with recommendations on how to avoid these biases in future research.

Submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted to EACL 2021

arXiv:2011.05233 [pdf, other]

On the State of Social Media Data for Mental Health Research

Authors: Keith Harrigian, Carlos Aguirre, Mark Dredze

Abstract: Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade. However, progress in the domain, in terms of both medical understanding and system performance, remains bounded by the availability of adequate data. Prior systematic reviews have not necessarily made it possible to measure the degree to which data-related challenges have affected research progress. In this paper, we offer an analysis specifically on the state of social media data that exists for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.

Submitted 25 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: Originally submitted to ICWSM in January 2020. v1 updated November 2020. v2 updated April 2021, to appear at CLPsych 2021. Supplementary material at https://github.com/kharrigian/mental-health-datasets

arXiv:2002.00994 [pdf, other]

doi 10.1093/mnras/staa350

Scalable End-to-end Recurrent Neural Network for Variable star classification

Authors: Ignacio Becker, Karim Pichara, Márcio Catelan, Pavlos Protopapas, Carlos Aguirre, Fatemeh Nikzat

Abstract: During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine learning techniques. Traditionally, light curves are represented as a vector of descriptors or features used as input for many algorithms. Some features are computationally expensive, cannot be updated quickly and hence for large datasets such as the LSST cannot be applied. Previous work has been done to develop alternative unsupervised feature extraction algorithms for light curves, but the cost of doing so still remains high. In this work, we propose an end-to-end algorithm that automatically learns the representation of light curves that allows an accurate automatic classification. We study a series of deep learning architectures based on Recurrent Neural Networks and test them in automated classification scenarios. Our method uses minimal data preprocessing, can be updated with a low computational cost for new observations and light curves, and can scale up to massive datasets. We transform each light curve into an input matrix representation whose elements are the differences in time and magnitude, and the outputs are classification probabilities. We test our method in three surveys: OGLE-III, Gaia and WISE. We obtain accuracies of about $95\%$ in the main classes and $75\%$ in the majority of subclasses. We compare our results with the Random Forest classifier and obtain competitive accuracies while being faster and scalable. The analysis shows that the computational complexity of our approach grows up linearly with the light curve size, while the traditional approach cost grows as $N\log{(N)}$.

Submitted 3 February, 2020; originally announced February 2020.

Comments: 15 pages, 17 figures. To be published in MNRAS

arXiv:1912.07747 [pdf]

doi 10.1109/ICDARW.2019.10037

Pipelines for Procedural Information Extraction from Scientific Literature: Towards Recipes using Machine Learning and Data Science

Authors: Huichen Yang, Carlos A. Aguirre, Maria F. De La Torre, Derek Christensen, Luis Bobadilla, Emily Davich, Jordan Roth, Lei Luo, Yihong Theis, Alice Lam, T. Yong-Jin Han, David Buttler, William H. Hsu

Abstract: This paper describes a machine learning and data science pipeline for structured information extraction from documents, implemented as a suite of open-source tools and extensions to existing tools. It centers around a methodology for extracting procedural information in the form of recipes, stepwise procedures for creating an artifact (in this case synthesizing a nanomaterial), from published scientific literature. From our overall goal of producing recipes from free text, we derive the technical objectives of a system consisting of pipeline stages: document acquisition and filtering, payload extraction, recipe step extraction as a relationship extraction task, recipe assembly, and presentation through an information retrieval interface with question answering (QA) functionality. This system meets computational information and knowledge management (CIKM) requirements of metadata-driven payload extraction, named entity extraction, and relationship extraction from text. Functional contributions described in this paper include semi-supervised machine learning methods for PDF filtering and payload extraction tasks, followed by structured extraction and data transformation tasks beginning with section extraction, recipe steps as information tuples, and finally assembled recipes. Measurable objective criteria for extraction quality include precision and recall of recipe steps, ordering constraints, and QA accuracy, precision, and recall. Results, key novel contributions, and significant open problems derived from this work center around the attribution of these holistic quality measures to specific machine learning and inference stages of the pipeline, each with their performance measures. The desired recipes contain identified preconditions, material inputs, and operations, and constitute the overall output generated by our computational information and knowledge management (CIKM) system.

Submitted 16 December, 2019; originally announced December 2019.

Comments: 15th International Conference on Document Analysis and Recognition Workshops (ICDARW 2019)

Report number: 2019-1 MSC Class: I.2.7; I.2.6; H.3.3; H.3.4; I.2.10; I.5.4 ACM Class: I.2.7; I.2.6; H.3.3; H.3.4; I.2.10; I.5.4

arXiv:1907.07768 [pdf, other]

A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams

Authors: Avishek Bose, Vahid Behzadan, Carlos Aguirre, William H. Hsu

Abstract: We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a previously detected event). While some existing approaches to event detection measure novelty and trendiness, typically as independent criteria and occasionally as a holistic measure, this work focuses on detecting both novel and developing events using an unsupervised machine learning approach. Furthermore, our proposed approach enables the ranking of cyber threat events based on an importance score by extracting the tweet terms that are characterized as named entities, keywords, or both. We also impute influence to users in order to assign a weighted score to noun phrases in proportion to user influence and the corresponding event scores for named entities and keywords. To evaluate the performance of our proposed approach, we measure the efficiency and detection error rate for events over a specified time interval, relative to human annotator ground truth.

Submitted 12 July, 2019; originally announced July 2019.

Comments: 9 pages, 3 figures, and 5 tables

arXiv:1810.09440 [pdf, other]

doi 10.1093/mnras/sty2836

Deep multi-survey classification of variable stars

Authors: Carlos Aguirre, Karim Pichara, Ignacio Becker

Abstract: During the last decade, a considerable amount of effort has been made to classify variable stars using different machine learning techniques. Typically, light curves are represented as vectors of statistical descriptors or features that are used to train various algorithms. These features demand big computational powers that can last from hours to days, making impossible to create scalable and efficient ways of automatically classifying variable stars. Also, light curves from different surveys cannot be integrated and analyzed together when using features, because of observational differences. For example, having variations in cadence and filters, feature distributions become biased and require expensive data-calibration models. The vast amount of data that will be generated soon make necessary to develop scalable machine learning architectures without expensive integration techniques. Convolutional Neural Networks have shown impressing results in raw image classification and representation within the machine learning literature. In this work, we present a novel Deep Learning model for light curve classification, mainly based on convolutional units. Our architecture receives as input the differences between time and magnitude of light curves. It captures the essential classification patterns regardless of cadence and filter. In addition, we introduce a novel data augmentation schema for unevenly sampled time series. We test our method using three different surveys: OGLE-III; Corot; and VVV, which differ in filters, cadence, and area of the sky. We show that besides the benefit of scalability, our model obtains state of the art levels accuracy in light curve classification benchmarks.

Submitted 21 October, 2018; originally announced October 2018.

Comments: Accepted for publication in Monthly Notices of the Royal Astronomical Society

arXiv:1211.5986 [pdf, ps, other]

Signal recognition and adapted filtering by non-commutative tomography

Authors: Carlos Aguirre, R. Vilela Mendes

Abstract: Tomograms, a generalization of the Radon transform to arbitrary pairs of non-commuting operators, are positive bilinear transforms with a rigorous probabilistic interpretation which provide a full characterization of the signal and are robust in the presence of noise. Tomograms based on the time-frequency operator pair, were used in the past for component separation and denoising. Here we show how, by the construction of an operator pair adapted to the signal, meaningful information with good time resolution is extracted even in very noisy situations.

Submitted 26 November, 2012; originally announced November 2012.

Comments: 19 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1107.0929

Journal ref: IET Signal Processing 8 (2014) 67 - 75

Showing 1–10 of 10 results for author: Aguirre, C