-
FairTargetSim: An Interactive Simulator for Understanding and Explaining the Fairness Effects of Target Variable Definition
Authors:
Dalia Gala,
Milo Phillips-Brown,
Naman Goel,
Carina Prunkl,
Laura Alvarez Jubete,
Medb Corcoran,
Ray Eitel-Porter
Abstract:
Machine learning requires defining one's target variable for predictions or decisions, a process that can have profound implications for fairness: biases are often encoded in target variable definition itself, before any data collection or training. We present an interactive simulator, FairTargetSim (FTS), that illustrates how target variable definition impacts fairness. FTS is a valuable tool for algorithm developers, researchers, and non-technical stakeholders. FTS is built around a case study of algorithmic hiring, with real-world data and user-defined target variables. FTS is open-source and available at: http://tinyurl.com/ftsinterface. The video accompanying this paper is here: http://tinyurl.com/ijcaifts.
Submitted 9 March, 2024;
originally announced March 2024.
-
cuSLINK: Single-linkage Agglomerative Clustering on the GPU
Authors:
Corey J. Nolet,
Divye Gala,
Alex Fender,
Mahesh Doijade,
Joe Eaton,
Edward Raff,
John Zedlewski,
Brad Rees,
Tim Oates
Abstract:
In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering
Submitted 28 June, 2023;
originally announced June 2023.
-
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Authors:
Conrad Borchers,
Dalia Sara Gala,
Benjamin Gilburt,
Eduard Oravkin,
Wilfried Bounsi,
Yuki M. Asano,
Hannah Rose Kirk
Abstract:
The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal of writing unbiased and realistic job advertisements. We first assess the bias and realism of zero-shot generated advertisements and compare them to real-world advertisements. We then evaluate prompt-engineering and fine-tuning as debiasing methods. We find that prompt-engineering with diversity-encouraging prompts gives no significant improvement in either bias or realism. Conversely, fine-tuning, especially on unbiased real advertisements, can improve realism and reduce bias.
Submitted 23 May, 2022;
originally announced May 2022.
-
GPU Semiring Primitives for Sparse Neighborhood Methods
Authors:
Corey J. Nolet,
Divye Gala,
Edward Raff,
Joe Eaton,
Brad Rees,
John Zedlewski,
Tim Oates
Abstract:
High-performance primitives for mathematical operations on sparse vectors must deal with the challenges of skewed degree distributions and limits on memory consumption that are typically not issues in dense operations. We demonstrate that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures while maintaining performance and memory efficiency on the GPU. We further show that this primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input. To our knowledge, this is the first work aiming to unify the computation of several critical distance measures on the GPU under a single flexible design paradigm and we hope that it provides a good baseline for future research in this area. Our implementation is fully open source and publicly available as part of the RAFT library of GPU-accelerated machine learning primitives (https://github.com/rapidsai/raft).
Submitted 4 March, 2022; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Analyzing Gender Bias within Narrative Tropes
Authors:
Dhruvil Gala,
Mohammad Omar Khursheed,
Hannah Lerner,
Brendan O'Connor,
Mohit Iyyer
Abstract:
Popular media reflects and reinforces societal biases through the use of tropes, which are narrative elements, such as archetypal characters and plot arcs, that occur frequently across media. In this paper, we specifically investigate gender bias within a large collection of tropes. To enable our study, we crawl tvtropes.org, an online user-created repository that contains 30K tropes associated with 1.9M examples of their occurrences across film, television, and literature. We automatically score the "genderedness" of each trope in our TVTROPES dataset, which enables an analysis of (1) highly-gendered topics within tropes, (2) the relationship between gender bias and popular reception, and (3) how the gender of a work's creator correlates with the types of tropes that they use.
Submitted 30 October, 2020;
originally announced November 2020.
-
Multi-Sound-Source Localization Using Machine Learning for Small Autonomous Unmanned Vehicles with a Self-Rotating Bi-Microphone Array
Authors:
Deepak Gala,
Nathan Lindsay,
Liang Sun
Abstract:
While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capabilities have not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform three-dimensional (3D) multi-sound-source localization (MSSL) using only the inter-channel time difference (ICTD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on two machine learning techniques, namely the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performance is tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.
Submitted 28 June, 2020; v1 submitted 13 April, 2018;
originally announced April 2018.
-
Realtime Active Sound Source Localization for Unmanned Ground Robots Using a Self-Rotational Bi-Microphone Array
Authors:
Deepak Gala,
Nathan Lindsay,
Liang Sun
Abstract:
This work presents a novel technique that performs both orientation and distance localization of a sound source in a three-dimensional (3D) space using only the interaural time difference (ITD) cue, generated by a newly-developed self-rotational bi-microphone robotic platform. The system dynamics are established in the spherical coordinate frame using a state-space model. The observability analysis of the state-space model shows that the system is unobservable when the sound source is placed at elevation angles of $90$ and $0$ degrees. The proposed method uses the difference between the azimuth estimates from the 3D and the two-dimensional models, respectively, to check the zero-degree-elevation condition and further estimates the elevation angle using a polynomial curve fitting approach. The proposed method is also capable of detecting a $90$-degree elevation by extracting the zero-ITD signal 'buried' in noise. Additionally, distance localization is performed by first rotating the microphone array to face toward the sound source and then shifting the microphone perpendicular to the source-robot vector by a predefined distance of a fixed number of steps. The integrated rotational and translational motions of the microphone array provide a complete orientation and distance localization using only the ITD cue. A novel robotic platform using a self-rotational bi-microphone array was also developed for unmanned ground robots performing sound source localization. The proposed technique was first tested in simulation and was then verified on the newly-developed robotic platform. Experimental data collected by the microphones installed on a KEMAR dummy head were also used to test the proposed technique. All results show the effectiveness of the proposed technique.
Submitted 10 April, 2018;
originally announced April 2018.