-
Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
Authors:
Lasse Hyldig Hansen,
Nikolaj Andersen,
Jack Gallifant,
Liam G. McCoy,
James K Stone,
Nura Izath,
Marcela Aguirre-Jerez,
Danielle S Bitterman,
Judy Gichoya,
Leo Anthony Celi
Abstract:
Background Advancements in Large Language Models (LLMs) hold transformative potential in healthcare; however, recent work has raised concern about the tendency of these models to produce outputs that display racial or gender biases. Although training data is a likely source of such biases, exploration of disease and demographic associations in text data at scale has been limited.
Methods We conducted a large-scale textual analysis using a dataset comprising diverse web sources, including arXiv, Wikipedia, and Common Crawl. The study analyzed the context in which various diseases are discussed alongside markers of race and gender. Given that LLMs are pre-trained on similar datasets, this approach allowed us to examine the potential biases that LLMs may learn and internalize. We compared these findings with actual demographic disease prevalence as well as GPT-4 outputs in order to evaluate the extent of bias representation.
Results Our findings indicate that demographic terms are disproportionately associated with specific disease concepts in online texts. Gender terms are prominently associated with disease concepts, while racial terms are much less frequently associated. We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed. Most prominently, we see an overall significant overrepresentation of Black race mentions in comparison to population proportions.
Conclusions Our results highlight the need for critical examination and transparent reporting of biases in LLM pretraining datasets. Our study suggests the need to develop mitigation strategies to counteract the influence of biased training data in LLMs, particularly in sensitive domains such as healthcare.
Submitted 8 May, 2024;
originally announced May 2024.
-
Pair Programming Practiced in Hybrid Work
Authors:
Anastasiia Tkalich,
Nils Brede Moe,
Nina Haugland Andersen,
Viktoria Stray,
Astri Moksnes Barbala
Abstract:
Pair programming (PP) has been a widespread practice for decades and is known for facilitating knowledge exchange and improving the quality of software. Many agilists advocated the importance of collocation, face-to-face interaction, and physical artifacts incorporated in the shared workspace when pairing. After a long period of forced work-from-home, many knowledge workers prefer to work remotely two or three days per week, which is affecting practices such as PP. In this revelatory single-case study, we aimed to understand how PP is practiced during hybrid work when team members alternate between on-site days and working from home. We collected qualitative and quantitative data through 11 semi-structured interviews, observations, feedback sessions, and self-reported surveys. The interviewees were members of an agile software development team in a Norwegian fintech company. The results presented in this paper indicate that PP can be practiced through on-site, remote, and mixed sessions, where the mixed mode seems to be the least advantageous. The findings highlight the importance of adapting the work environment to suit individual work mode preferences when it comes to PP. In the future, we will build on these findings to explore PP in other teams and organizations practicing hybrid work.
Submitted 13 July, 2023;
originally announced July 2023.
-
Self-Distillation for Gaussian Process Regression and Classification
Authors:
Kenneth Borup,
Lars Nørvang Andersen
Abstract:
We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher, while the distribution-centric approach re-uses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
Submitted 5 April, 2023;
originally announced April 2023.
-
What happens to psychological safety when going remote?
Authors:
Anastasiia Tkalich,
Darja Smite,
Nina Haugland Andersen,
Nils Brede Moe
Abstract:
Psychological safety is a precondition for learning and success in software teams. Companies such as SavingsBank, which is discussed in this article, have developed good practices to facilitate psychological safety, most of which depend on face-to-face interaction. However, what happens to psychological safety when working remotely? In this article, we explore how Norwegian software developers experienced pandemic and post-pandemic remote work and describe simple behaviors and attitudes related to psychological safety. We pay special attention to the hybrid work mode, in which team members alternate days in the office with days working from home. Our key takeaway is that spontaneous interaction in the office facilitates psychological safety, while remote work increases the thresholds for both spontaneous interaction and psychological safety. We recommend that software teams synchronize their office presence to increase chances for spontaneous interaction in the office while benefitting from focused work while at home.
Submitted 26 August, 2022;
originally announced August 2022.
-
A Comparison of Different Approaches to Dynamic Origin-Destination Matrix Estimation in Urban Traffic
Authors:
Nicklas Sindlev Andersen,
Marco Chiarandini,
Kristian Debrabant
Abstract:
Given the counters of vehicles that traverse the roads of a traffic network, we reconstruct the travel demand that generated them expressed in terms of the number of origin-destination trips made by users. We model the problem as a bi-level optimization problem. At the inner-level, given a tentative demand, we solve a Dynamic Traffic Assignment (DTA) problem to decide the routing of the users between their origins and destinations. Finally, we adjust the number of trips and their origins and destinations at the outer-level to minimize the discrepancy between the counters generated at the inner-level and the given vehicle counts measured by sensors in the traffic network. We solve the DTA problem by employing a mesoscopic model implemented by the traffic simulator SUMO. Thus, the outer problem becomes an optimization problem that minimizes a black-box Objective Function (OF) determined by the results of the simulation, which is a costly computation. We study different approaches to the outer-level problem categorized as gradient-based and derivative-free approaches. Among the gradient-based approaches, we look at an assignment matrix-based approach and an assignment matrix-free approach that uses the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm. Among the derivative-free approaches, we investigate Machine Learning (ML) algorithms to learn a model of the simulator that can then be used as a surrogate OF in the optimization problem. We compare these approaches computationally on an artificial network. The gradient-based approaches perform the best in terms of solution quality and computational requirements. In contrast, the results obtained by the ML approach are currently less satisfactory but provide an interesting avenue for future research.
Submitted 31 May, 2022; v1 submitted 31 January, 2022;
originally announced February 2022.
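The SPSA gradient estimate named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `spsa_gradient`, `spsa_minimize`, and `toy_objective` are hypothetical, the constant gains `a` and `c` are a simplification (SPSA is usually run with decaying gain sequences), and a cheap quadratic stands in for the costly simulator-based objective function.

```python
import random


def spsa_gradient(f, x, c, rng):
    """Estimate the gradient of f at x from only two evaluations of f."""
    # Rademacher perturbation: every component is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)
    # All components share the same two evaluations; only the divisor differs.
    return [diff / (2.0 * c * di) for di in delta]


def spsa_minimize(f, x0, steps=200, a=0.1, c=0.1, seed=0):
    """Plain descent loop using SPSA estimates (constant gains: an assumption)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = spsa_gradient(f, x, c, rng)
        x = [xi - a * gi for xi, gi in zip(x, g)]
    return x


def toy_objective(x):
    """Hypothetical stand-in for the black-box objective: squared distance to a target."""
    target = (1.0, -2.0, 4.0)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))
```

Calling `spsa_minimize(toy_objective, [0.0, 0.0, 0.0])` moves the point toward the target. The cost per iteration is two objective evaluations regardless of the dimension of `x`, which is the property that makes this kind of estimate attractive when each evaluation is an expensive simulation.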
-
Detecting Wandering Behavior of People with Dementia
Authors:
Nicklas Sindlev Andersen,
Marco Chiarandini,
Stefan Jänicke,
Panagiotis Tampakis,
Arthur Zimek
Abstract:
Wandering is a problematic behavior in people with dementia that can lead to dangerous situations. To alleviate this problem we design an approach for the real-time automatic detection of wandering leading to getting lost. The approach relies on GPS data to determine frequent locations between which movement occurs and a step that transforms GPS data into geohash sequences. Those can be used to find frequent and normal movement patterns in historical data to then be able to determine whether a new on-going sequence is anomalous. We conduct experiments on synthetic data to test the ability of the approach to find frequent locations and to compare it against an alternative, state-of-the-art approach. Our approach is able to identify frequent locations and to obtain good performance (up to AUC = 0.99 for certain parameter settings) outperforming the state-of-the-art approach.
Submitted 25 October, 2021;
originally announced October 2021.
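The step that turns GPS data into geohash sequences, mentioned in the abstract above, might look like the sketch below. The encoder follows the standard public geohash scheme; the helper `to_geohash_sequence`, its default precision, and the choice to collapse consecutive repeats of the same cell are assumptions made for illustration and are not taken from the paper.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def to_geohash_sequence(track, precision=7):
    """Map (lat, lon) fixes to geohash cells, collapsing consecutive repeats."""
    seq = []
    for lat, lon in track:
        cell = geohash_encode(lat, lon, precision)
        if not seq or seq[-1] != cell:
            seq.append(cell)
    return seq
```

A discrete sequence of cells like this can then be compared against frequent patterns in historical data; the precision controls the cell size and therefore how coarse the movement patterns are.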
-
Not all noise is accounted equally: How differentially private learning benefits from large sampling rates
Authors:
Friedrich Dörmann,
Osvald Frisk,
Lars Nørvang Andersen,
Christian Fischer Pedersen
Abstract:
Learning often involves sensitive data and as such, privacy preserving extensions to Stochastic Gradient Descent (SGD) and other machine learning algorithms have been developed using the definitions of Differential Privacy (DP). In differentially private SGD, the gradients computed at each training iteration are subject to two different types of noise: first, inherent sampling noise arising from the use of minibatches; second, additive Gaussian noise from the underlying mechanisms that introduce privacy. In this study, we show that these two types of noise are equivalent in their effect on the utility of private neural networks; however, they are not accounted for equally in the privacy budget. Given this observation, we propose a training paradigm that shifts the proportions of noise towards less inherent and more additive noise, such that more of the overall noise can be accounted for in the privacy budget. With this paradigm, we are able to improve on the state-of-the-art in the privacy/utility tradeoff of private end-to-end CNNs.
Submitted 12 October, 2021;
originally announced October 2021.
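To make the two noise sources in the abstract above concrete, here is a minimal sketch of one differentially private SGD gradient computation in its commonly described form (per-example clipping followed by Gaussian noise). It is not the training paradigm proposed in the paper; the function names and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return the noisy average gradient for one minibatch.

    Sampling noise enters through which examples are in per_example_grads;
    the additive Gaussian noise is the explicit rng.gauss term below.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Noise standard deviation is tied to the clipping bound (the sensitivity).
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / batch_size for v in noisy]
```

Because the Gaussian noise is added to the sum before dividing by the batch size, a larger minibatch reduces both the sampling noise and the relative weight of the additive noise in the averaged gradient, which is the trade-off between the two noise types that the abstract discusses.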
-
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
Authors:
Patrick Schramowski,
Cigdem Turan,
Nico Andersen,
Constantin A. Rothkopf,
Kristian Kersting
Abstract:
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, its variants, GPT-2/3, and others. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended the state of the art for many NLP tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerated and biased behaviour. While this is well established, we show that recent LMs also contain human-like biases of what is right and wrong to do, some form of ethical and moral norms of the society -- they bring a "moral direction" to surface. That is, we show that these norms can be captured geometrically by a direction, which can be computed, e.g., by a PCA, in the embedding space, reflecting well the agreement of phrases to social norms implicitly expressed in the training texts and providing a path for attenuating or even preventing toxic degeneration in LMs. Being able to rate the (non-)normativity of arbitrary phrases without explicitly training the LM for this task, we demonstrate the capabilities of the "moral direction" for guiding (even other) LMs towards producing normative text and showcase it on the RealToxicityPrompts testbed, preventing the neural toxic degeneration in GPT-2.
Submitted 14 February, 2022; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Wandering and getting lost: the architecture of an app activating local communities on dementia issues
Authors:
Nicklas Sindlev Andersen,
Marco Chiarandini,
Jacopo Mauro
Abstract:
We describe the architecture of Sammen Om Demens (SOD), an application for portable devices aiming at helping persons with dementia when wandering and getting lost through the involvement of caregivers, family members, and ordinary citizens who volunteer.
To enable the real-time detection of a person with dementia that has lost orientation, we transfer location data at high frequency from a frontend on the smartphone of a person with dementia to a backend system. The backend system must be able to cope with the high throughput data and carry out possibly heavy computations for the detection of anomalous behavior via artificial intelligence techniques. This sets certain performance and architectural requirements on the design of the backend.
In the paper, we discuss our design and implementation choices for the backend of SOD that involve microservices and serverless services to achieve efficiency and scalability. We give evidence of the achieved goals by deploying the SOD backend on a public cloud and measuring the performance on simulated load tests.
Submitted 25 October, 2021; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation
Authors:
Kenneth Borup,
Lars N. Andersen
Abstract:
Knowledge distillation is classically a procedure where a neural network is trained on the output of another network along with the original targets in order to transfer knowledge between the architectures. The special case of self-distillation, where the network architectures are identical, has been observed to improve generalization accuracy. In this paper, we consider an iterative variant of self-distillation in a kernel regression setting, in which successive steps incorporate both model outputs and the ground-truth targets. This allows us to provide the first theoretical results on the importance of using the weighted ground-truth targets in self-distillation. Our focus is on fitting nonlinear functions to training data with a weighted mean square error objective function suitable for distillation, subject to $\ell_2$ regularization of the model parameters. We show that any such function obtained with self-distillation can be calculated directly as a function of the initial fit, and that infinite distillation steps yield the same optimization problem as the original with amplified regularization. Furthermore, we provide a closed form solution for the optimal choice of weighting parameter at each step, and show how to efficiently estimate this weighting parameter for deep learning and significantly reduce the computational requirements compared to a grid search.
Submitted 15 October, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
A 97mW 110MS/s 12b Pipeline ADC Implemented in 0.18$μ$m Digital CMOS
Authors:
Terje N. Andersen,
Atle Briskemyr,
Frode Telsto,
Johnny Bjornsen,
Thomas E. Bonnerud,
Bjornar Hernes,
Oystein Moldsvor
Abstract:
A 12 bit Pipeline ADC fabricated in a 0.18 $μ$m pure digital CMOS technology is presented. Its nominal conversion rate is 110MS/s and the nominal supply voltage is 1.8V. The effective number of bits is 10.4 when a 10MHz input signal with 2V_{P-P} signal swing is applied. The occupied silicon area is 0.86mm^2 and the power consumption equals 97mW. A switched capacitor bias current circuit scales the bias current automatically with the conversion rate, which gives scalable power consumption and full performance of the ADC from 20 to 140MS/s.
Submitted 25 October, 2007;
originally announced October 2007.
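The figures quoted in the abstract above (10.4 effective bits, 97mW, 110MS/s) can be related to two commonly used derived quantities. The sketch below is not from the paper: it uses the standard conversion between effective number of bits and SNDR, and the widely used Walden figure of merit, which the abstract does not mention. Using the nominal conversion rate as the rate term is an assumption, and the effective number of bits was reported for a 10MHz input only.

```python
def sndr_db_from_enob(enob):
    """Standard relation: SNDR [dB] = 6.02 * ENOB + 1.76."""
    return 6.02 * enob + 1.76


def walden_fom_joules(power_w, enob, sample_rate_hz):
    """Walden figure of merit: energy per conversion step, in joules."""
    return power_w / ((2.0 ** enob) * sample_rate_hz)


# Values taken from the abstract; the choice of sample rate is an assumption.
sndr = sndr_db_from_enob(10.4)
fom_pj = walden_fom_joules(97e-3, 10.4, 110e6) * 1e12  # picojoules per step
```

With these inputs the SNDR comes out at roughly 64.4 dB and the figure of merit at roughly 0.65 pJ per conversion step.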