-
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Authors:
Christoph Kern,
Stephanie Eckman,
Jacob Beck,
Rob Chew,
Bolei Ma,
Frauke Kreuter
Abstract:
When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves. Our results emphasize the crucial role played by the annotation instrument which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.
Submitted 22 January, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization
Authors:
Mingyuan Cheng,
Xinru Liao,
Quan Liu,
Bin Ma,
Jian Xu,
Bo Zheng
Abstract:
Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies. Recently, disentangled representation learning methods that decompose covariates into three latent factors, including instrumental, confounding and adjustment factors, have witnessed great success in treatment effect estimation. However, it remains an open problem how to learn the underlying disentangled factors precisely. Specifically, previous methods fail to obtain independent disentangled factors, which is a necessary condition for identifying treatment effect. In this paper, we propose Disentangled Representations for Counterfactual Regression via Mutual Information Minimization (MIM-DRCFR), which uses a multi-task learning framework to share information when learning the latent factors and incorporates MI minimization learning criteria to ensure the independence of these factors. Extensive experiments including public benchmarks and real-world industrial user growth datasets demonstrate that our method performs much better than state-of-the-art methods.
Submitted 2 June, 2022;
originally announced June 2022.
-
A Proof of First Digit Law from Laplace Transform
Authors:
Mingshu Cong,
Bo-Qiang Ma
Abstract:
The first digit law, also known as Benford's law or the significant digit law, is the empirical observation that the leading digits of numbers from real-world sources favor small values, following the form $\log(1+1/d)$, where $d = 1, 2, \ldots, 9$. The law has remained elusive for over one hundred years because it was unclear whether it is a logical consequence of the number system or the result of some mysterious mechanism of nature. We provide a simple and elegant proof of this law using the Laplace transform, an important tool in the mathematical methods of physics. We reveal that the first digit law originates from a basic property of the number system, and thus should be regarded as basic mathematical knowledge with wide applications.
Submitted 13 August, 2019;
originally announced August 2019.
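The formula quoted in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' method: it assumes the logarithm is base 10 (the usual convention for decimal digits) and compares the predicted probabilities with the leading digits of the powers of 2, a test sequence chosen here purely for illustration. The names `benford_probability` and `leading_digit_frequencies` are hypothetical.

```python
import math
from collections import Counter


def benford_probability(d: int) -> float:
    """Predicted share of leading digit d, assuming a base-10 logarithm."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def leading_digit_frequencies(values) -> dict:
    """Empirical share of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative data set (an assumption, not from the paper): 2**1 .. 2**1000.
powers_of_two = [2 ** n for n in range(1, 1001)]
observed = leading_digit_frequencies(powers_of_two)

for d in range(1, 10):
    print(f"d={d}: predicted={benford_probability(d):.4f}, observed={observed[d]:.4f}")
```

The nine predicted probabilities sum to 1 because the sum of $\log_{10}((d+1)/d)$ over $d = 1, \ldots, 9$ telescopes to $\log_{10} 10$.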
-
First digit law from Laplace transform
Authors:
Mingshu Cong,
Congqiao Li,
Bo-Qiang Ma
Abstract:
The occurrence of digits 1 through 9 as the leftmost nonzero digit of numbers from real-world sources is distributed unevenly according to an empirical law, known as Benford's law or the first digit law. It remains obscure why a variety of data sets generated from quite different dynamics obey this particular law. We study Benford's law using the Laplace transform, and find that the logarithmic Laplace spectrum of the digital indicator function can be approximately taken as a constant. This particular constant, being exactly the Benford term, explains the prevalence of Benford's law. The slight variation from the Benford term leads to deviations from Benford's law for distributions which oscillate violently in the inverse Laplace space. We prove that the whole family of completely monotonic distributions can satisfy Benford's law within a small bound. Our study suggests that Benford's law originates from the way that we write numbers, and thus should be regarded as basic mathematical knowledge.
Submitted 30 April, 2019;
originally announced May 2019.
-
Statistics of bedload transport over steep slopes: Separation of time scales and collective motion
Authors:
J. Heyman,
F. Mettra,
H. B. Ma,
C. Ancey
Abstract:
Steep slope streams show large fluctuations of sediment discharge across several time scales. These fluctuations may be inherent to the internal dynamics of the sediment transport process. A probabilistic framework thus seems appropriate to analyze such a process. In this letter, we present an experimental study of bedload transport over a steep slope flume for small to moderate Shields numbers. The sampling technique allows the acquisition of high-resolution time series of the solid discharge. The resolved time scales range from $10^{-2}$s up to $10^{5}$s. We show that two distinct time scales can be observed in the probability density function for the waiting time between moving particles. We make the point that the separation of time scales is related to collective dynamics. Proper statistics of a Markov process including collective entrainment are derived. The separation of time scales is recovered theoretically for low entrainment rates.
Submitted 13 November, 2016;
originally announced November 2016.