Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!
Authors:
Nathan TeBlunthuis,
Valerie Hase,
Chung-Hong Chan
Abstract:
Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses, unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce such methods, including a new method that we design and implement in the R package misclassificationmodels, and test them via Monte Carlo simulations, which we also release, designed to reveal each method's limitations. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
Submitted 10 December, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
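The misclassification bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not use the misclassificationmodels package; it is a minimal Python illustration under assumed settings (a binary predictor, random label flips at a fixed rate, a linear outcome model, and a hypothetical `simulate` helper). It compares a naive regression on classifier labels with a regression restricted to a "gold standard" validation subsample.

```python
import numpy as np

def simulate(n=20000, n_val=2000, flip=0.2, beta=1.0, seed=0):
    """Hypothetical helper: returns (naive slope, validation-only slope)."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)            # true binary category
    y = 0.5 + beta * x + rng.normal(size=n)   # outcome depends on the true category
    # Assumed error model: the classifier flips each label with probability `flip`,
    # independently of x and y.
    w = np.where(rng.random(n) < flip, 1 - x, x)

    def slope(predictor, outcome):
        # Ordinary least squares slope of outcome on predictor.
        return np.polyfit(predictor, outcome, 1)[0]

    naive = slope(w, y)                        # uses error-prone labels for all n cases
    gold_only = slope(x[:n_val], y[:n_val])    # uses true labels for n_val cases only
    return naive, gold_only

naive, gold_only = simulate()
print(f"true slope: 1.00, naive: {naive:.2f}, validation-only: {gold_only:.2f}")
```

Under these assumptions the naive slope is pulled toward zero, while the validation-only estimate is close to the true slope but relies on far fewer observations. Correction methods of the kind the abstract describes aim to combine both data sources rather than discard the automatically classified cases.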
Scale-Space Anisotropic Total Variation for Limited Angle Tomography
Authors:
Yixing Huang,
Oliver Taubmann,
Xiaolin Huang,
Viktor Haase,
Guenter Lauritsch,
Andreas Maier
Abstract:
This paper addresses streak reduction in limited angle tomography. Although the iterative reweighted total variation (wTV) algorithm reduces small streaks well, it is rather inept at eliminating large ones since total variation (TV) regularization is scale-dependent and may regard these streaks as homogeneous areas. Hence, the main purpose of this paper is to reduce streak artifacts at various scales. We propose the scale-space anisotropic total variation (ssaTV) algorithm in two different implementations. The first implementation (ssaTV-1) utilizes an anisotropic gradient-like operator which uses 2s neighboring pixels along the streaks' normal direction at each scale s. The second implementation (ssaTV-2) makes use of anisotropic down-sampling and up-sampling operations, similarly oriented along the streaks' normal direction, to apply TV regularization at various scales. Experiments on numerical and clinical data demonstrate that both ssaTV algorithms reduce streak artifacts more effectively and efficiently than wTV, particularly when using multiple scales.
Submitted 29 January, 2018; v1 submitted 19 December, 2017;
originally announced December 2017.
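The scale dependence of TV mentioned in this abstract can be made concrete with a toy example. The sketch below is not the ssaTV operator from the paper; it is a minimal illustration under assumptions: a hypothetical `strided_tv` function that sums absolute differences between pixels `s` apart along one axis, applied to a synthetic image with one wide vertical streak. With `s = 1`, only the two streak edges in each row contribute, so the streak interior looks homogeneous; with a larger `s`, more pixels along the streak's normal direction register the streak.

```python
import numpy as np

def strided_tv(img, s, axis):
    """Hypothetical measure: sum of |u[i+s] - u[i]| along `axis` (not the paper's operator)."""
    a = np.moveaxis(img, axis, 0)
    return np.abs(a[s:] - a[:-s]).sum()

# Synthetic image: 32 rows, 64 columns, one vertical streak 20 pixels wide.
img = np.zeros((32, 64))
img[:, 20:40] = 1.0

tv_fine = strided_tv(img, s=1, axis=1)     # across the streak, pixel scale
tv_coarse = strided_tv(img, s=8, axis=1)   # across the streak, coarser scale
tv_along = strided_tv(img, s=1, axis=0)    # along the streak direction

print(tv_fine, tv_coarse, tv_along)
```

In this toy setting the response across the streak grows with the stride, while the response along the streak is zero, which is one way to see why an operator oriented along the streaks' normal direction and evaluated at several scales could penalize wide streaks more strongly than pixel-scale TV.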