-
What is a Goldilocks Face Verification Test Set?
Authors:
Haiyu Wu,
Sicong Tian,
Aman Bhatta,
Jacob Gutierrez,
Grace Bezold,
Genesis Argueta,
Karl Ricanek Jr.,
Michael C. King,
Kevin W. Bowyer
Abstract:
Face Recognition models are commonly trained with web-scraped datasets containing millions of images and evaluated on test sets emphasizing pose, age and mixed attributes. With train and test sets both assembled from web-scraped images, it is critical to ensure disjoint sets of identities between train and test sets. However, existing train and test sets have not considered this. Moreover, as accuracy levels become saturated, such as LFW $>99.8\%$, more challenging test sets are needed. We show that current train and test sets are generally not identity- or even image-disjoint, and that this results in an optimistic bias in the estimated accuracy. In addition, we show that identity-disjoint folds are important in the 10-fold cross-validation estimate of test accuracy. To better support continued advances in face recognition, we introduce two "Goldilocks" test sets, Hadrian and Eclipse. The former emphasizes challenging facial hairstyles and the latter emphasizes challenging over- and under-exposure conditions. Images in both datasets are from a large, controlled-acquisition (not web-scraped) dataset, so they are identity- and image-disjoint with all popular training sets. Accuracy for these new test sets generally falls below that observed on LFW, CPLFW, CALFW, CFP-FP and AgeDB-30, showing that these datasets represent important dimensions for improvement of face recognition. The datasets are available at: https://github.com/HaiyuWu/SOTA-Face-Recognition-Train-and-Test
Submitted 24 May, 2024;
originally announced May 2024.
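The identity-disjoint fold requirement mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `identity_disjoint_folds`, the `image_ids` mapping, and the round-robin assignment are all assumptions made for illustration. The idea is only that folds are built by assigning whole identities, not individual images, so no person appears in more than one fold.

```python
import random
from collections import defaultdict


def identity_disjoint_folds(image_ids, n_folds=10, seed=0):
    """Split images into folds so that no identity spans two folds.

    image_ids: dict mapping image name -> identity label (hypothetical layout).
    Returns a list of n_folds lists of image names.
    """
    # Shuffle identities (not images) so the split is made per person.
    identities = sorted(set(image_ids.values()))
    rng = random.Random(seed)
    rng.shuffle(identities)

    # Round-robin assignment of each identity to exactly one fold.
    fold_of_identity = {ident: i % n_folds for i, ident in enumerate(identities)}

    folds = defaultdict(list)
    for image, ident in image_ids.items():
        folds[fold_of_identity[ident]].append(image)
    return [folds[i] for i in range(n_folds)]


if __name__ == "__main__":
    # Toy data: 50 identities with 3 images each.
    toy = {f"id{p}_img{k}.jpg": f"id{p}" for p in range(50) for k in range(3)}
    for i, fold in enumerate(identity_disjoint_folds(toy)):
        print(i, len(fold))
```

Splitting at the image level instead would let images of one person land in different folds, which is the kind of overlap the abstract reports as a source of optimistic bias.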
-
Identifying topologically associating domains using differential kernels
Authors:
Luka Maisuradze,
Megan C. King,
Ivan V. Surovtsev,
Simon G. J. Mochrie,
Mark D. Shattuck,
Corey S. O'Hern
Abstract:
Chromatin is a polymer complex of DNA and proteins that regulates gene expression. The three-dimensional structure and organization of chromatin control DNA transcription and replication. High-throughput chromatin conformation capture techniques generate Hi-C maps that can provide insight into the 3D structure of chromatin. Hi-C maps can be represented as a symmetric matrix where each element represents the average contact probability or number of contacts between two chromatin loci. Previous studies have detected topologically associating domains (TADs), or self-interacting regions in Hi-C maps within which the contact probability is greater than that outside the region. Many algorithms have been developed to identify TADs within Hi-C maps. However, most TAD identification algorithms are unable to identify nested or overlapping TADs and for a given Hi-C map there is significant variation in the location and number of TADs identified by different methods. We develop a novel method, KerTAD, using a kernel-based technique from computer vision and image processing that is able to accurately identify nested and overlapping TADs. We benchmark this method against state-of-the-art TAD identification methods on both synthetic and experimental data sets. We find that KerTAD consistently has higher true positive rates (TPR) and lower false discovery rates (FDR) than all tested methods for both synthetic and manually annotated experimental Hi-C maps. The TPR for KerTAD is also largely insensitive to increasing noise and sparsity, in contrast to the other methods. We also find that KerTAD is consistent in the number and size of TADs identified across replicate experimental Hi-C maps for several organisms. KerTAD will improve automated TAD identification and enable researchers to better correlate changes in TADs to biological phenomena, such as enhancer-promoter interactions and disease states.
Submitted 21 December, 2023;
originally announced December 2023.
-
What's color got to do with it? Face recognition in grayscale
Authors:
Aman Bhatta,
Domingo Mery,
Haiyu Wu,
Joyce Annan,
Micheal C. King,
Kevin W. Bowyer
Abstract:
State-of-the-art deep CNN face matchers are typically created using extensive training sets of color face images. Our study reveals that such matchers attain virtually identical accuracy when trained on either grayscale or color versions of the training set, even when the evaluation is done using color test images. Furthermore, we demonstrate that shallower models, lacking the capacity to model complex representations, rely more heavily on low-level features such as those associated with color. As a result, they display diminished accuracy when trained with grayscale images. We then consider possible causes for deeper CNN face matchers "not seeing color". Popular web-scraped face datasets actually have 30 to 60% of their identities with one or more grayscale images. We analyze whether this grayscale element in the training set impacts the accuracy achieved, and conclude that it does not. We demonstrate that using only grayscale images for both training and testing achieves accuracy comparable to that achieved using only color images for deeper models. This holds true for both real and synthetic training datasets. HSV color space, which separates chroma and luma information, does not improve the network's learning about color any more than in the RGB color space. We then show that the skin region of an individual's images in a web-scraped training set exhibits significant variation in their mapping to color space. This suggests that color carries limited identity-specific information. We also show that when the first convolution layer is restricted to a single filter, models learn a grayscale conversion filter and pass a grayscale version of the input color image to the next layer. Finally, we demonstrate that leveraging the lower per-image storage for grayscale to increase the number of images in the training set can improve accuracy of the face recognition model.
Submitted 2 July, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Impact of Blur and Resolution on Demographic Disparities in 1-to-Many Facial Identification
Authors:
Aman Bhatta,
Gabriella Pangelinan,
Michael C. King,
Kevin W. Bowyer
Abstract:
Most studies to date that have examined demographic variations in face recognition accuracy have analyzed 1-to-1 matching accuracy, using images that could be described as "government ID quality". This paper analyzes the accuracy of 1-to-many facial identification across demographic groups, and in the presence of blur and reduced resolution in the probe image as might occur in "surveillance camera quality" images. Cumulative match characteristic curves (CMC) are not appropriate for comparing propensity for rank-one recognition errors across demographics, and so we use three metrics for our analysis: (1) the well-known d' metric between mated and non-mated score distributions and, introduced in this work, (2) the absolute score difference between thresholds in the high-similarity tail of the non-mated distribution and the low-similarity tail of the mated distribution, and (3) the distribution of (mated - non-mated rank-one scores) across the set of probe images. We find that demographic variation in 1-to-many accuracy does not entirely follow what has been observed in 1-to-1 matching accuracy. Also, different from 1-to-1 accuracy, demographic comparison of 1-to-many accuracy can be affected by different numbers of identities and images across demographics. More importantly, we show that increased blur in the probe image, or reduced resolution of the face in the probe image, can significantly increase the false positive identification rate. We also show that the demographic variation in these high blur or low resolution conditions is much larger for male / female than for African-American / Caucasian. The point that 1-to-many accuracy can potentially collapse in the context of processing "surveillance camera quality" probe images against a "government ID quality" gallery is an important one.
Submitted 23 January, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
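For readers unfamiliar with the d' metric named in the abstract above, a minimal sketch follows. It uses the common textbook definition, the absolute difference of the two means divided by the square root of the average of the two variances. The use of population variance and the toy score lists are assumptions; the paper may compute it differently.

```python
import statistics


def d_prime(mated, non_mated):
    """Separation between mated and non-mated similarity score distributions.

    d' = |mean_m - mean_n| / sqrt((var_m + var_n) / 2)
    Population variance is assumed here for simplicity.
    """
    mu_m = statistics.fmean(mated)
    mu_n = statistics.fmean(non_mated)
    var_m = statistics.pvariance(mated)
    var_n = statistics.pvariance(non_mated)
    return abs(mu_m - mu_n) / ((var_m + var_n) / 2) ** 0.5


if __name__ == "__main__":
    # Toy similarity scores, not taken from any dataset.
    mated_scores = [0.8, 0.9, 1.0]
    non_mated_scores = [0.1, 0.2, 0.3]
    print(round(d_prime(mated_scores, non_mated_scores), 3))
```

A larger d' means the two score distributions overlap less, so a single threshold separates mated from non-mated pairs more cleanly.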
-
Analysis of Adversarial Image Manipulations
Authors:
Ahsi Lo,
Gabriella Pangelinan,
Michael C. King
Abstract:
As virtual and physical identity grow increasingly intertwined, the importance of privacy and security in the online sphere becomes paramount. In recent years, multiple news stories have emerged of private companies scraping web content and doing research with or selling the data. Images uploaded online can be scraped without users' consent or knowledge. Users of social media platforms whose images are scraped may be at risk of being identified in other uploaded images or in real-world identification situations. This paper investigates how simple, accessible image manipulation techniques affect the accuracy of facial recognition software in identifying an individual's various face images based on one unique image.
Submitted 10 May, 2023;
originally announced May 2023.
-
The effect of loops on the mean square displacement of Rouse-model chromatin
Authors:
Tianyu Yuan,
Hao Yan,
Mary Lou P. Bailey,
Jessica F. Williams,
Ivan Surovtsev,
Megan C. King,
Simon G. J. Mochrie
Abstract:
Many researchers have been encouraged to describe the dynamics of chromosomal loci in chromatin using the classical Rouse model of polymer dynamics by the agreement between the measured mean square displacement (MSD) versus time of fluorescently-labelled loci and the Rouse-model predictions. However, the discovery of intermediate-scale chromatin organization, known as topologically associating domains (TADs), together with the proposed explanation of TADs in terms of chromatin loops and loop extrusion, is at odds with the classical Rouse model, which does not contain loops. Accordingly, we introduce an extended Rouse model that incorporates chromatin loop configurations from loop-extrusion-factor-model simulations. Specifically, we extend the classical Rouse model by modifying the polymer's dynamical matrix to incorporate extra springs that represent loop bases. We also theoretically generalize the friction coefficient matrix so that the Rouse beads with non-uniform friction coefficients are compatible with our Rouse model simulation method. This extended Rouse model allows us to investigate the impact of loops and loop extrusion on the dynamics of chromatin. We show that loops significantly suppress the averaged MSD of a chromosomal locus, consistent with recent experiments that track fluorescently-labelled chromatin loci in fission yeast [M. L. P. Bailey, I. Surovtsev, J. F. Williams, H. Yan, T. Yuan, S. G. Mochrie, and M. C. King, Mol. Biol. Cell (in press)]. We also find that loops slightly reduce the MSD's stretching exponent from the classical Rouse-model value of 0.5 to a loop-density-dependent value in the 0.45-0.40 range. Remarkably, stretching exponent values in this range have also been reported in recent experiments [S. C. Weber, A. J. Spakowitz, and J. A. Theriot, Phys. Rev. Lett. 104, 238102 (2010) and Bailey et al., Mol. Biol. Cell (in press)].
Submitted 21 April, 2023;
originally announced April 2023.
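The stretching exponent discussed in the abstract above is the power alpha in MSD(t) proportional to t^alpha. One common way to estimate it, sketched here as an assumption and not as the authors' analysis pipeline, is a least-squares fit of log MSD against log lag time. The function name `stretching_exponent` and the synthetic data are hypothetical.

```python
import math


def stretching_exponent(lags, msd):
    """Estimate alpha in MSD(t) ~ t**alpha as the slope of a log-log fit."""
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of y on x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic MSD curve with the classical Rouse exponent of 0.5.
    lags = [1, 2, 4, 8, 16]
    msd = [3.0 * t ** 0.5 for t in lags]
    print(stretching_exponent(lags, msd))
```

On measured data the fit range matters, since localization noise at short lags and poor statistics at long lags can both bias the slope.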
-
Exploring Causes of Demographic Variations In Face Recognition Accuracy
Authors:
Gabriella Pangelinan,
K. S. Krishnapriya,
Vitor Albiero,
Grace Bezold,
Kai Zhang,
Kushal Vangara,
Michael C. King,
Kevin W. Bowyer
Abstract:
In recent years, media reports have called out bias and racism in face recognition technology. We review experimental results exploring several speculated causes for asymmetric cross-demographic performance. We consider accuracy differences as represented by variations in non-mated (impostor) and / or mated (genuine) distributions for 1-to-1 face matching. Possible causes explored include differences in skin tone, face size and shape, imbalance in number of identities and images in the training data, and amount of face visible in the test data ("face pixels"). We find that demographic differences in face pixel information of the test images appear to most directly impact the resultant differences in face recognition accuracy.
Submitted 14 April, 2023;
originally announced April 2023.
-
Consistency and Accuracy of CelebA Attribute Values
Authors:
Haiyu Wu,
Grace Bezold,
Manuel Günther,
Terrance Boult,
Michael C. King,
Kevin W. Bowyer
Abstract:
We report the first systematic analysis of the experimental foundations of facial attribute classification. An experiment in which two annotators independently assign attribute values shows that only 12 of 40 common attributes are assigned values with >= 95% consistency, and three (high cheekbones, pointed nose, oval face) have essentially random consistency. Of 5,068 duplicate face appearances in CelebA, attributes have contradicting values on between 10 and 860 of the 5,068 duplicates. Manual audit of a subset of CelebA estimates error rates as high as 40% for (no beard=false), even though the labeling consistency experiment indicates that no beard could be assigned with >= 95% consistency. Selecting the mouth slightly open (MSO) for deeper analysis, we estimate the error rate for (MSO=true) at about 20% and (MSO=false) at about 2%. A corrected version of the MSO attribute values enables learning a model that achieves higher accuracy than previously reported for MSO. Corrected values for CelebA MSO are available at https://github.com/HaiyuWu/CelebAMSO.
Submitted 16 April, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
The Gender Gap in Face Recognition Accuracy Is a Hairy Problem
Authors:
Aman Bhatta,
Vítor Albiero,
Kevin W. Bowyer,
Michael C. King
Abstract:
It is broadly accepted that there is a "gender gap" in face recognition accuracy, with females having higher false match and false non-match rates. However, relatively little is known about the cause(s) of this gender gap. Even the recent NIST report on demographic effects lists "analyze cause and effect" under "what we did not do". We first demonstrate that female and male hairstyles have important differences that impact face recognition accuracy. In particular, compared to females, male facial hair contributes to creating a greater average difference in appearance between different male faces. We then demonstrate that when the data used to estimate recognition accuracy is balanced across gender for how hairstyles occlude the face, the initially observed gender gap in accuracy largely disappears. We show this result for two different matchers, and analyzing images of Caucasians and of African-Americans. These results suggest that future research on demographic variation in accuracy should include a check for balanced quality of the test data as part of the problem formulation. To promote reproducible research, matchers, attribute classifiers, and datasets used in this research are/will be publicly available.
Submitted 10 June, 2022;
originally announced June 2022.
-
Face Recognition Accuracy Across Demographics: Shining a Light Into the Problem
Authors:
Haiyu Wu,
Vítor Albiero,
K. S. Krishnapriya,
Michael C. King,
Kevin W. Bowyer
Abstract:
We explore varying face recognition accuracy across demographic groups as a phenomenon partly caused by differences in face illumination. We observe that for a common operational scenario with controlled image acquisition, there is a large difference in face region brightness between African-American and Caucasian, and also a smaller difference between male and female. We show that impostor image pairs with both faces under-exposed, or both overexposed, have an increased false match rate (FMR). Conversely, image pairs with strongly different face brightness have a decreased similarity measure. We propose a brightness information metric to measure variation in brightness in the face and show that face brightness that is too low or too high has reduced information in the face region, providing a cause for the lower accuracy. Based on this, for operational scenarios with controlled image acquisition, illumination should be adjusted for each individual to obtain appropriate face image brightness. This is the first work that we are aware of to explore how the level of brightness of the skin region in a pair of face images (rather than a single image) impacts face recognition accuracy, and to evaluate this as a systematic factor causing unequal accuracy across demographics. The code is at https://github.com/HaiyuWu/FaceBrightness.
Submitted 16 April, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
Gendered Differences in Face Recognition Accuracy Explained by Hairstyles, Makeup, and Facial Morphology
Authors:
Vítor Albiero,
Kai Zhang,
Michael C. King,
Kevin W. Bowyer
Abstract:
Media reports have accused face recognition of being "biased", "sexist" and "racist". There is consensus in the research literature that face recognition accuracy is lower for females, who often have both a higher false match rate and a higher false non-match rate. However, there is little published research aimed at identifying the cause of lower accuracy for females. For instance, the 2019 Face Recognition Vendor Test that documents lower female accuracy across a broad range of algorithms and datasets also lists "Analyze cause and effect" under the heading "What we did not do". We present the first experimental analysis to identify major causes of lower face recognition accuracy for females on datasets where previous research has observed this result. Controlling for equal amount of visible face in the test images mitigates the apparent higher false non-match rate for females. Additional analysis shows that using makeup-balanced datasets further improves accuracy for females, who then achieve lower false non-match rates. Finally, a clustering experiment suggests that images of two different females are inherently more similar than of two different males, potentially accounting for a difference in false match rates.
Submitted 29 December, 2021;
originally announced December 2021.
-
Analysis of Manual and Automated Skin Tone Assignments for Face Recognition Applications
Authors:
KS Krishnapriya,
Michael C. King,
Kevin W. Bowyer
Abstract:
News reports have suggested that darker skin tone causes an increase in face recognition errors. The Fitzpatrick scale is widely used in dermatology to classify sensitivity to sun exposure and skin tone. In this paper, we analyze a set of manual Fitzpatrick skin type assignments and also employ the individual typology angle to automatically estimate the skin tone from face images. The set of manual skin tone rating experiments shows that there are inconsistencies between human raters that are difficult to eliminate. Efforts to automate skin tone rating suggest that it is particularly challenging on images collected without a calibration object in the scene. However, after the color-correction, the level of agreement between automated and manual approaches is found to be 96% or better for the MORPH images. To our knowledge, this is the first work to: (a) examine the consistency of manual skin tone ratings across observers, (b) document that there is substantial variation in the rating of the same image by different observers even when exemplar images are given for guidance and all images are color-corrected, and (c) compare manual versus automated skin tone ratings.
Submitted 29 April, 2021;
originally announced April 2021.
-
Does Face Recognition Error Echo Gender Classification Error?
Authors:
Ying Qiu,
Vítor Albiero,
Michael C. King,
Kevin W. Bowyer
Abstract:
This paper is the first to explore the question of whether images that are classified incorrectly by a face analytics algorithm (e.g., gender classification) are any more or less likely to participate in an image pair that results in a face recognition error. We analyze results from three different gender classification algorithms (one open-source and two commercial), and two face recognition algorithms (one open-source and one commercial), on image sets representing four demographic groups (African-American female and male, Caucasian female and male). For impostor image pairs, our results show that pairs in which one image has a gender classification error have a better impostor distribution than pairs in which both images have correct gender classification, and so are less likely to generate a false match error. For genuine image pairs, our results show that individuals whose images have a mix of correct and incorrect gender classification have a worse genuine distribution (increased false non-match rate) compared to individuals whose images all have correct gender classification. Thus, compared to images that generate correct gender classification, images that generate gender classification errors do generate a different pattern of recognition errors, both better (false match) and worse (false non-match).
Submitted 28 April, 2021;
originally announced April 2021.
-
Extrusion of chromatin loops by a composite loop extrusion factor
Authors:
Hao Yan,
Ivan Surovtsev,
Jessica F Williams,
Mary Lou P Bailey,
Megan C King,
Simon G J Mochrie
Abstract:
Chromatin loop extrusion by Structural Maintenance of Chromosome (SMC) complexes is thought to underlie intermediate-scale chromatin organization inside cells. Motivated by a number of experiments suggesting that nucleosomes may block loop extrusion by SMCs, such as cohesin and condensin complexes, we introduce and characterize theoretically a composite loop extrusion factor (composite LEF) model. In addition to an SMC complex that creates a chromatin loop by encircling two threads of DNA, this model includes a remodeling complex that relocates or removes nucleosomes as it progresses along the chromatin, and nucleosomes that block SMC translocation along the DNA. Loop extrusion is enabled by SMC motion along nucleosome-free DNA, created in the wake of the remodeling complex, while nucleosome re-binding behind the SMC acts as a ratchet, holding the SMC close to the remodeling complex. We show that, for a wide range of parameter values, this collection of factors constitutes a composite LEF that extrudes loops with a velocity comparable to the velocity of remodeling complex translocation on chromatin in the absence of SMC, and much faster than loop extrusion by an isolated SMC that is blocked by nucleosomes.
Submitted 1 March, 2021;
originally announced March 2021.
-
Covariance Distributions in Single Particle Tracking
Authors:
Mary Lou P Bailey,
Hao Yan,
Ivan Surovtsev,
Jessica F Williams,
Megan C King,
Simon G J Mochrie
Abstract:
Several recent experiments, including our own in the fission yeast, S. pombe, have characterized the motions of gene loci within living nuclei by measuring the locus position over time, then proceeding to obtain the statistical properties of this motion. To address the question of whether a population of single particle tracks, obtained from many different cells, corresponds to a single mode of diffusion, we derive theoretical equations describing the probability distribution of the displacement covariance, assuming the displacement is a zero-mean multivariate Gaussian random variable. We also determine the corresponding theoretical means, variances, and third central moments. Bolstering the theory is good agreement between its predictions and the results obtained for various simulated and measured data sets, including simulated particle trajectories of simple and anomalous diffusion, and the measured trajectories of an optically-trapped bead in water, and in a viscoelastic solution. We also show that, for sufficiently long tracks, each covariance distribution in these examples is well-described by a skew-normal distribution with mean, variance, and skewness given by theory. For experimental S. pombe gene locus data, however, we find that the first two covariance distributions are wider than predicted, although the third and subsequent covariances are well-described by theory. This suggests that the origin of the theory-experiment discrepancy is associated with localization noise, which influences only the first two covariances. Thus, we hypothesize that the discrepancy is caused by locus-to-locus heterogeneity in the localization noise. Further simulations reveal excess covariance widths can be largely recreated on the basis of heterogeneous noise. We conclude that the motion of gene loci in fission yeast is consistent with a single mode of diffusion.
Submitted 11 January, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Analysis of Gender Inequality In Face Recognition Accuracy
Authors:
Vítor Albiero,
Krishnapriya K. S.,
Kushal Vangara,
Kai Zhang,
Michael C. King,
Kevin W. Bowyer
Abstract:
We present a comprehensive analysis of how and why face recognition accuracy differs between men and women. We show that accuracy is lower for women due to the combination of (1) the impostor distribution for women having a skew toward higher similarity scores, and (2) the genuine distribution for women having a skew toward lower similarity scores. We show that this phenomenon of the impostor and genuine distributions for women shifting closer towards each other is general across datasets of African-American, Caucasian, and Asian faces. We show that the distribution of facial expressions may differ between male/female, but that the accuracy difference persists for image subsets rated confidently as neutral expression. The accuracy difference also persists for image subsets rated as close to zero pitch angle. Even when removing images with forehead partially occluded by hair/hat, the same impostor/genuine accuracy difference persists. We show that the female genuine distribution improves when only female images without facial cosmetics are used, but that the female impostor distribution also degrades at the same time. Lastly, we show that the accuracy difference persists even if a state-of-the-art deep learning method is trained from scratch using training data explicitly balanced between male and female images and subjects.
Submitted 31 January, 2020;
originally announced February 2020.
-
Does Face Recognition Accuracy Get Better With Age? Deep Face Matchers Say No
Authors:
Vítor Albiero,
Kevin W. Bowyer,
Kushal Vangara,
Michael C. King
Abstract:
Previous studies generally agree that face recognition accuracy is higher for older persons than for younger persons. But most previous studies were before the wave of deep learning matchers, and most considered accuracy only in terms of the verification rate for genuine pairs. This paper investigates accuracy for age groups 16-29, 30-49 and 50-70, using three modern deep CNN matchers, and considers differences in the impostor and genuine distributions as well as verification rates and ROC curves. We find that accuracy is lower for older persons and higher for younger persons. In contrast, a pre-deep-learning matcher on the same dataset shows the traditional result of higher accuracy for older persons, although its overall accuracy is much lower than that of the deep learning matchers. Comparing the impostor and genuine distributions, we conclude that impostor scores have a larger effect than genuine scores in causing lower accuracy for the older age group. We also investigate the effects of training data across the age groups. Our results show that fine-tuning the deep CNN models on additional images of older persons actually lowers accuracy for the older age group. Also, we fine-tune and train from scratch two models using age-balanced training datasets, and these results also show lower accuracy for the older age group. These results argue that the lower accuracy for the older age group is not due to imbalance in the original training data.
Submitted 14 November, 2019;
originally announced November 2019.
-
Characterizing the Variability in Face Recognition Accuracy Relative to Race
Authors:
KS Krishnapriya,
Kushal Vangara,
Michael C. King,
Vitor Albiero,
Kevin Bowyer
Abstract:
Many recent news headlines have labeled face recognition technology as biased or racist. We report on a methodical investigation into differences in face recognition accuracy between African-American and Caucasian image cohorts of the MORPH dataset. We find that, for all four matchers considered, the impostor and the genuine distributions are statistically significantly different between cohorts. For a fixed decision threshold, the African-American image cohort has a higher false match rate and a lower false non-match rate. ROC curves compare verification rates at the same false match rate, but the different cohorts achieve the same false match rate at different thresholds. This means that ROC comparisons are not relevant to operational scenarios that use a fixed decision threshold. We show that, for the ResNet matcher, the two cohorts have approximately equal separation of impostor and genuine distributions. Using ICAO compliance as a standard of image quality, we find that the initial image cohorts have unequal rates of good quality images. The ICAO-compliant subsets of the original image cohorts show improved accuracy, with the main effect being to reduce the low-similarity tail of the genuine distributions.
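The distinction drawn above between a fixed decision threshold and a fixed false match rate can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are synthetic Gaussian draws rather than MORPH data, the cohort names `cohort_a` and `cohort_b` are hypothetical, and the helper functions are not taken from the paper. The second cohort simply has both score distributions shifted toward higher similarity by the same amount, so the separation between impostor and genuine scores is equal by construction.

```python
import random


def fmr_fnmr(impostor, genuine, threshold):
    """False match rate and false non-match rate at one decision threshold.

    A pair is declared a match when its similarity score is >= threshold.
    """
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def threshold_at_fmr(impostor, target_fmr):
    """Lowest observed impostor score whose FMR does not exceed target_fmr.

    Assumes scores are continuous, so ties are negligible.
    """
    ranked = sorted(impostor, reverse=True)
    k = int(target_fmr * len(ranked))  # impostor pairs allowed to match
    if k == 0:
        return float("inf")  # no false match can be tolerated
    return ranked[k - 1]


def make_cohort(rng, shift, n=20000):
    """Synthetic cohort (assumption): Gaussian scores, both shifted by `shift`."""
    impostor = [rng.gauss(0.10 + shift, 0.10) for _ in range(n)]
    genuine = [rng.gauss(0.70 + shift, 0.12) for _ in range(n)]
    return impostor, genuine


rng = random.Random(0)
cohort_a = make_cohort(rng, shift=0.00)
cohort_b = make_cohort(rng, shift=0.06)

# Operational view: one threshold applied to both cohorts.
fixed_threshold = 0.40
for name, (imp, gen) in (("cohort_a", cohort_a), ("cohort_b", cohort_b)):
    fmr, fnmr = fmr_fnmr(imp, gen, fixed_threshold)
    print(f"{name} at t={fixed_threshold}: FMR={fmr:.4f} FNMR={fnmr:.4f}")

# ROC view: each cohort gets its own threshold for the same target FMR.
for name, (imp, gen) in (("cohort_a", cohort_a), ("cohort_b", cohort_b)):
    t = threshold_at_fmr(imp, target_fmr=0.001)
    print(f"{name} reaches FMR<=0.001 at threshold {t:.3f}")
```

In this synthetic setup the shifted cohort shows a higher false match rate and a lower false non-match rate at the shared threshold, and it needs a higher threshold to reach the same target false match rate, which is the kind of gap that a ROC comparison at equal false match rate does not expose.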
Submitted 8 May, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.