-
Gender mobility in the labor market with skills-based matching models
Authors:
Ajaya Adhikari,
Steven Vethman,
Daan Vos,
Marc Lenz,
Ioana Cocu,
Ioannis Tolios,
Cor J. Veenman
Abstract:
Skills-based matching promises mobility of workers between different sectors and occupations in the labor market. In this case, job seekers can look for jobs they do not yet have experience in, but for which they do have relevant skills. Currently, there are multiple occupations with a skewed gender distribution. For skills-based matching, it is unclear if and how a shift in the gender distribution, which we call gender mobility, between occupations will be effected. It is expected that the skills-based matching approach will likely be data-driven, including computational language models and supervised learning methods.
This work, first, shows the presence of gender segregation in language model-based skills representation of occupations. Second, we assess the use of these representations in a potential application based on simulated data, and show that the gender segregation is propagated by various data-driven skills-based matching models. These models are based on different language representations (bag of words, word2vec, and BERT), and distance metrics (static and machine learning-based). Accordingly, we show how skills-based matching approaches can be evaluated and compared on matching performance as well as on the risk of gender segregation. Making the gender segregation bias of models more explicit can help in generating healthy trust in the use of these models in practice.
Submitted 17 July, 2023;
originally announced July 2023.
-
Anomalous NO2 emitting ship detection with TROPOMI satellite data and machine learning
Authors:
Solomiia Kurchaba,
Jasper van Vliet,
Fons J. Verbeek,
Cor J. Veenman
Abstract:
Starting from 2021, more demanding $\text{NO}_\text{x}$ emission restrictions were introduced for ships operating in the North and Baltic Sea waters. Since all methods currently used for ship compliance monitoring are financially and time demanding, it is important to prioritize the inspection of ships that have high chances of being non-compliant. The current state-of-the-art approach for a large-scale ship $\text{NO}_\text{2}$ estimation is a supervised machine learning-based segmentation of ship plumes on TROPOMI/S5P images. However, challenging data annotation and insufficiently complex ship emission proxy used for the validation limit the applicability of the model for ship compliance monitoring. In this study, we present a method for the automated selection of potentially non-compliant ships using a combination of machine learning models on TROPOMI satellite data. It is based on a proposed regression model predicting the amount of $\text{NO}_\text{2}$ that is expected to be produced by a ship with certain properties operating in the given atmospheric conditions. The model does not require manual labeling and is validated with TROPOMI data directly. The differences between the predicted and actual amount of produced $\text{NO}_\text{2}$ are integrated over observations of the ship in time and are used as a measure of the inspection worthiness of a ship. To assure the robustness of the results, we compare the obtained results with the results of the previously developed segmentation-based method. Ships that are also highly deviating in accordance with the segmentation method require further attention. If no other explanations can be found by checking the TROPOMI data, the respective ships are advised to be the candidates for inspection.
Submitted 7 April, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
PERFEX: Classifier Performance Explanations for Trustworthy AI Systems
Authors:
Erwin Walraven,
Ajaya Adhikari,
Cor J. Veenman
Abstract:
Explainability of a classification model is crucial when deployed in real-world decision support systems. Explanations make predictions actionable to the user and should inform about the capabilities and limitations of the system. Existing explanation methods, however, typically only provide explanations for individual predictions. Information about conditions under which the classifier is able to support the decision maker is not available, while for instance information about when the system is not able to differentiate classes can be very helpful. In the development phase it can support the search for new features or combining models, and in the operational phase it supports decision makers in deciding e.g. not to use the system. This paper presents a method to explain the qualities of a trained base classifier, called PERFormance EXplainer (PERFEX). Our method consists of a meta tree learning algorithm that is able to predict and explain under which conditions the base classifier has a high or low error or any other classification performance metric. We evaluate PERFEX using several classifiers and datasets, including a case study with urban mobility data. It turns out that PERFEX typically has high meta prediction performance even if the base classifier is hardly able to differentiate classes, while giving compact performance explanations.
Submitted 12 December, 2022;
originally announced December 2022.
-
Supervised segmentation of NO2 plumes from individual ships using TROPOMI satellite data
Authors:
Solomiia Kurchaba,
Jasper van Vliet,
Fons J. Verbeek,
Jacqueline J. Meulman,
Cor J. Veenman
Abstract:
The shipping industry is one of the strongest anthropogenic emitters of $\text{NO}_\text{x}$ -- a substance harmful both to human health and the environment. The rapid growth of the industry causes societal pressure on controlling the emission levels produced by ships. All the methods currently used for ship emission monitoring are costly and require proximity to a ship, which makes global and continuous emission monitoring impossible. A promising approach is the application of remote sensing. Studies showed that some of the $\text{NO}_\text{2}$ plumes from individual ships can visually be distinguished using the TROPOspheric Monitoring Instrument on board the Copernicus Sentinel 5 Precursor (TROPOMI/S5P). To deploy a remote sensing-based global emission monitoring system, an automated procedure for the estimation of $\text{NO}_\text{2}$ emissions from individual ships is needed. The extremely low signal-to-noise ratio of the available data as well as the absence of ground truth makes the task very challenging. Here, we present a methodology for the automated segmentation of $\text{NO}_\text{2}$ plumes produced by seagoing ships using supervised machine learning on TROPOMI/S5P data. We show that the proposed approach leads to a more than 20\% increase in the average precision score in comparison to the methods used in previous studies and results in a high correlation of 0.834 with the theoretically derived ship emission proxy. This work is a crucial step toward the development of an automated procedure for global ship emission monitoring using remote sensing data.
Submitted 7 April, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Context-Aware Discrimination Detection in Job Vacancies using Computational Language Models
Authors:
S. Vethman,
A. Adhikari,
M. H. T. de Boer,
J. A. G. M. van Genabeek,
C. J. Veenman
Abstract:
Discriminatory job vacancies are disapproved of worldwide, but remain persistent. Discrimination in job vacancies can be explicit by directly referring to demographic memberships of candidates. More implicit forms of discrimination are also present that may not always be illegal but still influence the diversity of applicants. Explicit written discrimination is still present in numerous job vacancies, as was recently observed in the Netherlands. Current efforts for the detection of explicit discrimination concern the identification of job vacancies containing potentially discriminating terms such as "young" or "male". However, automatic detection is inefficient due to low precision: e.g. "we are a young company" or "working with mostly male patients" are phrases that contain explicit terms, while the context shows that these do not reflect discriminatory content.
In this paper, we show how machine learning based computational language models can raise precision in the detection of explicit discrimination by identifying when the potentially discriminating terms are used in a discriminatory context. We focus on gender discrimination, which indeed suffers from low precision when filtering explicit terms. First, we created a data set for gender discrimination in job vacancies. Second, we investigated a variety of computational language models for discriminatory context detection. Third, we evaluated the capability of these models to detect unforeseen discriminating terms in context. The results show that machine learning based methods can detect explicit gender discrimination with high precision and help in finding new forms of discrimination. Accordingly, the proposed methods can substantially increase the effectiveness of detecting job vacancies which are highly suspected to be discriminatory. In turn, this may lower the discrimination experienced at the start of the recruitment process.
Submitted 2 February, 2022;
originally announced February 2022.
-
Fair Tree Classifier using Strong Demographic Parity
Authors:
António Pereira Barata,
Frank W. Takes,
H. Jaap van den Herik,
Cor J. Veenman
Abstract:
When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label, whilst minimising for the discrimination towards any sensitive attribute, like gender or race, induced from biased data. A few hybrid tree optimisation criteria exist that combine classification performance and fairness. Although the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair tree classification methods mainly optimise for a fixed threshold on both the classification task as well as the fairness metric. In this paper, we propose a compound splitting criterion which combines threshold-free (i.e., strong) demographic parity with ROC-AUC termed SCAFF -- Splitting Criterion AUC for Fairness -- and easily extends to bagged and boosted tree frameworks. Our method simultaneously leverages multiple sensitive attributes of which the values may be multicategorical or intersectional, and is tunable with respect to the unavoidable performance-fairness trade-off. In our experiments, we demonstrate how SCAFF generates models with performance and fairness with respect to binary, multicategorical, and multiple sensitive attributes.
Submitted 22 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.