Search | arXiv e-print repository

arXiv:2405.20562 [pdf, other]

Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study

Authors: Haroon Miah, Dimitrios Kollias, Giacinto Luca Pedone, Drew Provan, Frederick Chen

Abstract: Primary Immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients leading to low platelet counts and bleeding. The diagnosis and effective management of ITP is challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcome. In this work we conduct a feasibility study to check if machine learning can be applied effectively for diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting. Various ML models, including Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Decision Tree and Random Forest, were applied to data from the UK Adult ITP Registry and a general hematology clinic. Two different approaches were investigated: a demographic-unaware and a demographic-aware one. We conduct extensive experiments to evaluate the predictive performance of these models and approaches, as well as their bias. The results revealed that Decision Tree and Random Forest models were both superior and fair, achieving nearly perfect predictive and fairness scores, with platelet count identified as the most significant variable. Models not provided with demographic information performed better in terms of predictive accuracy but showed lower fairness score, illustrating a trade-off between predictive performance and fairness.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.06841 [pdf, other]

Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis

Authors: Guanyu Hu, Eleni Papadopoulou, Dimitrios Kollias, Paraskevi Tzouveli, Jie Wei, Xinyu Yang

Abstract: The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: https://github.com/dkollias/Fair-Consistent-Affect-Analysis

Submitted 16 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: accepted at IEEE FG 2024

arXiv:2405.06765 [pdf, other]

Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection

Authors: Anastasios Arsenos, Vasileios Karampinis, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

Abstract: The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed $7$ types of common corruptions for camera inputs taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset we constructed the first robustness benchmark dataset named AOT-C for air-to-air aerial object detection. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is to present an extensive experimental evaluation involving $8$ diverse object detectors to explore the degradation in the performance under escalating levels of corruptions (domain shifts). Based on the evaluation results, the key observations that emerge are the following: 1) One-stage detectors of the YOLO family demonstrate better robustness, 2) Transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions, 3) Robustness against corruptions is related to the generalization ability of models. The third main contribution is to show that finetuning on our augmented synthetic data results in improvements in the generalisation ability of the object detector in real-world flight experiments.

Submitted 16 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06749 [pdf, other]

Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

Authors: Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

Abstract: In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.

Submitted 16 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: accepted at ICUAS 2024

arXiv:2404.18952 [pdf, other]

CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention

Authors: Damith Chamalke Senadeera, Xiaoyun Yang, Dimitrios Kollias, Gregory Slabaugh

Abstract: In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.

Submitted 27 April, 2024; originally announced April 2024.

Comments: To be published in the proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

arXiv:2403.07514 [pdf, other]

Uncertainty-guided Contrastive Learning for Single Source Domain Generalisation

Authors: Anastasios Arsenos, Dimitrios Kollias, Evangelos Petrongonas, Christos Skliros, Stefanos Kollias

Abstract: In the context of single domain generalisation, the objective is for models that have been exclusively trained on data from a single domain to demonstrate strong performance when confronted with various unfamiliar domains. In this paper, we introduce a novel model referred to as Contrastive Uncertainty Domain Generalisation Network (CUDGNet). The key idea is to augment the source capacity in both input and label spaces through the fictitious domain generator and jointly learn the domain invariant representation of each class through contrastive learning. Extensive experiments on two Single Source Domain Generalisation (SSDG) datasets demonstrate the effectiveness of our approach, which surpasses the state-of-the-art single-DG methods by up to $7.08\%$. Our method also provides efficient uncertainty estimation at inference time from a single forward pass through the generator subnetwork.

Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: accepted at IEEE ICASSP 2024

arXiv:2403.06242 [pdf, other]

COVID-19 Computer-aided Diagnosis through AI-assisted CT Imaging Analysis: Deploying a Medical AI System

Authors: Demetris Gerogiannis, Anastasios Arsenos, Dimitrios Kollias, Dimitris Nikitopoulos, Stefanos Kollias

Abstract: Computer-aided diagnosis (CAD) systems stand out as potent aids for physicians in identifying the novel Coronavirus Disease 2019 (COVID-19) through medical imaging modalities. In this paper, we showcase the integration and reliable and fast deployment of a state-of-the-art AI system designed to automatically analyze CT images, offering infection probability for the swift detection of COVID-19. The suggested system, comprising both classification and segmentation components, is anticipated to reduce physicians' detection time and enhance the overall efficiency of COVID-19 detection. We successfully surmounted various challenges, such as data discrepancy and anonymisation, testing the time-effectiveness of the model, and data security, enabling reliable and scalable deployment of the system on both cloud and edge environments. Additionally, our AI system assigns a probability of infection to each 3D CT scan and enhances explainability through anchor set similarity, facilitating timely confirmation and segregation of infected patients by physicians.

Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: accepted at IEEE ISBI 2024

arXiv:2403.02192 [pdf, other]

Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scans

Authors: Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias

Abstract: The paper presents the DEF-AI-MIA COV19D Competition, which is organized in the framework of the 'Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)' Workshop of the 2024 Computer Vision and Pattern Recognition (CVPR) Conference. The Competition is the 4th in the series, following the first three Competitions held in the framework of ICCV 2021, ECCV 2022 and ICASSP 2023 International Conferences respectively. It includes two Challenges on: i) Covid-19 Detection and ii) Covid-19 Domain Adaptation. The Competition uses data from the COV19-CT-DB database, which is described in the paper and includes a large number of chest CT scan series. Each chest CT scan series consists of a sequence of 2-D CT slices, the number of which is between 50 and 700. Training, validation and test datasets have been extracted from COV19-CT-DB and provided to the participants in both Challenges. The paper presents the baseline models used in the Challenges and the performance each of them obtained.

Submitted 10 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19344 [pdf, other]

The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Alice Baird, Chris Gagne, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with IEEE CVPR 2024. The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors, crucial for the development of human-centered technologies. In more detail, the Competition focuses on affect related benchmarking tasks and comprises five sub-challenges: i) Valence-Arousal Estimation (the target is to estimate two continuous affect dimensions, valence and arousal), ii) Expression Recognition (the target is to recognise between the mutually exclusive classes of the 7 basic expressions and 'other'), iii) Action Unit Detection (the target is to detect 12 action units), iv) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes), and v) Emotional Mimicry Intensity Estimation (the target is to estimate six continuous emotion dimensions). In the paper, we present these Challenges, describe their respective datasets and challenge protocols (we outline the evaluation metrics) and present the baseline systems as well as their obtained performance. More information on the Competition can be found at: https://affective-behavior-analysis-in-the-wild.github.io/6th.

Submitted 12 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2401.01219 [pdf, ps, other]

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Authors: Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou

Abstract: Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large overlap across tasks, i.e., each input sample is annotated for all, or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little, or non-overlapping annotations, or when there is big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when the MT model's performance is worse than that of at least one single-task model).

Submitted 3 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: accepted at AAAI 2024. arXiv admin note: text overlap with arXiv:2105.03790

arXiv:2310.03485 [pdf, other]

BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification

Authors: Dimitrios Kollias, Karanjot Vendal, Priyanka Gadhavi, Solomon Russom

Abstract: Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated and not the independent slices that it consists of). BTDNet consists of four components: i) the data augmentation one (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) the 3D analysis one (that performs global analysis through a CNN-RNN); iii) the routing one (that contains a mask layer that handles variable input feature lengths), and iv) the modality fusion one (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms by large margins the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge, offering a promising avenue for enhancing brain tumor diagnosis and treatment.

Submitted 7 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2303.01498 [pdf, ps, other]

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alice Baird, Alan Cowen, Stefanos Zafeiriou

Abstract: The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop which will be held in conjunction with IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017 Conferences, and is dedicated to automatically analyzing affect. For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset. The former database is an audiovisual one of around 600 videos of around 3M frames and is annotated with respect to: a) two continuous affect dimensions, valence (how positive/negative a person is) and arousal (how active/passive a person is); b) basic expressions (e.g. happiness, sadness, neutral state); and c) atomic facial muscle actions (i.e., action units). The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities. Thus the 5th ABAW Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Emotional Reaction Intensity Estimation. In this paper, we present these Challenges, along with their corpora, we outline the evaluation metrics, we present the baseline systems and illustrate their obtained performance.

Submitted 20 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.10659

arXiv:2303.00180 [pdf, other]

FaceRNET: a Facial Expression Intensity Estimation Network

Authors: Dimitrios Kollias, Andreas Psaroudakis, Anastasios Arsenos, Paraskevi Theofilou

Abstract: This paper presents our approach for Facial Expression Intensity Estimation from videos. It includes two components: i) a representation extractor network that extracts various emotion descriptors (valence-arousal, action units and basic expressions) from each video frame; ii) an RNN that captures temporal information in the data, followed by a mask layer which enables handling varying input video lengths through dynamic routing. This approach has been tested on the Hume-Reaction dataset yielding excellent results.

Submitted 7 October, 2023; v1 submitted 28 February, 2023; originally announced March 2023.
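
The FaceRNET abstract above mentions a mask layer that lets the model accept videos with different numbers of frames. As a rough illustration of that general idea only, the sketch below averages per-frame feature vectors over the valid (non-padded) frames of each video. It is a generic padding mask written for this listing, not the paper's mask layer or its dynamic routing; the function name `masked_mean` and the data layout (zero-padded lists plus a list of true lengths) are assumptions.

```python
def masked_mean(features, lengths):
    """Average per-frame feature vectors, ignoring padded frames.

    features: list of videos, each a list of frame vectors padded to a
              common number of frames (hypothetical layout).
    lengths:  true number of frames in each video.
    """
    pooled = []
    for frames, n in zip(features, lengths):
        if n <= 0:
            raise ValueError("each video needs at least one valid frame")
        valid = frames[:n]          # the mask: keep only the real frames
        dim = len(valid[0])
        pooled.append([sum(f[d] for f in valid) / n for d in range(dim)])
    return pooled


# Two videos padded to three frames; the second has a single real frame.
videos = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],
    [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]],
]
video_level = masked_mean(videos, [2, 1])
```

Without the mask, the zero padding would pull the averages of shorter videos towards zero, which is the kind of length dependence a mask is meant to remove.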

arXiv:2303.00175 [pdf, other]

A Deep Neural Architecture for Harmonizing 3-D Input Data Analysis and Decision Making in Medical Imaging

Authors: Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias

Abstract: Harmonizing the analysis of data, especially of 3-D image volumes, consisting of a different number of slices and annotated per volume, is a significant problem in training and using deep neural networks in various applications, including medical imaging. Moreover, unifying the decision making of the networks over different input datasets is crucial for the generation of rich data-driven knowledge and for trusted usage in the applications. This paper presents a new deep neural architecture, named RACNet, which includes routing and feature alignment steps and effectively handles different input lengths and single annotations of the 3-D image inputs, whilst providing highly accurate decisions. In addition, through latent variable extraction from the trained RACNet, a set of anchors is generated providing further insight on the network's decision making. These can be used to enrich and unify data-driven knowledge extracted from different datasets. An extensive experimental study illustrates the above developments, focusing on COVID-19 diagnosis through analysis of 3-D chest CT scans from databases generated in different countries and medical centers.

Submitted 1 March, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

arXiv:2207.01138 [pdf, other]

ABAW: Learning from Synthetic Data & Multi-Task Learning Challenges

Authors: Dimitrios Kollias

Abstract: This paper describes the fourth Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with European Conference on Computer Vision (ECCV), 2022. The 4th ABAW Competition is a continuation of the Competitions held at IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and IEEE CVPR 2017 Conferences, and aims at automatically analyzing affect. In the previous runs of this Competition, the Challenges targeted Valence-Arousal Estimation, Expression Classification and Action Unit Detection. This year the Competition encompasses two different Challenges: i) a Multi-Task-Learning one in which the goal is to learn at the same time (i.e., in a multi-task learning setting) all the three above mentioned tasks; and ii) a Learning from Synthetic Data one in which the goal is to learn to recognise the basic expressions from artificially generated data and generalise to real data. The Aff-Wild2 database is a large scale in-the-wild database and the first one that contains annotations for valence and arousal, expressions and action units. This database is the basis for the above Challenges. In more detail: i) s-Aff-Wild2 -- a static version of Aff-Wild2 database -- has been constructed and utilized for the purposes of the Multi-Task-Learning Challenge; and ii) some specific frames-images from the Aff-Wild2 database have been used in an expression manipulation manner for creating the synthetic dataset, which is the basis for the Learning from Synthetic Data Challenge. In this paper, we first present the two Challenges, along with the utilized corpora, then we outline the evaluation metrics and finally present the baseline systems per Challenge, as well as their derived results. More information regarding the Competition can be found on the competition's website: https://ibug.doc.ic.ac.uk/resources/eccv-2023-4th-abaw/.

Submitted 5 July, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.10659; text overlap with arXiv:1704.07863 by other authors

arXiv:2206.04732 [pdf, other]

AI-MIA: COVID-19 Detection & Severity Analysis through Medical Imaging

Authors: Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias

Abstract: This paper presents the baseline approach for the organized 2nd Covid-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). It presents the COV19-CT-DB database which is annotated for COVID-19 detection, consisting of about 7,700 3-D CT scans. Part of the database consisting of Covid-19 cases is further annotated in terms of four Covid-19 severity conditions. We have split the database and the latter part of it into training, validation and test datasets. The former two datasets are used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. The baseline approach is a deep learning one, based on a CNN-RNN network, and we report its performance on the COV19-CT-DB database.

Submitted 13 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2106.07524

arXiv:2205.04442 [pdf, other]

MixAugment & Mixup: Augmentation Methods for Facial Expression Recognition

Authors: Andreas Psaroudakis, Dimitrios Kollias

Abstract: Automatic Facial Expression Recognition (FER) has attracted increasing attention in the last 20 years since facial expressions play a central role in human communication. Most FER methodologies utilize Deep Neural Networks (DNNs) that are powerful tools when it comes to data analysis. However, despite their power, these networks are prone to overfitting, as they often tend to memorize the training data. What is more, there are currently few large in-the-wild (i.e. captured in unconstrained environments) databases for FER. To alleviate this issue, a number of data augmentation techniques have been proposed. Data augmentation is a way to increase the diversity of available data by applying constrained transformations on the original data. One such technique, which has positively contributed to various classification tasks, is Mixup. According to this, a DNN is trained on convex combinations of pairs of examples and their corresponding labels. In this paper, we examine the effectiveness of Mixup for in-the-wild FER in which data have large variations in head poses, illumination conditions, backgrounds and contexts. We then propose a new data augmentation strategy which is based on Mixup, called MixAugment. According to this, the network is trained concurrently on a combination of virtual examples and real examples; all these examples contribute to the overall loss function. We conduct an extensive experimental study that proves the effectiveness of MixAugment over Mixup and various state-of-the-art methods. We further investigate the combination of dropout with Mixup and MixAugment, as well as the combination of other data augmentation techniques with MixAugment.
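As an illustration of the Mixup idea described in this abstract (training on convex combinations of pairs of examples and their labels), a minimal standard-library sketch could look as follows. The function names `sample_lambda` and `mixup_pair`, the Beta(alpha, alpha) sampling of the mixing coefficient and the default `alpha` are assumptions taken from common Mixup practice, not details given in this listing. MixAugment itself is not reproduced, since the abstract does not state how its loss terms are weighted.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha).

    The Beta prior and the default alpha are assumptions for illustration.
    """
    return rng.betavariate(alpha, alpha)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two examples and of their labels.

    x_a, x_b: feature vectors of equal length (e.g. flattened images).
    y_a, y_b: one-hot (or soft) label vectors of equal length.
    lam:      mixing coefficient; lam = 1 keeps example a unchanged.
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


if __name__ == "__main__":
    lam = sample_lambda()
    x_mix, y_mix = mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam)
    print(lam, x_mix, y_mix)
```

Because the same coefficient mixes both inputs and labels, the mixed label remains a valid probability vector whenever the two source labels are.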

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.10659 [pdf, ps, other]

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Multi-Task Learning Challenges

Authors: Dimitrios Kollias

Abstract: This paper describes the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2022. The 3rd ABAW Competition is a continuation of the Competitions held at ICCV 2021, IEEE FG 2020 and IEEE CVPR 2017 Conferences, and aims at automatically analyzing affect. This year the Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Multi-Task-Learning. All the Challenges are based on a common benchmark database, Aff-Wild2, which is a large scale in-the-wild database and the first one to be annotated in terms of valence-arousal, expressions and action units. In this paper, we present the four Challenges, with the utilized Competition corpora, we outline the evaluation metrics and present the baseline systems along with their obtained results.

Submitted 28 February, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

arXiv:2106.15318 [pdf, other]

Analysing Affective Behavior in the second ABAW2 Competition

Authors: Dimitrios Kollias, Irene Kotsia, Elnar Hajiyev, Stefanos Zafeiriou

Abstract: The Affective Behavior Analysis in-the-wild (ABAW2) 2021 Competition is the second Competition (following the first, very successful ABAW Competition held in conjunction with IEEE FG 2020) that aims at automatically analyzing affect. ABAW2 is split into three Challenges, each one addressing one of the three main behavior tasks of valence-arousal estimation, basic expression classification and action unit detection. All three Challenges are based on a common benchmark database, Aff-Wild2, which is a large scale in-the-wild database and the first one to be annotated for all these three tasks. In this paper, we describe this Competition, to be held in conjunction with ICCV 2021. We present the three Challenges, with the utilized Competition corpora. We outline the evaluation metrics and present the baseline system with its results. More information regarding the Competition is provided in the Competition site: https://ibug.doc.ic.ac.uk/resources/iccv-2021-2nd-abaw.

Submitted 3 July, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2001.11409

arXiv:2106.07524 [pdf, other]

MIA-COV19D: COVID-19 Detection through 3-D Chest CT Image Analysis

Authors: Dimitrios Kollias, Anastasios Arsenos, Levon Soukissian, Stefanos Kollias

Abstract: Early and reliable COVID-19 diagnosis based on chest 3-D CT scans can assist medical specialists in vital circumstances. Deep learning methodologies constitute a main approach for chest CT scan analysis and disease prediction. However, large annotated databases are necessary for developing deep learning models that are able to provide COVID-19 diagnosis across various medical environments in different countries. Due to privacy issues, publicly available COVID-19 CT datasets are highly difficult to obtain, which hinders the research and development of AI-enabled diagnosis methods of COVID-19 based on CT scans. In this paper we present the COV19-CT-DB database which is annotated for COVID-19, consisting of about 5,000 3-D CT scans. We have split the database in training, validation and test datasets. The former two datasets can be used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. We also present a deep learning approach, based on a CNN-RNN network, and report its performance on the COVID19-CT-DB database.

Submitted 21 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2105.03790 [pdf, other]

Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study

Authors: Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou

Abstract: Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm, such as a DNN. MTL is based on the assumption that the tasks under consideration are related; therefore it exploits shared knowledge for improving performance on each individual task. Tasks are generally considered to be homogeneous, i.e., to refer to the same type of problem. Moreover, MTL is usually based on ground truth annotations with full, or partial overlap across tasks. In this work, we deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems. We explore task-relatedness as a means for co-training, in a weakly-supervised way, tasks that contain little, or even non-overlapping annotations. Task-relatedness is introduced in MTL, either explicitly through prior expert knowledge, or through data-driven studies. We propose a novel distribution matching approach, in which knowledge exchange is enabled between tasks, via matching of their predictions' distributions. Based on this approach, we build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks. We develop case studies for: i) continuous affect estimation, action unit detection, basic emotion recognition; ii) attribute detection, face identification. We illustrate that co-training via task relatedness alleviates negative transfer. Since FaceBehaviorNet learns features that encapsulate all aspects of facial behavior, we conduct zero-/few-shot learning to perform tasks beyond the ones that it has been trained for, such as compound emotion recognition. By conducting a very large experimental study, utilizing 10 databases, we illustrate that our approach outperforms, by large margins, the state-of-the-art in all tasks and in all databases, even in those which have not been used in its training.

Submitted 8 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:2103.15792, arXiv:1910.11111

arXiv:2103.15792 [pdf, other]

Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: Affect recognition based on subjects' facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This paper exploits these advances and presents significant contributions for affect analysis and recognition in-the-wild. Affect analysis and recognition can be seen as a dual knowledge generation problem, involving: i) creation of new, large and rich in-the-wild databases and ii) design and training of novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets. The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2 and presents the design of two classes of deep neural networks trained with these databases. The first class refers to uni-task affect recognition, focusing on prediction of the valence and arousal dimensional variables. The second class refers to estimation of all main behavior tasks, i.e. valence-arousal prediction; categorical emotion classification in seven basic facial expressions; facial Action Unit detection. A novel multi-task and holistic framework is presented which is able to jointly learn and effectively generalize and perform affect recognition over all existing in-the-wild databases. Large experimental studies illustrate the achieved performance improvement over the existing state-of-the-art in affect recognition.

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2009.07044 [pdf, other]

Deep Transparent Prediction through Latent Representation Analysis

Authors: D. Kollias, N. Bouas, Y. Vlaxos, V. Brillakis, M. Seferis, I. Kollia, L. Sukissian, J. Wingate, S. Kollias

Abstract: The paper presents a novel deep learning approach, which extracts latent information from trained Deep Neural Networks (DNNs) and derives concise representations that are analyzed in an effective, unified way for prediction purposes. It is well known that DNNs are capable of analyzing complex data; however, they lack transparency in their decision making, in the sense that it is not straightforward to justify their prediction, or to visualize the features on which the decision was based. Moreover, they generally require large amounts of data in order to learn and become able to adapt to different environments. This makes their use difficult in healthcare, where trust and personalization are key issues. Transparency combined with high prediction accuracy are the targeted goals of the proposed approach. It includes both supervised DNN training and unsupervised learning of latent variables extracted from the trained DNNs. Domain Adaptation from multiple sources is also presented as an extension, where the extracted latent variable representations are used to generate predictions in other, non-annotated, environments. Successful application is illustrated through a large experimental study in various fields: prediction of Parkinson's disease from MRI and DaTScans; prediction of COVID-19 and pneumonia from CT scans and X-rays; optical character verification in retail food packaging.

Submitted 20 September, 2020; v1 submitted 13 September, 2020; originally announced September 2020.

Comments: 16 pages, 8 figures, to be published at Foundations of Trustworthy AI integrating Learning, Optimisation and Reasoning (TAILOR) Workshop of European Conference on Artificial Intelligence (ECAI) 2020. arXiv admin note: substantial text overlap with arXiv:1911.10653

arXiv:2001.11409 [pdf, other]

Analysing Affective Behavior in the First ABAW 2020 Competition

Authors: Dimitrios Kollias, Attila Schulc, Elnar Hajiyev, Stefanos Zafeiriou

Abstract: The Affective Behavior Analysis in-the-wild (ABAW) 2020 Competition is the first Competition aiming at automatic analysis of the three main behavior tasks of valence-arousal estimation, basic expression recognition and action unit detection. It is split into three Challenges, each one addressing a respective behavior task. For the Challenges, we provide a common benchmark database, Aff-Wild2, which is a large scale in-the-wild database and the first one annotated for all these three tasks. In this paper, we describe this Competition, to be held in conjunction with the IEEE Conference on Face and Gesture Recognition, May 2020, in Buenos Aires, Argentina. We present the three Challenges, with the utilized Competition corpora. We outline the evaluation metrics, present both the baseline system and the top-3 performing teams' methodologies per Challenge and finally present their obtained results. More information regarding the Competition, the leaderboard of each Challenge and details for accessing the utilized database, are provided in the Competition site: http://ibug.doc.ic.ac.uk/resources/fg-2020-competition-affective-behavior-analysis.

Submitted 15 April, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

arXiv:1910.11111 [pdf, other]

Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network

Authors: Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou

Abstract: Automatic facial behavior analysis has a long history of studies in the intersection of computer vision, physiology and psychology. However it is only recently, with the collection of large-scale datasets and powerful machine learning methods such as deep neural networks, that automatic facial behavior analysis started to thrive. Three of its iconic tasks are automatic recognition of basic expressions (e.g. happy, sad, surprised), estimation of continuous emotions (e.g., valence and arousal), and detection of facial action units (activations of e.g. upper/inner eyebrows, nose wrinkles). Up until now these tasks have mostly been studied independently, collecting a dataset for each task. We present the first and the largest study of all facial behaviour tasks learned jointly in a single multi-task, multi-domain and multi-label network, which we call FaceBehaviorNet. For this we utilize all publicly available datasets in the community (around 5M images) that study facial behaviour tasks in-the-wild. We demonstrate that jointly training an end-to-end network for all tasks has consistently better performance than training each of the single-task networks. Furthermore, we propose two simple strategies for coupling the tasks during training, co-annotation and distribution matching, and show the advantages of this approach. Finally we show that FaceBehaviorNet has learned features that encapsulate all aspects of facial behaviour, and can be successfully applied to perform tasks (compound emotion recognition) beyond the ones that it has been trained for, in a zero- and few-shot learning setting.

Submitted 28 May, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: filed as a patent

arXiv:1910.11090 [pdf, other]

Emotion Generation and Recognition: A StarGAN Approach

Authors: Aritra Banerjee, Dimitrios Kollias

Abstract: The main idea of this ISO is to use StarGAN (a type of GAN model) to perform training and testing on an emotion dataset resulting in an emotion recognition which can be generated by the valence arousal score of the 7 basic expressions. We have created an entirely new dataset consisting of 4K videos. This dataset consists of all the basic 7 types of emotions: Happy, Sad, Angry, Surprised, Fear, Disgust, Neutral. We have performed face detection and alignment, followed by manually annotating the frames/images in the dataset with basic valence arousal values depending on the emotions. Then the existing StarGAN model is trained on our created dataset, after which some manual subjects were chosen to test the efficiency of the trained StarGAN model.

Submitted 12 October, 2019; originally announced October 2019.

arXiv:1910.05877 [pdf, other]

Interpretable Deep Neural Networks for Facial Expression and Dimensional Emotion Recognition in-the-wild

Authors: Valentin Richer, Dimitrios Kollias

Abstract: In this project, we created a database with two types of annotations used in the emotion recognition domain: Action Units and Valence Arousal, to try to achieve better results than with only one model. The originality of the approach is also based on the type of architecture used to perform the prediction of the emotions: a categorical Generative Adversarial Network. This kind of dual network can generate images based on the pictures from the new dataset thanks to its generative network and decide if an image is fake or real thanks to its discriminative network, as well as help to predict the annotations for Action Units and Valence Arousal due to its categorical nature. GANs were trained on the Action Units model only, then the Valence Arousal model only and then on both the Action Units model and Valence Arousal model in order to test different parameters and understand their influence. The generative and discriminative aspects of the GANs have produced interesting results.

Submitted 13 December, 2019; v1 submitted 13 October, 2019; originally announced October 2019.

arXiv:1910.05784 [pdf, other]

Interpretable Deep Neural Networks for Dimensional and Categorical Emotion Recognition in-the-wild

Authors: Xia Yicheng, Dimitrios Kollias

Abstract: Emotions play an important role in people's life. Understanding and recognising them is not only important for interpersonal communication, but also has promising applications in Human-Computer Interaction, automobile safety and medical research. This project focuses on extending the emotion recognition database, and training the CNN + RNN emotion recognition neural networks with emotion category representation and valence & arousal representation. The combined models are constructed by training the two representations simultaneously. The comparison and analysis between the three types of model are discussed. The inner-relationship between two emotion representations and the interpretability of the neural networks are investigated. The findings suggest that categorical emotion recognition performance can benefit from training with a combined model, and that the mapping between emotion category and valence & arousal values can explain this phenomenon.

Submitted 13 December, 2019; v1 submitted 13 October, 2019; originally announced October 2019.

arXiv:1910.05774 [pdf, other]

Image Generation and Recognition (Emotions)

Authors: Hanne Carlsson, Dimitrios Kollias

Abstract: Generative Adversarial Networks (GANs) were proposed in 2014 by Goodfellow et al., and have since been extended into multiple computer vision applications. This report provides a thorough survey of recent GAN research, outlining the various architectures and applications, as well as methods for training GANs and dealing with latent space. This is followed by a discussion of potential areas for future GAN research, including: evaluating GANs, better understanding GANs, and techniques for training GANs. The second part of this report outlines the compilation of a dataset of images 'in the wild' representing each of the 7 basic human emotions, and analyses experiments done when training a StarGAN on this dataset combined with the FER2013 dataset.

Submitted 13 December, 2019; v1 submitted 13 October, 2019; originally announced October 2019.

arXiv:1910.05376 [pdf, other]

AffWild Net and Aff-Wild Database

Authors: Alvertos Benroumpi, Dimitrios Kollias

Abstract: Emotion recognition is the task of recognizing people's emotions. Usually it is achieved by analyzing the expressions of people's faces. There are two ways of representing emotions: the categorical approach and the dimensional approach using valence and arousal values. Valence shows how negative or positive an emotion is and arousal shows how much it is activated. Recent deep learning models for emotion recognition use the second approach, valence and arousal. Moreover, a more interesting concept, which is useful in real life, is "in the wild" emotion recognition. "In the wild" means that the images analyzed for the recognition task come from real-life sources (online videos, online photos, etc.) and not from staged experiments. So, they introduce unpredictable situations in the images, which have to be modeled. The purpose of this project is to study the previous work on "in the wild" emotion recognition, design a new dataset which has as a standard the "Aff-wild" database, implement new deep learning models and evaluate the results. First, already existing databases and deep learning models are presented. Then, inspired by them, a new database is created which includes 507,208 frames in total from 106 videos, which were gathered from online sources. Then, the data are tested in a CNN model based on the CNN-M architecture, in order to be sure about their usability. Next, the main model of this project is implemented. That is a Regression GAN which can execute unsupervised and supervised learning at the same time. More specifically, it keeps the main functionality of GANs, which is to produce fake images that look as good as the real ones, while it can also predict valence and arousal values for both real and fake images. Finally, the database created earlier is applied to this model and the results are presented and evaluated.

Submitted 13 December, 2019; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1910.05318 [pdf, other]

Aff-Wild Database and AffWildNet

Authors: Mengyao Liu, Dimitrios Kollias

Abstract: In the context of HCI, building an automatic system to recognize affect from human facial expressions in real-world conditions is crucial for making machines interact naturalistically with humans. However, existing facial emotion databases usually contain expressions in limited scenarios under well-controlled conditions. Aff-Wild is currently the largest database consisting of spontaneous facial expressions in the wild annotated with valence and arousal. The first contribution of this project is the extension of the Aff-Wild database, achieved by collecting YouTube videos that show spontaneous facial expressions in the wild, annotating the videos with valence and arousal in the range [-1,1], detecting faces in frames using the FFLD2 detector and partitioning the whole data set into train, validation and test sets, with 527056, 94223 and 135145 frames respectively. Diversity is ensured regarding age, ethnicity and values of valence and arousal. The ratio of male to female is close to 1. Regarding the techniques used to build the automatic system, deep learning stands out, since almost all winning methods in emotion challenges adopt DNN techniques. The second contribution of this project is an end-to-end DNN with joint CNN and RNN blocks that estimates valence and arousal for each frame in sequential data. VGGFace, ResNet and DenseNet with the corresponding pre-trained models for the CNN block, and LSTM, GRU, IndRNN and attention mechanisms for the RNN block, are tested to find the best combination. Fine-tuning and transfer learning techniques are also tried out. By comparing the CCC evaluation value on test data, the best model is found to be pre-trained VGGFace connected with a 2-layer GRU with attention mechanism. The model's test performance is 0.555 CCC for valence with sequence length 80 and 0.499 CCC for arousal with sequence length 70.
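The abstract reports results as CCC values without expanding the acronym. Assuming it refers to the concordance correlation coefficient commonly used for valence-arousal estimation, a minimal sketch of the metric is given below. The function name `concordance_cc` and the use of population (divide-by-n) statistics are illustrative choices, not taken from the paper.

```python
def concordance_cc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, target) / (var(pred) + var(target) + (mean diff)^2),
    computed here with population (divide-by-n) statistics.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        # Both sequences are constant and equal; the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom


if __name__ == "__main__":
    # Hypothetical per-frame valence predictions and annotations in [-1, 1].
    print(concordance_cc([0.1, 0.4, -0.2, 0.3], [0.0, 0.5, -0.1, 0.2]))
```

Unlike plain Pearson correlation, the squared mean difference in the denominator penalises predictions that track the annotations but are shifted away from them.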

Submitted 13 December, 2019; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1910.04855 [pdf, other]

Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: Affective computing has been largely limited in terms of available data resources. The need to collect and annotate diverse in-the-wild datasets has become apparent with the rise of deep learning models, as the default approach to address any computer vision task. Some in-the-wild databases have been recently proposed. However: i) their size is small, ii) they are not audiovisual, iii) only a small part is manually annotated, iv) they contain a small number of subjects, or v) they are not annotated for all main behavior tasks (valence-arousal estimation, action unit detection and basic expression classification). To address these, we substantially extend the largest available in-the-wild database (Aff-Wild) to study continuous emotions such as valence and arousal. Furthermore, we annotate parts of the database with basic expressions and action units. As a consequence, for the first time, this allows the joint study of all three types of behavior states. We call this database Aff-Wild2. We conduct extensive experiments with CNN and CNN-RNN architectures that use visual and audio modalities; these networks are trained on Aff-Wild2 and their performance is then evaluated on 10 publicly available emotion databases. We show that the networks achieve state-of-the-art performance for the emotion recognition tasks. Additionally, we adapt the ArcFace loss function in the emotion recognition context and use it for training two new networks on Aff-Wild2 and then re-train them in a variety of diverse expression recognition databases. The networks are shown to improve the existing state-of-the-art. The database, emotion recognition models and source code are available at http://ibug.doc.ic.ac.uk/resources/aff-wild2.

Submitted 25 September, 2019; originally announced October 2019.

Comments: oral presentation in BMVC 2019

arXiv:1910.01417 [pdf, other]

Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: This paper presents a novel CNN-RNN based approach, which exploits multiple CNN features for dimensional emotion recognition in-the-wild, utilizing the One-Minute Gradual-Emotion (OMG-Emotion) dataset. Our approach includes first pre-training on the relevant, large Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are extracted from the trained CNN component and are exploited by RNN subnets in a multi-task framework. Their outputs constitute an intermediate level prediction; final estimates are obtained as the mean or median values of these predictions. Fusion of the networks is also examined for boosting the obtained performance, at Decision-, or at Model-level; in the latter case an RNN was used for the fusion. Our approach, although using only the visual modality, outperformed state-of-the-art methods that utilized audio and visual modalities. Some of our developments have been submitted to the OMG-Emotion Challenge, ranking second among the technologies which used only visual information for valence estimation; ranking third overall. Through extensive experimentation, we further show that arousal estimation is greatly improved when low-level features are combined with high-level ones.

Submitted 10 April, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1811.08004 [pdf, other]

Photorealistic Facial Synthesis in the Dimensional Affect Space

Authors: Dimitrios Kollias, Shiyang Cheng, Maja Pantic, Stefanos Zafeiriou

Abstract: This paper presents a novel approach for synthesizing facial affect, which is based on our annotating 600,000 frames of the 4DFAB database in terms of valence and arousal. The input of this approach is a pair of these emotional state descriptors and a neutral 2D image of a person to whom the corresponding affect will be synthesized. Given this target pair, a set of 3D facial meshes is selected, which is used to build a blendshape model and generate the new facial affect. To synthesize the affect on the 2D neutral image, 3DMM fitting is performed and the reconstructed face is deformed to generate the target facial expressions. Last, the new face is rendered into the original image. Both qualitative and quantitative experimental studies illustrate the generation of realistic images, when the neutral image is sampled from a variety of well known databases, such as the Aff-Wild, AFEW, Multi-PIE, AFEW-VA, BU-3DFE, Bosphorus.

Submitted 10 November, 2018; originally announced November 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1811.05027

arXiv:1811.07771 [pdf, other]

A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: Over the past few years many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis, in terms of valence and arousal (VA). These approaches can only be effective, if they are developed using large, appropriately annotated databases, showing behaviors of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild has been the first, large-scale, in-the-wild database (including around 1,200,000 frames of 300 videos), annotated in terms of VA. In the vast majority of existing emotion databases, their annotation is limited to either primary expressions, or valence-arousal, or action units. In this paper, we first annotate a part (around 234,000 frames) of the Aff-Wild database in terms of 8 AUs and another part (around 288,000 frames) in terms of the 7 basic emotion categories, so that parts of this database are annotated in terms of VA, as well as AUs, or primary expressions. Then, we set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) a discriminator of a generative adversarial network (GAN). On the other hand, image generation is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the new annotated parts of the Aff-Wild database.

Submitted 13 December, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

arXiv:1811.07770 [pdf, other]

Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: Automatic understanding of human affect using visual signals is a problem that has attracted significant interest over the past 20 years. However, human emotional states are quite complex. To appraise such states displayed in real-world settings, we need expressive emotional descriptors that are capable of capturing and describing this complexity. The circumplex model of affect, which is described in terms of valence (i.e., how positive or negative is an emotion) and arousal (i.e., power of the activation of the emotion), can be used for this purpose. Recent progress in the emotion recognition domain has been achieved through the development of deep neural architectures and the availability of very large training databases. To this end, Aff-Wild has been the first large-scale "in-the-wild" database, containing around 1,200,000 frames. In this paper, we build upon this database, extending it with 260 more subjects and 1,413,000 new video frames. We call the union of Aff-Wild with the additional data, Aff-Wild2. The videos are downloaded from YouTube and have large variations in pose, age, illumination conditions, ethnicity and profession. Both database-specific as well as cross-database experiments are performed in this paper, by utilizing the Aff-Wild2, along with the RECOLA database. The developed deep neural architectures are based on the joint training of state-of-the-art convolutional and recurrent neural networks with attention mechanism, thus exploiting the invariant properties of convolutional features while modeling temporal dynamics that arise in human behaviour via the recurrent layers. The obtained results show promise for utilization of the extended Aff-Wild, as well as of the developed deep neural architectures for visual analysis of human behaviour in terms of continuous emotion dimensions.

Submitted 13 December, 2019; v1 submitted 10 November, 2018; originally announced November 2018.

arXiv:1811.05027 [pdf, other]

Deep Neural Network Augmentation: Generating Faces for Affect Analysis

Authors: Dimitrios Kollias, Shiyang Cheng, Evangelos Ververas, Irene Kotsia, Stefanos Zafeiriou

Abstract: This paper presents a novel approach for synthesizing facial affect; either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise), or in terms of valence (i.e., how positive or negative is an emotion) and arousal (i.e., power of the emotion activation). The proposed approach accepts the following inputs: i) a neutral 2D image of a person; ii) a basic facial expression or a pair of valence-arousal (VA) emotional state descriptors to be generated, or a path of affect in the 2D VA Space to be generated as an image sequence. In order to synthesize affect in terms of VA, for this person, 600,000 frames from the 4DFAB database were annotated. The affect synthesis is implemented by fitting a 3D Morphable Model on the neutral image, then deforming the reconstructed face and adding the inputted affect, and blending the new face with the given affect into the original image. Qualitative experiments illustrate the generation of realistic images, when the neutral image is sampled from thirteen well known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet, RAF-DB; comparisons with Generative Adversarial Networks (GANs) show the higher quality achieved by the proposed approach. Then, quantitative experiments are conducted, in which the synthesized images are used for data augmentation in training Deep Neural Networks to perform affect recognition over all databases; greatly improved performances are achieved when compared with state-of-the-art methods, as well as with GAN-based data augmentation, in all cases.

Submitted 16 July, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

arXiv:1809.04359 [pdf, other]

Training Deep Neural Networks with Different Datasets In-the-wild: The Emotion Recognition Paradigm

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: A novel procedure is presented in this paper, for training a deep convolutional and recurrent neural network, taking into account both the available training data set and some information extracted from similar networks trained with other relevant data sets. This information is included in an extended loss function used for the network training, so that the network can have an improved performance when applied to the other data sets, without forgetting the learned knowledge from the original data set. Facial expression and emotion recognition in-the-wild is the test bed application that is used to demonstrate the improved performance achieved using the proposed approach. In this framework, we provide an experimental study on categorical emotion recognition using datasets from a very recent related emotion recognition challenge.

Submitted 12 September, 2018; originally announced September 2018.

arXiv:1805.01452 [pdf, other]

A Multi-component CNN-RNN Approach for Dimensional Emotion Recognition in-the-wild

Authors: Dimitrios Kollias, Stefanos Zafeiriou

Abstract: This paper presents our approach to the One-Minute Gradual-Emotion Recognition (OMG-Emotion) Challenge, focusing on dimensional emotion recognition through visual analysis of the provided emotion videos. The approach is based on a Convolutional and Recurrent (CNN-RNN) deep neural architecture we have developed for the relevant large AffWild Emotion Database. We extended and adapted this architecture, by letting a combination of multiple features generated in the CNN component be explored by RNN subnets. Our target has been to obtain best performance on the OMG-Emotion visual validation data set, while learning the respective visual training data set. Extended experimentation has led to best architectures for the estimation of the values of the valence and arousal emotion dimensions over these data sets.

Submitted 13 December, 2019; v1 submitted 3 May, 2018; originally announced May 2018.

arXiv:1804.10938 [pdf, other]

doi 10.1007/s11263-019-01158-4

Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Mihalis A. Nicolaou, Athanasios Papaioannou, Guoying Zhao, Björn Schuller, Irene Kotsia, Stefanos Zafeiriou

Abstract: Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings, can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative is an emotion) & arousal (i.e., power of the activation of the emotion) constitute popular and effective affect representations. Nevertheless, the majority of collected datasets thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge that was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional & recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the AffWild database for learning features, which can be used as priors for achieving best performances both for dimensional, as well as categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.

Submitted 1 February, 2019; v1 submitted 29 April, 2018; originally announced April 2018.

arXiv:1608.00668 [pdf, ps, other]

doi 10.1007/s11263-017-1034-6

Global Vertices and the Noising Paradox

Authors: Konstantinos A. Raftopoulos, Stefanos D. Kollias, Marin Ferecatu

Abstract: A theoretical and experimental analysis related to the identification of vertices of unknown shapes is presented. Shapes are seen as real functions of their closed boundary. Unlike traditional approaches, which see curvature as the rate of change of the tangent to the curve, an alternative global perspective of curvature is examined providing insight into the process of noise-enabled vertex localization. The analysis leads to a paradox, that certain vertices can be localized better in the presence of noise. The concept of noising is thus considered and a relevant global method for localizing "Global Vertices" is investigated. Theoretical analysis reveals that induced noise can help localizing certain vertices if combined with global descriptors. Experiments with noise and a comparison to localized methods validate the theoretical results.

Submitted 1 August, 2016; originally announced August 2016.

Comments: 19 pages, 11 figures

Journal ref: Int J Comput Vis (2017)

arXiv:1607.08362 [pdf, ps, other]

doi 10.1007/s11263-017-1034-6

Incremental Noising and its Fractal Behavior

Authors: Konstantinos A. Raftopoulos, Marin Ferecatu, Dionyssios D. Sourlas, Stefanos D. Kollias

Abstract: This manuscript is about further elucidating the concept of noising. The concept of noising first appeared in \cite{CVPR14}, in the context of curvature estimation and vertex localization on planar shapes. There are indications that noising can play for global methods the role smoothing plays for local methods in this task. This manuscript is about investigating this claim by introducing incremental noising, in a recursive deterministic manner, analogous to how smoothing is extended to progressive smoothing in similar tasks. As investigating the properties and behavior of incremental noising is the purpose of this manuscript, a surprising connection between incremental noising and progressive smoothing is revealed by the experiments. To explain this phenomenon, the fractal and the space filling properties of the two methods respectively, are considered in a unifying context.

Submitted 1 August, 2016; v1 submitted 28 July, 2016; originally announced July 2016.

Comments: 10 pages, 5 figures

Journal ref: Int J Comput Vis (2017)
