-
VideoMambaPro: A Leap Forward for Mamba in Video Understanding
Authors:
Hui Lu,
Albert Ali Salah,
Ronald Poppe
Abstract:
Video understanding requires the extraction of rich spatio-temporal representations, which transformer models achieve through self-attention. Unfortunately, self-attention poses a computational burden. In NLP, Mamba has surfaced as an efficient alternative to transformers. However, Mamba's successes do not trivially extend to computer vision tasks, including those in video analysis. In this paper, we theoretically analyze the differences between self-attention and Mamba. We identify two limitations in Mamba's token processing: historical decay and element contradiction. We propose VideoMambaPro (VMP), which solves the identified limitations by adding masked backward computation and elemental residual connections to a VideoMamba backbone. VideoMambaPro shows state-of-the-art video action recognition performance compared to transformer models, and surpasses VideoMamba by clear margins: 7.9% and 8.1% top-1 on Kinetics-400 and Something-Something V2, respectively. Our VideoMambaPro-M model achieves 91.9% top-1 on Kinetics-400, only 0.2% below InternVideo2-6B but with only 1.2% of its parameters. The combination of high performance and efficiency makes VideoMambaPro an interesting alternative to transformer models.
Submitted 27 June, 2024;
originally announced June 2024.
-
Combining Twitter and Mobile Phone Data to Observe Border-Rush: The Turkish-European Border Opening
Authors:
Carlos Arcila Calderón,
Bilgeçağ Aydoğdu,
Tuba Bircan,
Bünyamin Gündüz,
Onur Önes,
Albert Ali Salah,
Alina Sîrbu
Abstract:
Following Turkey's 2020 decision to revoke border controls, many individuals journeyed towards the Greek, Bulgarian, and Turkish borders. However, the lack of verifiable statistics on irregular migration and discrepancies between media reports and actual migration patterns require further exploration. The objective of this study is to bridge this knowledge gap by harnessing novel data sources, specifically mobile phone and Twitter data, to construct estimators of cross-border mobility and to cultivate a qualitative comprehension of the unfolding events. By employing a migration diplomacy framework, we analyse emergent mobility patterns at the border. Our findings demonstrate the potential of mobile phone data for quantitative metrics and Twitter data for qualitative understanding. We underscore the ethical implications of leveraging Big Data, particularly considering the vulnerability of the population under study. This underscores the imperative for exhaustive research into the socio-political facets of human mobility, with the aim of discerning the potentialities, limitations, and risks inherent in these data sources and their integration. This scholarly endeavour contributes to a more nuanced understanding of migration dynamics and paves the way for the formulation of regulations that preclude misuse and oppressive surveillance, thereby ensuring a more accurate representation of migration realities.
Submitted 22 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Authors:
Hui Lu,
Hu Jian,
Ronald Poppe,
Albert Ali Salah
Abstract:
Owing to their ability to extract relevant spatio-temporal video embeddings, Vision Transformers (ViTs) are currently the best performing models in video action understanding. However, their generalization over domains or datasets is somewhat limited. In contrast, Visual Language Models (VLMs) have demonstrated exceptional generalization performance, but are currently unable to process videos. Consequently, they cannot extract spatio-temporal patterns that are crucial for action understanding. In this paper, we propose the Four-tiered Prompts (FTP) framework that takes advantage of the complementary strengths of ViTs and VLMs. We retain ViTs' strong spatio-temporal representation ability but improve the visual encodings to be more comprehensive and general by aligning them with VLM outputs. The FTP framework adds four feature processors that focus on specific aspects of human action in videos: action category, action components, action description, and context information. The VLMs are only employed during training, and inference incurs a minimal computation cost. Our approach consistently yields state-of-the-art performance. For instance, we achieve remarkable top-1 accuracy of 93.8% on Kinetics-400 and 83.4% on Something-Something V2, surpassing VideoMAEv2 by 2.8% and 2.6%, respectively.
Submitted 24 March, 2024;
originally announced March 2024.
-
TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions
Authors:
Hui Lu,
Albert Ali Salah,
Ronald Poppe
Abstract:
A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuous visual tokens. In addition, for a query token, self-attention is learned along the trajectory. As such, our network can also focus on fine-grained spatio-temporal patterns, such as finger movements, of a specific region in motion. TCNet's correlation module uses a novel dynamic attention mechanism that filters out irrelevant frame regions. Additionally, it assigns dynamic key-value tokens from correlated regions to each query. Both innovations significantly reduce the computation cost and memory. We perform experiments on four large-scale datasets: PHOENIX14, PHOENIX14-T, CSL, and CSL-Daily. Our results demonstrate that TCNet consistently achieves state-of-the-art performance. For example, we improve over the previous state-of-the-art by 1.5% and 1.0% word error rate on PHOENIX14 and PHOENIX14-T, respectively.
Submitted 18 March, 2024;
originally announced March 2024.
-
Wrist movement classification for adaptive mobile phone based rehabilitation of children with motor skill impairments
Authors:
Kayleigh Schoorl,
Tamara Pinos Cisneros,
Albert Ali Salah,
Ben Schouten
Abstract:
Rehabilitation exercises performed by children with cerebral palsy are tedious and repetitive. To make them more engaging, we propose to use an exergame approach, where an adaptive application can help the child remain stimulated and interested during exercises. In this paper, we describe how the mobile phone sensors can be used to classify wrist movements of the user during the rehabilitation exercises to detect if the user is performing the correct exercise and illustrate the use of our approach in an actual mobile phone application. We also show how an adaptive difficulty system was added to the application to allow the system to adjust to the user. We present experimental results from a pilot with healthy subjects that were constrained to simulate restricted wrist movements, as well as from tests with a target group of children with cerebral palsy. Our results show that wrist movement classification is successfully achieved and results in improved interactions.
Submitted 30 January, 2024;
originally announced January 2024.
-
Compensation Sampling for Improved Convergence in Diffusion Models
Authors:
Hui Lu,
Albert Ali Salah,
Ronald Poppe
Abstract:
Diffusion models achieve remarkable quality in image generation, but at a cost. Iterative denoising requires many time steps to produce high-fidelity images. We argue that the denoising process is crucially limited by an accumulation of the reconstruction error due to an initial inaccurate reconstruction of the target data. This leads to lower-quality outputs and slower convergence. To address this issue, we propose compensation sampling to guide the generation towards the target domain. We introduce a compensation term, implemented as a U-Net, which adds negligible computation overhead during training and, optionally, inference. Our approach is flexible and we demonstrate its application in unconditional generation, face inpainting, and face de-occlusion using benchmark datasets CIFAR-10, CelebA, CelebA-HQ, FFHQ-256, and FSG. Our approach consistently yields state-of-the-art results in terms of image quality, while accelerating the convergence of the denoising process during training by up to an order of magnitude.
Submitted 11 December, 2023;
originally announced December 2023.
-
Elucidating the Exposure Bias in Diffusion Models
Authors:
Mang Ning,
Mingxiao Li,
Jianlin Su,
Albert Ali Salah,
Itir Onal Ertugrul
Abstract:
Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then attribute the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output, mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code is available at https://github.com/forever208/ADM-ES and https://github.com/forever208/EDM-ES.
Submitted 10 April, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
A Survey on Computer Vision based Human Analysis in the COVID-19 Era
Authors:
Fevziye Irem Eyiokur,
Alperen Kantarcı,
Mustafa Ekrem Erakın,
Naser Damer,
Ferda Ofli,
Muhammad Imran,
Janez Križaj,
Albert Ali Salah,
Alexander Waibel,
Vitomir Štruc,
Hazım Kemal Ekenel
Abstract:
The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given.
Submitted 7 November, 2022;
originally announced November 2022.
-
Fully-attentive and interpretable: vision and video vision transformers for pain detection
Authors:
Giacomo Fiorentini,
Itir Onal Ertugrul,
Albert Ali Salah
Abstract:
Pain is a serious and costly issue globally, but to be treated, it must first be detected. Vision transformers are a top-performing architecture in computer vision, with little research on their use for pain detection. In this paper, we propose the first fully-attentive automated pain detection pipeline that achieves state-of-the-art performance on binary pain detection from facial expressions. The model is trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our experiments we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining 3 noteworthy models. We analyse the attention maps of one of our models, finding reasonable interpretations for its predictions. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer, with no success. Our presented models, ViT-1 (F1 score 0.55 ± 0.15), ViViT-1 (F1 score 0.55 ± 0.13), and ViViT-2 (F1 score 0.49 ± 0.04), all outperform earlier works, showing the potential of vision transformers for pain detection. Code is available at https://github.com/IPDTFE/ViT-McMaster
Submitted 27 October, 2022;
originally announced October 2022.
-
Automatic Analysis of Human Body Representations in Western Art
Authors:
Shu Zhao,
Almıla Akdağ Salah,
Albert Ali Salah
Abstract:
The way the human body is depicted in classical and modern paintings is relevant for art historical analyses. Each artist has certain themes and concerns, resulting in different poses being used more heavily than others. In this paper, we propose a computer vision pipeline to analyse human pose and representations in paintings, which can be used for specific artists or periods. Specifically, we combine two pose estimation approaches (OpenPose and DensePose) and introduce methods to deal with occlusion and perspective issues. For normalisation, we map the detected poses and contours to Leonardo da Vinci's Vitruvian Man, the classical depiction of body proportions. We propose a visualisation approach for illustrating the articulation of joints in a set of paintings. Combined with a hierarchical clustering of poses, our approach reveals common and uncommon poses used by artists. Our approach improves over purely skeleton-based analyses of the human body in paintings.
Submitted 17 October, 2022;
originally announced October 2022.
-
Video-based estimation of pain indicators in dogs
Authors:
Hongyi Zhu,
Yasemin Salgırlı,
Pınar Can,
Durmuş Atılgan,
Albert Ali Salah
Abstract:
Dog owners are typically capable of recognizing behavioral cues that reveal subjective states of their dogs, such as pain. But automatic recognition of the pain state is very challenging. This paper proposes a novel video-based, two-stream deep neural network approach for this problem. We extract and preprocess body keypoints, and compute features from both keypoints and the RGB representation over the video. We propose an approach to deal with self-occlusions and missing keypoints. We also present a unique video-based dog behavior dataset, collected by veterinary professionals, and annotated for presence of pain, and report good classification results with the proposed approach. This study is one of the first works on machine learning based estimation of dog pain state.
Submitted 26 November, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
State of the Art of Audio- and Video-Based Solutions for AAL
Authors:
Slavisa Aleksic,
Michael Atanasov,
Jean Calleja Agius,
Kenneth Camilleri,
Anto Cartolovni,
Pau Climent-Pérez,
Sara Colantonio,
Stefania Cristina,
Vladimir Despotovic,
Hazim Kemal Ekenel,
Ekrem Erakin,
Francisco Florez-Revuelta,
Danila Germanese,
Nicole Grech,
Steinunn Gróa Sigurðardóttir,
Murat Emirzeoglu,
Ivo Iliev,
Mladjan Jovanovic,
Martin Kampel,
William Kearns,
Andrzej Klimczuk,
Lambros Lambrinos,
Jennifer Lumetzberger,
Wiktor Mucha,
Sophie Noiret
, et al. (14 additional authors not shown)
Abstract:
The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted.
Submitted 5 July, 2022; v1 submitted 26 June, 2022;
originally announced July 2022.
-
Going Deeper than Tracking: a Survey of Computer-Vision Based Recognition of Animal Pain and Affective States
Authors:
Sofia Broomé,
Marcelo Feighelstein,
Anna Zamansky,
Gabriel Carreira Lencioni,
Pia Haubro Andersen,
Francisca Pessanha,
Marwa Mahmoud,
Hedvig Kjellström,
Albert Ali Salah
Abstract:
Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go 'deeper' than tracking, and address automated recognition of animals' internal states such as emotions and pain with the aim of improving animal welfare, making this a timely moment for a systematization of the field. This paper provides a comprehensive survey of computer vision-based research on recognition of affective states and pain in animals, addressing both facial and bodily behavior analysis. We summarize the efforts that have been presented so far within this topic, classifying them across different dimensions; highlight challenges and research gaps; and provide best practice recommendations for advancing the field, along with some future directions for research.
Submitted 16 June, 2022;
originally announced June 2022.
-
Federated learning for violence incident prediction in a simulated cross-institutional psychiatric setting
Authors:
Thomas Borger,
Pablo Mosteiro,
Heysem Kaya,
Emil Rijcken,
Albert Ali Salah,
Floortje Scheepers,
Marco Spruit
Abstract:
Inpatient violence is a common and severe problem within psychiatry. Knowing who might become violent can influence staffing levels and mitigate severity. Predictive machine learning models can assess each patient's likelihood of becoming violent based on clinical notes. Yet, while machine learning models benefit from having more data, data availability is limited as hospitals typically do not share their data for privacy preservation. Federated Learning (FL) can overcome the problem of data limitation by training models in a decentralised manner, without disclosing data between collaborators. However, although several FL approaches exist, none of these train Natural Language Processing models on clinical notes. In this work, we investigate the application of Federated Learning to clinical Natural Language Processing, applied to the task of Violence Risk Assessment by simulating a cross-institutional psychiatric setting. We train and compare four models: two local models, a federated model and a data-centralised model. Our results indicate that the federated model outperforms the local models and performs similarly to the data-centralised model. These findings suggest that Federated Learning can be used successfully in a cross-institutional setting and is a step towards new applications of Federated Learning based on clinical notes.
Submitted 17 May, 2022;
originally announced May 2022.
-
Speech Analysis for Automatic Mania Assessment in Bipolar Disorder
Authors:
Pınar Baki,
Heysem Kaya,
Elvan Çiftçi,
Hüseyin Güleç,
Albert Ali Salah
Abstract:
Bipolar disorder is a mental disorder that causes periods of manic and depressive episodes. In this work, we classify recordings from the Bipolar Disorder corpus, which contain 7 different tasks, into hypomania, mania, and remission classes using only speech features. We perform our experiments on tasks split from the interviews. The best result, achieved by the model trained on the 6th and 7th tasks together, is 0.53 UAR (unweighted average recall), which is higher than the baseline results of the corpus.
Submitted 5 February, 2022;
originally announced February 2022.
-
Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms
Authors:
Norman Poh,
Thirimachos Bourlai,
Josef Kittler,
Lorene Allano,
Fernando Alonso-Fernandez,
Onkar Ambekar,
John Baker,
Bernadette Dorizzi,
Omolara Fatukasi,
Julian Fierrez,
Harald Ganster,
Javier Ortega-Garcia,
Donald Maurer,
Albert Ali Salah,
Tobias Scheidat,
Claus Vielhauer
Abstract:
Automatically verifying the identity of a person by means of biometrics is an important application in day-to-day activities such as accessing banking services and security control in airports. To increase the system reliability, several biometric devices are often used. Such a combined system is known as a multimodal biometric system. This paper reports a benchmarking study carried out within the framework of the BioSecure DS2 (Access Control) evaluation campaign organized by the University of Surrey, involving face, fingerprint, and iris biometrics for person authentication, targeting the application of physical access control in a medium-size establishment with some 500 persons. While multimodal biometrics is a well-investigated subject, there exists no benchmark for a fusion algorithm comparison. Working towards this goal, we designed two sets of experiments: quality-dependent and cost-sensitive evaluation. The quality-dependent evaluation aims at assessing how well fusion algorithms can perform under changing quality of raw images principally due to change of devices. The cost-sensitive evaluation, on the other hand, investigates how well a fusion algorithm can perform given restricted computation and in the presence of software and hardware failures, resulting in errors such as failure-to-acquire and failure-to-match. Since multiple capturing devices are available, a fusion algorithm should be able to handle this nonideal but nevertheless realistic scenario. In both evaluations, each fusion algorithm is provided with scores from each biometric comparison subsystem as well as the quality measures of both template and query data. The response to the call of the campaign proved very encouraging, with the submission of 22 fusion systems. To the best of our knowledge, this is the first attempt to benchmark quality-based multimodal fusion algorithms.
Submitted 17 November, 2021;
originally announced November 2021.
-
Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition
Authors:
Gizem Soğancıoğlu,
Oxana Verkholyak,
Heysem Kaya,
Dmitrii Fedotov,
Tobias Cadèe,
Albert Ali Salah,
Alexey Karpov
Abstract:
Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but essential for the creation of digital assistants for the elderly, as well as unobtrusive telemonitoring of the elderly in their residences for mental healthcare purposes. This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge, which is comprised of two ternary classification tasks for arousal and valence recognition. We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features, respectively. In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models when the amount of labeled data is small. Observing a high mismatch between development and test set performances of various models, we also propose alternative training and decision fusion strategies to better estimate and improve the generalization performance.
Submitted 7 September, 2020;
originally announced September 2020.
-
Mobile phone data and COVID-19: Missing an opportunity?
Authors:
Nuria Oliver,
Emmanuel Letouzé,
Harald Sterly,
Sébastien Delataille,
Marco De Nadai,
Bruno Lepri,
Renaud Lambiotte,
Richard Benjamins,
Ciro Cattuto,
Vittoria Colizza,
Nicolas de Cordes,
Samuel P. Fraiberger,
Till Koebe,
Sune Lehmann,
Juan Murillo,
Alex Pentland,
Phuong N Pham,
Frédéric Pivetta,
Albert Ali Salah,
Jari Saramäki,
Samuel V. Scarpino,
Michele Tizzoni,
Stefaan Verhulst,
Patrick Vinck
Abstract:
This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although their value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at the national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic.
Submitted 27 March, 2020;
originally announced March 2020.
-
Data for Refugees: The D4R Challenge on Mobility of Syrian Refugees in Turkey
Authors:
Albert Ali Salah,
Alex Pentland,
Bruno Lepri,
Emmanuel Letouze,
Patrick Vinck,
Yves-Alexandre de Montjoye,
Xiaowen Dong,
Ozge Dagdelen
Abstract:
The Data for Refugees (D4R) Challenge is a non-profit challenge initiated to improve the conditions of the Syrian refugees in Turkey by providing a special database to the scientific community for enabling research on urgent problems concerning refugees, including health, education, unemployment, safety, and social integration. The collected database is based on anonymised mobile Call Detail Records (CDR) of phone calls and SMS messages from one million Turk Telekom customers. It indicates broad activity and mobility patterns of refugees and citizens in Turkey for one year. The data collection period is from 1 January 2017 to 31 December 2017. The project is initiated by Turk Telekom, in partnership with the Turkish Academic and Research Council (TUBITAK) and Bogazici University, and in collaboration with several academic and non-governmental organizations, including UNHCR Turkey, UNICEF, and International Organization for Migration.
Submitted 14 October, 2018; v1 submitted 2 July, 2018;
originally announced July 2018.
-
Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos
Authors:
Hugo Jair Escalante,
Heysem Kaya,
Albert Ali Salah,
Sergio Escalera,
Yagmur Gucluturk,
Umut Guclu,
Xavier Baro,
Isabelle Guyon,
Julio Jacques Junior,
Meysam Madadi,
Stephane Ayache,
Evelyne Viegas,
Furkan Gurpinar,
Achmadnoer Sukma Wicaksana,
Cynthia C. S. Liem,
Marcel A. J. van Gerven,
Rob van Lier
Abstract:
Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of computer vision with an emphasis on looking at people tasks. Specifically, we review and study those mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, the evaluation protocol, and summarize the results of the challenge. Finally, derived from our study, we outline research opportunities that we foresee will be decisive in the near future for the development of the explainable computer vision field.
Submitted 28 September, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Adaptive Mixtures of Factor Analyzers
Authors:
Heysem Kaya,
Albert Ali Salah
Abstract:
A mixture of factor analyzers is a semi-parametric density estimator that generalizes the well-known mixtures of Gaussians model by allowing each Gaussian in the mixture to be represented in a different lower-dimensional manifold. This paper presents a robust and parsimonious model selection algorithm for training a mixture of factor analyzers, carrying out simultaneous clustering and locally linear, globally nonlinear dimensionality reduction. Permitting a different number of factors per mixture component, the algorithm adapts the model complexity to the data complexity. We compare the proposed algorithm with related automatic model selection algorithms on a number of benchmarks. The results indicate the effectiveness of this fast and robust approach in clustering, manifold learning and class-conditional modeling.
Submitted 22 October, 2015; v1 submitted 10 July, 2015;
originally announced July 2015.
-
UDC in Action
Authors:
Richard Smiraglia,
Andrea Scharnhorst,
Almila Akdag Salah,
Cheng Gao
Abstract:
The UDC (Universal Decimal Classification) is not only a classification language with a long history; it also presents a complex cognitive system worthy of the attention of complexity theory. The elements of the UDC (classes, auxiliaries, and operations) are combined into symbolic strings, which in essence represent a complex network of concepts. This network forms a backbone of ordering of knowledge and at the same time allows expression of different perspectives on various products of human knowledge production. In this paper we look at UDC strings derived from the holdings of libraries. In particular we analyse the subject headings of holdings of the university library in Leuven, and an extraction of UDC numbers from the OCLC WorldCat. Comparing those sets with the Master Reference File, we look into the length of strings, the occurrence of different auxiliary signs, and the resulting connections between UDC classes. We apply methods and representations from complexity theory. Mapping out basic statistics on UDC classes as used in libraries from a point of view of complexity theory bears different benefits. Deploying its structure could serve as an overview and basic information for users about the nature and focus of specific collections. A closer view into combined UDC numbers reveals the complex nature of the UDC as an example for a knowledge ordering system, which deserves future exploration from a complexity theoretical perspective.
Submitted 17 June, 2013;
originally announced June 2013.
-
Mapping EINS -- An exercise in mapping the Network of Excellence in Internet Science
Authors:
Almila Akdag Salah,
Sally Wyatt,
Samir Passi,
Andrea Scharnhorst
Abstract:
This paper demonstrates the application of bibliometric mapping techniques in the area of funded research networks. We discuss how science maps can be used to facilitate communication inside newly formed communities, but also to account for their activities to funding agencies. We present the mapping of EINS as a case -- an FP7 funded Network of Excellence. Finally, we discuss how these techniques can be used to serve as knowledge maps for interdisciplinary working experts.
Submitted 16 July, 2013; v1 submitted 21 April, 2013;
originally announced April 2013.
-
The evolution of classification systems: Ontogeny of the UDC
Authors:
Almila Akdag Salah,
Cheng Gao,
Krzysztof Suchecki,
Andrea Scharnhorst,
Richard P. Smiraglia
Abstract:
To classify is to put things in meaningful groups, but the criteria for doing so can be problematic. Study of evolution of classification includes ontogenetic analysis of change in classification over time. We present an empirical analysis of the UDC over the entire period of its development. We demonstrate stability in main classes, with major change driven by 20th century scientific developments. But we also demonstrate a vast increase in the complexity of auxiliaries. This study illustrates an alternative to Tennis' "scheme-versioning" method.
Submitted 17 April, 2012;
originally announced April 2012.
-
Evolution of Wikipedia's Category Structure
Authors:
Krzysztof Suchecki,
Alkim Almila Akdag Salah,
Cheng Gao,
Andrea Scharnhorst
Abstract:
Wikipedia, as a social phenomenon of collaborative knowledge creation, has been studied extensively from various points of view. The category system of Wikipedia, introduced in 2004, has attracted relatively little attention. In this study, we focus on the documentation of knowledge, and the transformation of this documentation with time. We take Wikipedia as a proxy for knowledge in general and its category system as an aspect of the structure of this knowledge. We investigate the evolution of the category structure of the English Wikipedia from its birth in 2004 to 2008. We treat the category system as if it is a hierarchical Knowledge Organization System, capturing the changes in the distributions of the top categories. We investigate how the clustering of articles, defined by the category system, matches the direct link network between the articles and show how it changes over time. We find the Wikipedia category network mostly stable, but with occasional reorganization. We show that the clustering matches the link structure quite well, except for short periods preceding the reorganizations.
Submitted 4 March, 2012;
originally announced March 2012.
-
Need to categorize: A comparative look at the categories of the Universal Decimal Classification system (UDC) and Wikipedia
Authors:
Almila Akdag Salah,
Cheng Gao,
Krzysztof Suchecki,
Andrea Scharnhorst
Abstract:
This study analyzes the differences between the category structure of the Universal Decimal Classification (UDC) system (which is one of the widely used library classification systems in Europe) and Wikipedia. In particular, we compare the emerging structure of category-links to the structure of classes in the UDC. With this comparison we would like to scrutinize the question of how knowledge maps of the same domain differ when they are created socially (i.e. Wikipedia) as opposed to when they are created formally (UDC) using classification theory. As a case study, we focus on the category of "Arts".
Submitted 30 May, 2011;
originally announced May 2011.
-
The structure of the Arts & Humanities Citation Index: A mapping on the basis of aggregated citations among 1,157 journals
Authors:
Loet Leydesdorff,
Björn Hammarfelt,
Alkim Almila Akdag Salah
Abstract:
Using the Arts & Humanities Citation Index (A&HCI) 2008, we apply mapping techniques previously developed for mapping journal structures in the Science and Social Science Citation Indices. Citation relations among the 110,718 records were aggregated at the level of 1,157 journals specific to the A&HCI, and the journal structures are questioned on whether a cognitive structure can be reconstructed and visualized. Both cosine-normalization (bottom up) and factor analysis (top down) suggest a division into approximately twelve subsets. The relations among these subsets are explored using various visualization techniques. However, we were not able to retrieve this structure using the ISI Subject Categories, including the 25 categories which are specific to the A&HCI. We discuss options for validation such as against the categories of the Humanities Indicators of the American Academy of Arts and Sciences, the panel structure of the European Reference Index for the Humanities (ERIH), and compare our results with the curriculum organization of the Humanities Section of the College of Letters and Sciences of UCLA as an example of institutional organization.
Submitted 20 July, 2011; v1 submitted 9 February, 2011;
originally announced February 2011.
-
The Development of the Journal Environment of Leonardo
Authors:
Alkim Almila Akdag Salah,
Loet Leydesdorff
Abstract:
We present animations based on the aggregated journal-journal citations of Leonardo during the period 1974-2008. Leonardo is mainly cited by journals outside the arts domain for cultural reasons, for example, in neuropsychology and physics. Articles in Leonardo itself cite a large number of journals, but with a focus on the arts. Animations at this level of aggregation enable us to show the history of the journal from a network perspective.
Submitted 8 September, 2010;
originally announced September 2010.
-
Maps on the basis of the Arts & Humanities Citation Index: The journals Leonardo and Art Journal versus "Digital Humanities" as a topic
Authors:
Loet Leydesdorff,
Alkim Almila Akdag Salah
Abstract:
The possibilities of using the Arts & Humanities Citation Index (A&HCI) for journal mapping have not been sufficiently recognized because of the absence of a Journal Citations Report (JCR) for this database. A quasi-JCR for the A&HCI (2008) was constructed from the data contained in the Web-of-Science and is used for the evaluation of two journals as examples: Leonardo and Art Journal. The maps on the basis of the aggregated journal-journal citations within this domain can be compared with maps including references to journals in the Science Citation Index and Social Science Citation Index. Art journals are cited by (social) science journals more than by other art journals, but these journals draw upon one another in terms of their own references. This cultural impact in terms of being cited is not found when documents with a topic such as "digital humanities" are analyzed. This community of practice functions more as an intellectual organizer than a journal.
Submitted 16 December, 2009;
originally announced December 2009.