Search | arXiv e-print repository

Step length measurement in the wild using FMCW radar

Authors: Parthipan Siva, Alexander Wong, Patricia Hewston, George Ioannidis, Dr. Jonathan Adachi, Dr. Alexander Rabinovich, Andrea Lee, Alexandra Papaioannou

Abstract: With an aging population, numerous assistive and monitoring technologies are under development to enable older adults to age in place. To facilitate aging in place predicting risk factors such as falls, and hospitalization and providing early interventions are important. Much of the work on ambient monitoring for risk prediction has centered on gait speed analysis, utilizing privacy-preserving sen… ▽ More With an aging population, numerous assistive and monitoring technologies are under development to enable older adults to age in place. To facilitate aging in place predicting risk factors such as falls, and hospitalization and providing early interventions are important. Much of the work on ambient monitoring for risk prediction has centered on gait speed analysis, utilizing privacy-preserving sensors like radar. Despite compelling evidence that monitoring step length, in addition to gait speed, is crucial for predicting risk, radar-based methods have not explored step length measurement in the home. Furthermore, laboratory experiments on step length measurement using radars are limited to proof of concept studies with few healthy subjects. To address this gap, a radar-based step length measurement system for the home is proposed based on detection and tracking using radar point cloud, followed by Doppler speed profiling of the torso to obtain step lengths in the home. The proposed method was evaluated in a clinical environment, involving 35 frail older adults, to establish its validity. Additionally, the method was assessed in people's homes, with 21 frail older adults who had participated in the clinical assessment. The proposed radar-based step length measurement method was compared to the gold standard Zeno Walkway Gait Analysis System, revealing a 4.5cm/8.3% error in a clinical setting. Furthermore, it exhibited excellent reliability (ICC(2,k)=0.91, 95% CI 0.82 to 0.96) in uncontrolled home settings. The method also proved accurate in uncontrolled home settings, as indicated by a strong agreement (ICC(3,k)=0.81 (95% CI 0.53 to 0.92)) between home measurements and in-clinic assessments. △ Less

Submitted 3 January, 2024; originally announced January 2024.

ACM Class: I.5.4; C.3; J.7

arXiv:2310.03952 [pdf, other]

ILSH: The Imperial Light-Stage Head Dataset for Human Head View Synthesis

Authors: Jiali Zheng, Youngkyoon Jang, Athanasios Papaioannou, Christos Kampouris, Rolandos Alexandros Potamias, Foivos Paraperas Papantoniou, Efstathios Galanakis, Ales Leonardis, Stefanos Zafeiriou

Abstract: This paper introduces the Imperial Light-Stage Head (ILSH) dataset, a novel light-stage-captured human head dataset designed to support view synthesis academic challenges for human heads. The ILSH dataset is intended to facilitate diverse approaches, such as scene-specific or generic neural rendering, multiple-view geometry, 3D vision, and computer graphics, to further advance the development of p… ▽ More This paper introduces the Imperial Light-Stage Head (ILSH) dataset, a novel light-stage-captured human head dataset designed to support view synthesis academic challenges for human heads. The ILSH dataset is intended to facilitate diverse approaches, such as scene-specific or generic neural rendering, multiple-view geometry, 3D vision, and computer graphics, to further advance the development of photo-realistic human avatars. This paper details the setup of a light-stage specifically designed to capture high-resolution (4K) human head images and describes the process of addressing challenges (preprocessing, ethical issues) in collecting high-quality data. In addition to the data collection, we address the split of the dataset into train, validation, and test sets. Our goal is to design and support a fair view synthesis challenge task for this novel dataset, such that a similar level of performance can be maintained and expected when using the test set, as when using the validation set. The ILSH dataset consists of 52 subjects captured using 24 cameras with all 82 lighting sources turned on, resulting in a total of 1,248 close-up head images, border masks, and camera pose pairs. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: ICCV 2023 Workshop, 9 pages, 6 figures

arXiv:2309.10825 [pdf, other]

Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning

Authors: Simone Foti, Alexander J. Rickart, Bong** Koo, Eimear O' Sullivan, Lara S. van de Lande, Athanasios Papaioannou, Roman Khonsari, Danail Stoyanov, N. u. Owase Jeelani, Silvia Schievano, David J. Dunaway, Matthew J. Clarkson

Abstract: The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we will discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muen… ▽ More The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we will discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model, and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aids surgical planning and allows for the objective evaluation of surgical outcomes. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2001.05172 [pdf, other]

Physics Informed Deep Learning for Transport in Porous Media. Buckley Leverett Problem

Authors: Cedric G. Fraces, Adrien Papaioannou, Hamdi Tchelepi

Abstract: We present a new hybrid physics-based machine-learning approach to reservoir modeling. The methodology relies on a series of deep adversarial neural network architecture with physics-based regularization. The network is used to simulate the dynamic behavior of physical quantities (i.e. saturation) subject to a set of governing laws (e.g. mass conservation) and corresponding boundary and initial co… ▽ More We present a new hybrid physics-based machine-learning approach to reservoir modeling. The methodology relies on a series of deep adversarial neural network architecture with physics-based regularization. The network is used to simulate the dynamic behavior of physical quantities (i.e. saturation) subject to a set of governing laws (e.g. mass conservation) and corresponding boundary and initial conditions. A residual equation is formed from the governing partial-differential equation and used as part of the training. Derivatives of the estimated physical quantities are computed using automatic differentiation algorithms. This allows the model to avoid overfitting, by reducing the variance and permits extrapolation beyond the range of the training data including uncertainty implicitely derived from the distribution output of the generative adversarial networks. The approach is used to simulate a 2 phase immiscible transport problem (Buckley Leverett). From a very limited dataset, the model learns the parameters of the governing equation and is able to provide an accurate physical solution, both in terms of shock and rarefaction. We demonstrate how this method can be applied in the context of a forward simulation for continuous problems. The use of these models for the inverse problem is also presented, where the model simultaneously learns the physical laws and determines key uncertainty subsurface parameters. The proposed methodology is a simple and elegant way to instill physical knowledge to machine-learning algorithms. This alleviates the two most significant shortcomings of machine-learning algorithms: the requirement for large datasets and the reliability of extrapolation. The principles presented in this paper can be generalized in innumerable ways in the future and should lead to a new class of algorithms to solve both forward and inverse physical problems. △ Less

Submitted 15 January, 2020; originally announced January 2020.

Comments: 21 pages, 15 figures

arXiv:1909.02215 [pdf, other]

doi 10.1007/978-3-030-58526-6_25

Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks

Authors: Baris Gecer, Alexander Lattas, Stylianos Ploumpis, Jiankang Deng, Athanasios Papaioannou, Stylianos Moschoglou, Stefanos Zafeiriou

Abstract: Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot represent faithfully either the facial texture or the normals of the face, which are very crucial for photo-realistic face synthesis. Recently, it was… ▽ More Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot represent faithfully either the facial texture or the normals of the face, which are very crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used for generating high-quality textures of faces. Nevertheless, the generation process either omits the geometry and normals, or independent processes are used to produce 3D shape information. In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on the expression and create faces with various facial expressions. The qualitative results shown in this paper are compressed due to size limitations, full-resolution results and the accompanying video can be found in the supplementary documents. The code and models are available at the project page: https://github.com/barisgecer/TBGAN. △ Less

Submitted 1 December, 2020; v1 submitted 5 September, 2019; originally announced September 2019.

Comments: Check project page: https://github.com/barisgecer/TBGAN for the full resolution results and the accompanying video

Journal ref: In: European conference on computer vision. Springer, Cham, 2020. p. 415-433

arXiv:1905.00307 [pdf, other]

3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation

Authors: Stylianos Moschoglou, Stylianos Ploumpis, Mihalis Nicolaou, Athanasios Papaioannou, Stefanos Zafeiriou

Abstract: Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes).… ▽ More Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes). This can be primarily attributed to two facts, namely that (a) publicly available 3D face databases are scarce as well as limited in terms of sample size and variability (e.g., few subjects, little diversity in race and gender), and (b) mesh convolutions for deep networks present several challenges that are not entirely tackled in the literature, leading to operator approximations and model instability, often failing to preserve high-frequency components of the distribution. As a result, linear methods such as Principal Component Analysis (PCA) have been mainly utilized towards 3D shape analysis, despite being unable to capture non-linearities and high frequency details of the 3D face - such as eyelid and lip variations. In this work, we present 3DFaceGAN, the first GAN tailored towards modeling the distribution of 3D facial surfaces, while retaining the high frequency details of 3D face shapes. We conduct an extensive series of both qualitative and quantitative experiments, where the merits of 3DFaceGAN are clearly demonstrated against other, state-of-the-art methods in tasks such as 3D shape representation, generation, and translation. △ Less

Submitted 9 May, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: 15 pages, 12 figures. Submitted to International Journal of Computer Vision (IJCV), special issue: Generative Adversarial Networks for Computer Vision

arXiv:1904.07002 [pdf, other]

Synthesising 3D Facial Motion from "In-the-Wild" Speech

Authors: Panagiotis Tzirakis, Athanasios Papaioannou, Alexander Lattas, Michail Tarasiou, Björn Schuller, Stefanos Zafeiriou

Abstract: Synthesising 3D facial motion from speech is a crucial problem manifesting in a multitude of applications such as computer games and movies. Recently proposed methods tackle this problem in controlled conditions of speech. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech captured in arbitrary recording conditions ("in-the-wild") and independent of the sp… ▽ More Synthesising 3D facial motion from speech is a crucial problem manifesting in a multitude of applications such as computer games and movies. Recently proposed methods tackle this problem in controlled conditions of speech. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech captured in arbitrary recording conditions ("in-the-wild") and independent of the speaker. For our purposes, we captured 4D sequences of people uttering 500 words, contained in the Lip Reading Words (LRW) a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes to the LRW audio samples by means of a novel time-war** technique, named Deep Canonical Attentional War** (DCAW), that can simultaneously learn hierarchical non-linear representations and a war** path in an end-to-end manner. We thoroughly evaluate our proposed methods, and show the ability of a deep learning model to synthesise 3D facial motion in handling different speakers and continuous speech signals in uncontrolled conditions. △ Less

Submitted 15 April, 2019; originally announced April 2019.

arXiv:1804.10938 [pdf, other]

doi 10.1007/s11263-019-01158-4

Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Mihalis A. Nicolaou, Athanasios Papaioannou, Guoying Zhao, Björn Schuller, Irene Kotsia, Stefanos Zafeiriou

Abstract: Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings, can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative is an emotion) & arousal (i.e., power of the act… ▽ More Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings, can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative is an emotion) & arousal (i.e., power of the activation of the emotion) constitute popular and effective affect representations. Nevertheless, the majority of collected datasets this far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge that was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional & recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the AffWild database for learning features, which can be used as priors for achieving best performances both for dimensional, as well as categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge. △ Less

Submitted 1 February, 2019; v1 submitted 29 April, 2018; originally announced April 2018.

Showing 1–8 of 8 results for author: Papaioannou, A