doi 10.1016/j.future.2022.04.014

Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence

Authors: Jorge Ejarque, Rosa M. Badia, Loïc Albertin, Giovanni Aloisio, Enrico Baglione, Yolanda Becerra, Stefan Boschert, Julian R. Berlin, Alessandro D'Anca, Donatello Elia, François Exertier, Sandro Fiore, José Flich, Arnau Folch, Steven J Gibbons, Nikolay Koldunov, Francesc Lordan, Stefano Lorito, Finn Løvholt, Jorge Macías, Fabrizio Marozzo, Alberto Michelini, Marisol Monterrubio-Velasco, Marta Pienkowska, Josep de la Puente, et al. (12 additional authors not shown)

Abstract: The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs require in addition data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows in HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for the HPC/DA/AI convergence. Based on this study, the paper identifies the challenges of a new workflow platform to manage complex workflows. Finally, it proposes a development approach for such a workflow platform addressing these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. Proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.

Submitted 13 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Journal ref: Future Generation Computer Systems, Volume 134, Pages 414-429, ISSN 0167-739X, Elsevier, 2022

arXiv:2201.05991 [pdf, other]

doi 10.1109/TPAMI.2023.3243465

Video Transformers: A Survey

Authors: Javier Selva, Anders S. Johansen, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund, Albert Clapés

Abstract: Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey, we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we delve into how videos are handled at the input level first. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D ConvNets even with less computational complexity.

Submitted 13 February, 2023; v1 submitted 16 January, 2022; originally announced January 2022.

arXiv:2109.09487 [pdf]

Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

Authors: David Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, David Gallardo-Pujol, Georgina Guilera, David Leiva, Thomas B. Moeslund, Sergio Escalera, Cristina Palmero

Abstract: Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time helps to predict individual attributes. With Dyadformer, we improve state-of-the-art self-reported personality inference results on individual subjects on the UDIVA v0.5 dataset.

Submitted 20 September, 2021; originally announced September 2021.

Comments: Accepted to the 2021 ICCV Workshop on Understanding Social Behavior in Dyadic and Small Group Interactions

arXiv:2012.14259 [pdf, other]

Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset

Authors: Cristina Palmero, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, Albert Clapés, Alexa Moseguí, Zejian Zhang, David Gallardo, Georgina Guilera, David Leiva, Sergio Escalera

Abstract: This paper introduces UDIVA, a new non-acted dataset of face-to-face dyadic interactions, where interlocutors perform competitive and collaborative tasks with different behavior elicitation and cognitive workload. The dataset consists of 90.5 hours of dyadic interactions among 147 participants distributed in 188 sessions, recorded using multiple audiovisual and physiological sensors. Currently, it includes sociodemographic, self- and peer-reported personality, internal state, and relationship profiling from participants. As an initial analysis on UDIVA, we propose a transformer-based method for self-reported personality inference in dyadic scenarios, which uses audiovisual data and different sources of context from both interlocutors to regress a target person's personality traits. Preliminary results from an incremental study show consistent improvements when using all available context information.

Submitted 28 December, 2020; originally announced December 2020.

Comments: Accepted to the 11th International Workshop on Human Behavior Understanding workshop at Winter Conference on Applications of Computer Vision 2021

arXiv:2001.01935 [pdf, other]

Efficient ML Direction of Arrival Estimation assuming Unknown Sensor Noise Powers

Authors: J. Selva

Abstract: This paper presents an efficient method for computing maximum likelihood (ML) direction of arrival (DOA) estimates assuming unknown sensor noise powers. The method combines efficient Alternate Projection (AP) procedures with Newton iterations. The efficiency of the method lies in the fact that all its intermediate steps have low complexity. The main contribution of this paper is the method's last step, in which a concentrated cost function is maximized in both the DOAs and noise powers in a few iterations through a Newton procedure. This step has low complexity because it employs closed-form expressions of the cost function's gradients and Hessians, which are presented in the paper. The method's total computational burden is of just a few mega-flops in typical cases. We present the method for the deterministic and stochastic ML estimators. An analysis of the deterministic ML cost function's gradient reveals an unexpected drawback of its associated estimator: if the noise powers are unknown, then it is either degenerate or inconsistent. The root-mean-square (RMS) error performance and computational burden of the method are assessed numerically.

Submitted 7 January, 2020; originally announced January 2020.

Comments: Submitted to the IEEE Transactions on Signal Processing

arXiv:1805.03064 [pdf, other]

Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues

Authors: Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, Sergio Escalera

Abstract: Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on EYEDIAP dataset, further improved by 4% when the temporal modality is included.

Submitted 17 September, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

Comments: Proc. of British Machine Vision Conference (BMVC), BMVC 2018. Errata: in pg.5 the camera matrices of the transformation matrix W should be interchanged (correct version: W=C_n*M*(C_o)^-1)

arXiv:1712.00311 [pdf, other]

Folded Recurrent Neural Networks for Future Video Prediction

Authors: Marc Oliu, Javier Selva, Sergio Escalera

Abstract: Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving an insight to the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state of the art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.

Submitted 16 March, 2018; v1 submitted 1 December, 2017; originally announced December 2017.

Comments: Submitted to European Conference on Computer Vision

arXiv:1706.08280 [pdf, other]

Wideband Subspace Estimation Through Projection Matrix Approximation

Authors: J. Selva

Abstract: In this paper, we present a wideband subspace estimation method that characterizes the signal subspace through its orthogonal projection matrix at each frequency. Fundamentally, the method models this projection matrix as a function of frequency that can be approximated by a polynomial. It provides two improvements: a reduction in the number of parameters required to represent the signal subspace along a given frequency band and a quality improvement in wideband direction-of-arrival (DOA) estimators such as Incoherent Multiple Signal Classification (IC-MUSIC) and Modified Test of Orthogonality of Projected Subspaces (MTOPS). In rough terms, the method fits a polynomial to a set of projection matrix estimates, obtained at a set of frequencies, and then uses the polynomial as a representation of the signal subspace. The paper includes the derivation of asymptotic bounds for the bias and root-mean-square (RMS) error of the projection matrix estimate and a numerical assessment of the method and its combination with the previous two DOA estimators.

Submitted 26 November, 2021; v1 submitted 26 June, 2017; originally announced June 2017.

Comments: Submitted to the IEEE Transactions on Signal Processing

arXiv:1604.06236 [pdf, other]

Non-iterative Type 4 and 5 Nonuniform FFT Methods in the One-Dimensional Case

Authors: J. Selva

Abstract: The so-called non-uniform FFT (NFFT) is a family of algorithms for efficiently computing the Fourier transform of finite-length signals, whenever the time or frequency grid is nonuniformly spaced. Following the usual classification, there exist five NFFT types. Types 1 and 2 make it possible to pass from the time to the frequency domain with nonuniform input and output grids respectively. Type 3 allows for both input and output nonuniform grids. Finally, types 4 and 5 are the inverses of types 1 and 2 and are expensive computationally, given that they involve iterative methods. In this paper, we solve this last drawback in the one-dimensional case by presenting non-iterative type 4 and 5 NFFT methods that just involve three NFFTs of types 1 or 2 plus some additional FFTs. The methods are based on exploiting the structure of the Lagrange interpolation formula. The paper includes several numerical examples in which the proposed methods are compared with the Gaussian elimination (GE) and conjugate gradient (CG) methods, both in terms of round-off error and computational burden.

Submitted 21 April, 2016; originally announced April 2016.

Comments: Submitted to the IEEE Trans. on Signal Processing

arXiv:1408.3717 [pdf, other]

doi 10.1109/TSP.2015.2419178

FFT Interpolation from Nonuniform Samples Lying in a Regular Grid

Authors: J. Selva

Abstract: This paper presents a method to interpolate a periodic band-limited signal from its samples lying at nonuniform positions in a regular grid, which is based on the FFT and has the same complexity order as this last algorithm. This kind of interpolation is usually termed "the missing samples problem" in the literature, and there exists a wide variety of iterative and direct methods for its solution. The one presented in this paper is a direct method that exploits the properties of the so-called erasure polynomial, and it provides a significant improvement on the most efficient method in the literature, which seems to be the burst error recovery (BER) technique of Marvasti's et al. The numerical stability and complexity of the method are evaluated numerically and compared with the pseudo-inverse and BER solutions.

Submitted 18 December, 2014; v1 submitted 16 August, 2014; originally announced August 2014.

Comments: Submitted to the IEEE Transactions on Signal Processing

arXiv:1403.1091 [pdf, other]

Signal Estimation from Nonuniform Samples with RMS Error Bound -- Application to OFDM Channel Estimation

Authors: J. Selva

Abstract: We present a channel spectral estimator for OFDM signals containing pilot carriers, assuming a known delay spread or a bound on this parameter. The estimator is based on modeling the channel's spectrum as a band-limited function, instead of as the discrete Fourier transform of a tapped delay line (TDL). Its main advantage is its immunity to the truncation mismatch in usual TDL models (Gibbs phenomenon). In order to assess the estimator, we compare it with the well-known TDL maximum likelihood (ML) estimator in terms of root-mean-square (RMS) error. The main result is that the proposed estimator improves on the ML estimator significantly, whenever the average spectral sampling rate is above the channel's delay spread. The improvement increases with the spectral oversampling ratio.

Submitted 15 May, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

Comments: Submitted to the IEEE Signal Processing Letters

arXiv:1104.3069 [pdf, ps, other]

Efficient Maximum Likelihood Estimation of a 2-D Complex Sinusoidal Based on Barycentric Interpolation

Authors: J. Selva

Abstract: This paper presents an efficient method to compute the maximum likelihood (ML) estimation of the parameters of a complex 2-D sinusoidal, with the complexity order of the FFT. The method is based on an accurate barycentric formula for interpolating band-limited signals, and on the fact that the ML cost function can be viewed as a signal of this type, if the time and frequency variables are switched. The method consists in first computing the DFT of the data samples, and then locating the maximum of the cost function by means of Newton's algorithm. The fact is that the complexity of the latter step is small and independent of the data size, since it makes use of the barycentric formula for obtaining the values of the cost function and its derivatives. Thus, the total complexity order is that of the FFT. The method is validated in a numerical example.

Submitted 15 April, 2011; originally announced April 2011.

Comments: To appear in the International Conference on Acoustic, Speech, and Signal Processing, ICASSP-2011

arXiv:1009.6053 [pdf, ps, other]

doi 10.1109/TSP.2011.2170171

Efficient Sampling of Band-limited Signals from Sine Wave Crossings

Authors: J. Selva

Abstract: This correspondence presents an efficient method for reconstructing a band-limited signal in the discrete domain from its crossings with a sine wave. The method makes it possible to design A/D converters that only deliver the crossing timings, which are then used to interpolate the input signal at arbitrary instants. Potentially, it may allow for reductions in power consumption and complexity in these converters. The reconstruction in the discrete domain is based on a recently-proposed modification of the Lagrange interpolator, which is readily implementable with linear complexity and efficiently, given that it re-uses known schemes for variable fractional-delay (VFD) filters. As a spin-off, the method allows one to perform spectral analysis from sine wave crossings with the complexity of the FFT. Finally, the results in the correspondence are validated in several numerical examples.

Submitted 20 September, 2011; v1 submitted 30 September, 2010; originally announced September 2010.

Comments: To appear in the IEEE Transactions on Signal Processing

arXiv:1003.2880 [pdf, ps, other]

doi 10.1109/TSP.2010.2057248

Regularized sampling of multiband signals

Authors: J. Selva

Abstract: This paper presents a regularized sampling method for multiband signals, that makes it possible to approach the Landau limit, while keeping the sensitivity to noise at a low level. The method is based on band-limited windowing, followed by trigonometric approximation in consecutive time intervals. The key point is that the trigonometric approximation "inherits" the multiband property, that is, its coefficients are formed by bursts of non-zero elements corresponding to the multiband components. It is shown that this method can be well combined with the recently proposed synchronous multi-rate sampling (SMRS) scheme, given that the resulting linear system is sparse and formed by ones and zeroes. The proposed method allows one to trade sampling efficiency for noise sensitivity, and is specially well suited for bounded signals with unbounded energy like those in communications, navigation, audio systems, etc. Besides, it is also applicable to finite energy signals and periodic band-limited signals (trigonometric polynomials). The paper includes a subspace method for blindly estimating the support of the multiband signal as well as its components, and the results are validated through several numerical examples.

Submitted 25 June, 2010; v1 submitted 15 March, 2010; originally announced March 2010.

Comments: The title and introduction have changed. Submitted to the IEEE Transactions on Signal Processing

Showing 1–14 of 14 results for author: Selva, J