Showing 1–14 of 14 results for author: Ghorbani, S

Searching in archive cs.
  1. arXiv:2404.14634  [pdf, other]

    cs.CV

    UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues

    Authors: Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad

    Abstract: We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image…

    Submitted 14 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 18 pages, 12 figures

  2. arXiv:2404.12625  [pdf, other]

    cs.CV

    SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

    Authors: Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

    Abstract: We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavil…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures

  3. arXiv:2209.07556  [pdf, other]

    cs.GR cs.LG cs.SD

    ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

    Authors: Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, Marc-André Carbonneau

    Abstract: We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a Variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scalin…

    Submitted 23 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  4. Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators

    Authors: Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

    Abstract: In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by re-training them on two pressure d…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: The version of record of this article, first published in Applied Intelligence, is available online at Publisher's website https://doi.org/10.1007/s10489-021-02418-y. arXiv admin note: substantial text overlap with arXiv:1908.08919

    Report number: 1573-7497

    Journal ref: Applied Intelligence (2021): 1-15

  5. arXiv:2011.04084  [pdf, other]

    eess.AS cs.SD eess.IV

    Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

    Authors: Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li

    Abstract: In this study, we try to address the problem of leveraging visual signals to improve Automatic Speech Recognition (ASR), also known as visual context-aware ASR (VC-ASR). We explore novel VC-ASR approaches to leverage video and text representations extracted by a self-supervised pre-trained text-video embedding model. Firstly, we propose a multi-stream attention architecture to leverage signals fro…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted at SLT 2021

  6. Probabilistic Character Motion Synthesis using a Hierarchical Deep Latent Variable Model

    Authors: Saeed Ghorbani, Calden Wloka, Ali Etemad, Marcus A. Brubaker, Nikolaus F. Troje

    Abstract: We present a probabilistic framework to generate character animations based on weak control signals, such that the synthesized motions are realistic while retaining the stochastic nature of human movement. The proposed architecture, which is designed as a hierarchical recurrent model, maps each sub-sequence of motions into a stochastic latent code using a variational autoencoder extended over the…

    Submitted 19 October, 2020; originally announced October 2020.

    Journal ref: Computer Graphics Forum, 39 (2020), Issue 8

  7. arXiv:2010.09084  [pdf, other]

    cs.CV cs.LG

    Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules

    Authors: Alireza Sepas-Moghaddam, Saeed Ghorbani, Nikolaus F. Troje, Ali Etemad

    Abstract: Gait recognition, referring to the identification of individuals based on the manner in which they walk, can be very challenging due to the variations in the viewpoint of the camera and the appearance of individuals. Current methods for gait recognition have been dominated by deep learning models, notably those based on partial feature representations. In this context, we propose a novel deep netw…

    Submitted 18 October, 2020; originally announced October 2020.

    Comments: Accepted to International Conference on Pattern Recognition (ICPR) 2020

  8. arXiv:2007.09131  [pdf, other]

    eess.AS cs.SD eess.SP

    SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

    Authors: Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang

    Abstract: The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications. One of the most popular variants of these FCNs is the 'U-Net', which is an encoder-decoder network with skip connections. In this study, we propose 'SkipConvNet' where we replace each skip connection with multiple convolutional modules to provide decoder wi…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Submitted to Interspeech2020

  9. MoVi: A Large Multipurpose Motion and Video Dataset

    Authors: Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, Nikolaus F. Troje

    Abstract: Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce a new human Motion and Video dataset MoVi, which we make available publicly. It contains 60 female and 30 male actors performing a collection of 20 predefined ever…

    Submitted 3 March, 2020; originally announced March 2020.

  10. arXiv:2001.01656  [pdf, other]

    eess.AS cs.SD

    Audio-visual Recognition of Overlapped speech for the LRS2 dataset

    Authors: Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

    Abstract: Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs i.e. end-…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 5 pages, 5 figures, submitted to icassp2019

  11. arXiv:1911.05126  [pdf, ps, other]

    cs.NI

    KPsec: Secure End-to-End Communications for Multi-Hop Wireless Networks

    Authors: Mohammed Gharib, Ali Owfi, Soudeh Ghorbani

    Abstract: The security of cyber-physical systems, from self-driving cars to medical devices, depends on their underlying multi-hop wireless networks. Yet, the lack of trusted central infrastructures and limited nodes' resources make securing these networks challenging. Recent works on key pre-distribution schemes, where nodes communicate over encrypted overlay paths, provide an appealing solution because of…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: 20 pages, 10 figures, 3 tables, testbed experiment, exhaustive performance evaluation

  12. arXiv:1910.00565  [pdf, ps, other]

    eess.AS cs.CL cs.LG

    Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition

    Authors: Shahram Ghorbani, Soheil Khorram, John H. L. Hansen

    Abstract: Training acoustic models with sequentially incoming data -- while both leveraging new data and avoiding the forgetting effect -- is an essential obstacle to achieving human intelligence level in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at ASRU, 2019

  13. In-bed Pressure-based Pose Estimation using Image Space Representation Learning

    Authors: Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

    Abstract: Recent advances in deep pose estimation models have proven to be effective in a wide range of applications such as health monitoring, sports, animations, and robotics. However, pose estimation models fail to generalize when facing images acquired from in-bed pressure sensing systems. In this paper, we address this challenge by presenting a novel end-to-end framework capable of accurately locating…

    Submitted 18 May, 2021; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: ©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3965-3969). IEEE

  14. Auto-labelling of Markers in Optical Motion Capture by Permutation Learning

    Authors: Saeed Ghorbani, Ali Etemad, Nikolaus F. Troje

    Abstract: Optical marker-based motion capture is a vital tool in applications such as motion and behavioural analysis, animation, and biomechanics. Labelling, that is, assigning optical markers to the pre-defined positions on the body is a time consuming and labour intensive postprocessing part of current motion capture pipelines. The problem can be considered as a ranking process in which markers shuffled…

    Submitted 31 July, 2019; originally announced July 2019.

    Journal ref: Computer Graphics International Conference, pp. 167-178. Springer, Cham, 2019