
Showing 1–50 of 100 results for author: Etemad, A

  1. arXiv:2406.07450  [pdf, other]

    cs.CV cs.LG

    Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

    Authors: Shuvendu Roy, Yasaman Parhizkar, Franklin Ogidi, Vahid Reza Khazaie, Michael Colacci, Ali Etemad, Elham Dolatabadi, Arash Afkanpour

    Abstract: We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain. Through this study, we aim to answer the following research questions: (i) How transferable are general-domain representations to the medical domain? (ii) Is multimodal contrastive training sufficient, or does it benefit from unimodal training as well? (iii) What is the…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2405.20082  [pdf, other]

    cs.LG cs.AI

    Segment, Shuffle, and Stitch: A Simple Mechanism for Improving Time-Series Representations

    Authors: Shivam Grover, Amin Jalali, Ali Etemad

    Abstract: Existing approaches for learning representations of time-series keep the temporal arrangement of the time-steps intact with the presumption that the original order is the most optimal for learning. However, non-adjacent sections of real-world time-series may have strong dependencies. Accordingly we raise the question: Is there an alternative arrangement for time-series which could enable more effe…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2405.18654  [pdf, other]

    cs.CV

    Mitigating Object Hallucination via Data Augmented Contrastive Tuning

    Authors: Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister

    Abstract: Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations wh…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.16625  [pdf, other]

    cs.CV

    Few-shot Tuning of Foundation Models for Class-incremental Learning

    Authors: Shuvendu Roy, Elham Dolatabadi, Arash Afkanpour, Ali Etemad

    Abstract: For the first time, we explore few-shot tuning of vision foundation models for class-incremental learning. Unlike existing few-shot class incremental learning (FSCIL) methods, which train an encoder on a base session to ensure forward compatibility for future continual learning, foundation models are generally trained on large unlabelled data without such considerations. This renders prior methods…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2405.16304  [pdf, other]

    cs.LG cs.AI

    Federated Unsupervised Domain Generalization using Global and Local Alignment of Gradients

    Authors: Farhad Pourpanah, Mahdiyar Molahasani, Milad Soltany, Michael Greenspan, Ali Etemad

    Abstract: We address the problem of federated domain generalization in an unsupervised setting for the first time. We first theoretically establish a connection between domain shift and alignment of gradients in unsupervised federated learning and show that aligning the gradients at both client and server levels can facilitate the generalization of the model to new (target) domains. Building on this insight…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 23 pages, 4 figures

  6. arXiv:2404.17098  [pdf, other]

    cs.HC cs.AI

    CLARE: Cognitive Load Assessment in REaltime with Multimodal Data

    Authors: Anubhav Bhatti, Prithila Angkan, Behnam Behinaein, Zunayed Mahmud, Dirk Rodenburg, Heather Braund, P. James Mclellan, Aaron Ruberto, Geoffery Harrison, Daryl Wilson, Adam Szulewski, Dan Howes, Ali Etemad, Paul Hungler

    Abstract: We present a novel multimodal dataset for Cognitive Load Assessment in REaltime (CLARE). The dataset contains physiological and gaze data from 24 participants with self-reported cognitive load scores as ground-truth labels. The dataset consists of four modalities, namely, Electrocardiography (ECG), Electrodermal Activity (EDA), Electroencephalogram (EEG), and Gaze tracking. To map diverse levels o…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, 6 tables

  7. arXiv:2404.14634  [pdf, other]

    cs.CV

    UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues

    Authors: Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad

    Abstract: We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image…

    Submitted 14 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 18 pages, 12 figures

  8. arXiv:2404.12625  [pdf, other]

    cs.CV

    SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

    Authors: Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

    Abstract: We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavil…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures

  9. arXiv:2403.14392  [pdf, other]

    cs.CV cs.LG

    A Bag of Tricks for Few-Shot Class-Incremental Learning

    Authors: Shuvendu Roy, Chunjong Park, Aldi Fahrezi, Ali Etemad

    Abstract: We present a bag of tricks framework for few-shot class-incremental learning (FSCIL), which is a challenging form of continual learning that involves continuous adaptation to new tasks with limited samples. FSCIL requires both stability and adaptability, i.e., preserving proficiency in previously learned tasks while learning new ones. Our proposed bag of tricks brings together eight key and highly…

    Submitted 21 March, 2024; originally announced March 2024.

  10. arXiv:2403.10561   

    cs.LG cs.AI

    A collection of the accepted papers for the Human-Centric Representation Learning workshop at AAAI 2024

    Authors: Dimitris Spathis, Aaqib Saeed, Ali Etemad, Sana Tonekaboni, Stefanos Laskaridis, Shohreh Deldari, Chi Ian Tang, Patrick Schwab, Shyam Tailor

    Abstract: This non-archival index is not complete, as some accepted papers chose to opt-out of inclusion. The list of all accepted papers is available on the workshop website.

    Submitted 14 March, 2024; originally announced March 2024.

  11. arXiv:2401.14107  [pdf, other]

    cs.LG eess.SP

    Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement

    Authors: Aaqib Saeed, Dimitris Spathis, Jungwoo Oh, Edward Choi, Ali Etemad

    Abstract: Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the us…

    Submitted 25 January, 2024; originally announced January 2024.

  12. arXiv:2312.01187  [pdf, other]

    cs.CV cs.LG stat.ML

    SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer

    Authors: Renan A. Rojas-Gomez, Karan Singhal, Ali Etemad, Alex Bijamov, Warren R. Morningstar, Philip Andrew Mansfield

    Abstract: Existing data augmentation in self-supervised learning, while diverse, fails to preserve the inherent structure of natural images. This results in distorted augmented samples with compromised semantic information, ultimately impacting downstream performance. To overcome this, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel augmentation technique based on Neural Style Tr…

    Submitted 3 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  13. arXiv:2311.06852  [pdf, other]

    cs.CV

    Contrastive Learning of View-Invariant Representations for Facial Expressions Recognition

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: Although there has been much progress in the area of facial expression recognition (FER), most existing methods suffer when presented with images that have been captured from viewing angles that are non-frontal and substantially different from those used in the training process. In this paper, we propose ViewFX, a novel view-invariant FER framework based on contrastive learning, capable of accurat…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted in ACM Transactions on Multimedia Computing, Communications, and Applications

  14. arXiv:2310.15388  [pdf, other]

    cs.CV cs.LG

    Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training

    Authors: Divij Gupta, Ali Etemad

    Abstract: Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on extensive sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solu…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted in IEEE Internet of Things Journal 2023

  15. arXiv:2309.04849  [pdf, other]

    cs.CL cs.AI cs.LG

    Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations

    Authors: Debaditya Shome, Ali Etemad

    Abstract: We propose EmoDistill, a novel speech emotion recognition (SER) framework that leverages cross-modal knowledge distillation during training to learn strong linguistic and prosodic representations of emotion from speech. During inference, our method only uses a stream of speech signals to perform unimodal SER thus reducing computation overhead and avoiding run-time transcription and prosodic featur…

    Submitted 14 March, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  16. arXiv:2309.01274  [pdf, other]

    cs.CV

    Diffusion Models with Deterministic Normalizing Flow Priors

    Authors: Mohsen Zand, Ali Etemad, Michael Greenspan

    Abstract: For faster sampling and higher sample quality, we propose DiNof ($\textbf{Di}$ffusion with $\textbf{No}$rmalizing $\textbf{f}$low priors), a technique that makes use of normalizing flows and diffusion models. We use normalizing flows to parameterize the noisy data at any arbitrary step of the diffusion process and utilize it as the prior in the reverse diffusion process. More specifically, the for…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: 12 pages, 7 figures

  17. arXiv:2308.16801  [pdf, other]

    cs.CV

    Multiscale Residual Learning of Graph Convolutional Sequence Chunks for Human Motion Prediction

    Authors: Mohsen Zand, Ali Etemad, Michael Greenspan

    Abstract: A new method is proposed for human motion prediction by learning temporal and spatial dependencies. Recently, multiscale graphs have been developed to model the human body at higher abstraction levels, resulting in more stable motion prediction. Current methods however predetermine scale levels and combine spatially proximal joints to generate coarser scales based on human priors, even though move…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 13 pages

  18. arXiv:2308.13568  [pdf, other]

    eess.SP cs.LG

    Region-Disentangled Diffusion Model for High-Fidelity PPG-to-ECG Translation

    Authors: Debaditya Shome, Pritam Sarkar, Ali Etemad

    Abstract: The high prevalence of cardiovascular diseases (CVDs) calls for accessible and cost-effective continuous cardiac monitoring tools. Despite Electrocardiography (ECG) being the gold standard, continuous monitoring remains a challenge, leading to the exploration of Photoplethysmography (PPG), a promising but more basic alternative available in consumer wearables. This notion has recently spurred inte…

    Submitted 27 December, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted at AAAI 2024

  19. arXiv:2308.00246  [pdf, other]

    cs.LG cs.AI cs.HC

    EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning

    Authors: Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad

    Abstract: Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer l…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: This paper has been accepted to the 25th International Conference on Multimodal Interaction (ICMI 2023). 8 pages, 6 figures, 6 tables

  20. arXiv:2307.03786  [pdf, other]

    cs.CV

    Context-aware Pedestrian Trajectory Prediction with Multimodal Transformer

    Authors: Haleh Damirchi, Michael Greenspan, Ali Etemad

    Abstract: We propose a novel solution for predicting future trajectories of pedestrians. Our method uses a multimodal encoder-decoder transformer architecture, which takes as input both pedestrian locations and ego-vehicle speeds. Notably, our decoder predicts the entire future trajectory in a single-pass and does not perform one-step-ahead prediction, which makes the method effective for embedded edge depl…

    Submitted 7 July, 2023; originally announced July 2023.

  21. arXiv:2307.02744  [pdf, other]

    cs.CV

    Active Learning with Contrastive Pre-training for Facial Expression Recognition

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: Deep learning has played a significant role in the success of facial expression recognition (FER), thanks to large models and vast amounts of labelled data. However, obtaining labelled data requires a tremendous amount of human effort, time, and financial resources. Even though some prior works have focused on reducing the need for large amounts of labelled data using different unsupervised method…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted at 11th International Conference on Affective Computing and Intelligent Interaction (ACII'2023)

  22. arXiv:2306.15117  [pdf, other]

    cs.CV cs.AI

    Continual Learning for Out-of-Distribution Pedestrian Detection

    Authors: Mahdiyar Molahasani, Ali Etemad, Michael Greenspan

    Abstract: A continual learning solution is proposed to address the out-of-distribution generalization problem for pedestrian detection. While recent pedestrian detection models have achieved impressive performance on various datasets, they remain sensitive to shifts in the distribution of the inference data. Our method adopts and modifies Elastic Weight Consolidation to a backbone object detection network,…

    Submitted 26 June, 2023; originally announced June 2023.

  23. arXiv:2306.13275  [pdf, other]

    cs.LG cs.CV

    Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework

    Authors: Mahdiyar Molahasani, Michael Greenspan, Ali Etemad

    Abstract: The Long-Tailed Recognition (LTR) problem emerges in the context of learning from highly imbalanced datasets, in which the number of samples among different classes is heavily skewed. LTR methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We propose a theorem where under the assumption of strong convexity of the loss function, the weights of a learn…

    Submitted 22 June, 2023; originally announced June 2023.

  24. Exploring the Landscape of Ubiquitous In-home Health Monitoring: A Comprehensive Survey

    Authors: Farhad Pourpanah, Ali Etemad

    Abstract: Ubiquitous in-home health monitoring systems have become popular in recent years due to the rise of digital health technologies and the growing demand for remote health monitoring. These systems enable individuals to increase their independence by allowing them to monitor their health from the home and by allowing more control over their well-being. In this study, we perform a comprehensive survey…

    Submitted 31 May, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Accepted in ACM Transactions on Computing for Healthcare

  25. arXiv:2306.06881  [pdf, other]

    cs.CV

    Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection

    Authors: Sayantan Das, Mojtaba Kolahdouzi, Levent Özparlak, Will Hickie, Ali Etemad

    Abstract: We present a novel approach for the detection of deepfake videos using a pair of vision transformers pre-trained by a self-supervised masked autoencoding setup. Our method consists of two distinct components, one of which focuses on learning spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from c…

    Submitted 9 February, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by IEEE International Joint Conference on Biometrics (IJCB 2023)

  26. arXiv:2306.06696  [pdf, other]

    cs.CV

    Toward Fair Facial Expression Recognition with Improved Distribution Alignment

    Authors: Mojtaba Kolahdouzi, Ali Etemad

    Abstract: We present a novel approach to mitigate bias in facial expression recognition (FER) models. Our method aims to reduce sensitive attribute information such as gender, age, or race, in the embeddings produced by FER models. We employ a kernel mean shrinkage estimator to estimate the kernel mean of the distributions of the embeddings associated with different sensitive attribute groups, such as young…

    Submitted 11 June, 2023; originally announced June 2023.

  27. arXiv:2306.02014  [pdf, other]

    cs.CV cs.LG

    Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts

    Authors: Pritam Sarkar, Ahmad Beirami, Ali Etemad

    Abstract: Video self-supervised learning (VSSL) has made significant progress in recent years. However, the exact behavior and dynamics of these models under different forms of distribution shift are not yet known. In this paper, we comprehensively study the behavior of six popular self-supervised methods (v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE) in response to various forms of natural distributi…

    Submitted 30 October, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Spotlight

  28. arXiv:2306.01229  [pdf, other]

    cs.CV

    Exploring the Boundaries of Semi-Supervised Facial Expression Recognition: Learning from In-Distribution, Out-of-Distribution, and Unconstrained Data

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: Deep learning-based methods have been the key driving force behind much of the recent success of facial expression recognition (FER) systems. However, the need for large amounts of labelled data remains a challenge. Semi-supervised learning offers a way to overcome this limitation, allowing models to learn from a small amount of labelled data along with a large unlabelled dataset. While semi-super…

    Submitted 1 June, 2023; originally announced June 2023.

  29. arXiv:2306.01222  [pdf, other]

    cs.LG cs.CV

    Scaling Up Semi-supervised Learning with Unconstrained Unlabelled Data

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: We propose UnMixMatch, a semi-supervised learning framework which can learn effective representations from unconstrained unlabelled data in order to scale up performance. Most existing semi-supervised methods rely on the assumption that labelled and unlabelled samples are drawn from the same distribution, which limits the potential for improvement through the use of free-living unlabeled data. Con…

    Submitted 12 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted in AAAI Conference on Artificial Intelligence (AAAI-24)

  30. arXiv:2306.01195  [pdf, other]

    cs.CV

    Consistency-guided Prompt Learning for Vision-Language Models

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint in the prediction of the trainable and pre-trained models to prevent overfitting on the downstre…

    Submitted 27 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Published as a conference paper at ICLR 2024

  31. arXiv:2306.01141  [pdf, other]

    cs.CV eess.IV

    Privacy-Preserving Remote Heart Rate Estimation from Facial Videos

    Authors: Divij Gupta, Ali Etemad

    Abstract: Remote Photoplethysmography (rPPG) is the process of estimating PPG from facial videos. While this approach benefits from contactless interaction, it is reliant on videos of faces, which often constitutes an important privacy concern. Recent research has revealed that deep learning techniques are vulnerable to attacks, which can result in significant data breaches making deep rPPG estimation even…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023

  32. An Ensemble Semi-Supervised Adaptive Resonance Theory Model with Explanation Capability for Pattern Classification

    Authors: Farhad Pourpanah, Chee Peng Lim, Ali Etemad, Q. M. Jonathan Wu

    Abstract: Most semi-supervised learning (SSL) models entail complex structures and iterative training processes as well as face difficulties in interpreting their predictions to users. To address these issues, this paper proposes a new interpretable SSL model using the supervised and unsupervised Adaptive Resonance Theory (ART) family of networks, which is denoted as SSL-ART. Firstly, SSL-ART adopts an unsu…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: 13 pages, 8 figures

  33. arXiv:2304.06427  [pdf, other]

    cs.LG cs.AI eess.SP

    In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection

    Authors: Sahar Soltanieh, Javad Hashemi, Ali Etemad

    Abstract: This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel analysis of the data distributions on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantitatively explore and characteriz…

    Submitted 26 March, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: This paper has been published in the IEEE Journal of Biomedical and Health Informatics (JBHI). Copyright IEEE. Please cite as: S. Soltanieh, J. Hashemi and A. Etemad, "In-Distribution and Out-of-Distribution Self-Supervised ECG Representation Learning for Arrhythmia Detection," in IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 2, pp. 789-800, Feb. 2024

  34. arXiv:2304.04273  [pdf, other]

    cs.LG cs.HC eess.SP

    Multimodal Brain-Computer Interface for In-Vehicle Driver Cognitive Load Measurement: Dataset and Baselines

    Authors: Prithila Angkan, Behnam Behinaein, Zunayed Mahmud, Anubhav Bhatti, Dirk Rodenburg, Paul Hungler, Ali Etemad

    Abstract: Through this paper, we introduce a novel driver cognitive load assessment dataset, CL-Drive, which contains Electroencephalogram (EEG) signals along with other physiological signals such as Electrocardiography (ECG) and Electrodermal Activity (EDA) as well as eye tracking data. The data was collected from 21 subjects while driving in an immersive vehicle simulator, in various driving conditions, t…

    Submitted 20 December, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: 16 pages, 9 figures, 11 tables. This work has been accepted to the IEEE Transactions on Intelligent Transportation Systems. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

  35. arXiv:2303.08026  [pdf, other]

    cs.SD cs.AI eess.AS

    A Study on Bias and Fairness In Deep Speaker Recognition

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: With the ubiquity of smart devices that use speaker recognition (SR) systems as a means of authenticating individuals and personalizing their services, fairness of SR systems has become an important point of focus. In this paper we study the notion of fairness in recent SR systems based on 3 popular and relevant definitions, namely Statistical Parity, Equalized Odds, and Equal Opportunity. We exa…

    Submitted 14 March, 2023; originally announced March 2023.

  36. arXiv:2303.05691  [pdf, other]

    cs.CV cs.LG

    Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

    Authors: Vandad Davoodnia, Ali Etemad

    Abstract: Despite the impressive performance of vision-based pose estimators, they generally fail to perform well under adverse vision conditions and often don't satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution…

    Submitted 9 March, 2023; originally announced March 2023.

  37. arXiv:2302.13170  [pdf, other]

    cs.CV

    Partial Label Learning for Emotion Recognition from EEG

    Authors: Guangyi Zhang, Ali Etemad

    Abstract: Fully supervised learning has recently achieved promising performance in various electroencephalography (EEG) learning tasks by training on large datasets with ground truth labels. However, labeling EEG data for affective experiments is challenging, as it can be difficult for participants to accurately distinguish between similar emotions, resulting in ambiguous labeling (reporting multiple emotio…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: 10 pages, 6 figures

  38. arXiv:2302.02845  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Representation Learning by Distilling Video as Privileged Information

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Deep audio representation learning using multi-modal audio-visual data often leads to a better performance compared to uni-modal approaches. However, in real-world scenarios both modalities are not always available at the time of inference, leading to performance degradation by models trained for multi-modal inference. In this work, we propose a novel approach for deep audio representation learnin…

    Submitted 6 February, 2023; originally announced February 2023.

  39. arXiv:2211.14912  [pdf, other]

    cs.CV

    Impact of Labelled Set Selection and Supervision Policies on Semi-supervised Learning

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: In semi-supervised representation learning frameworks, when the number of labelled data is very scarce, the quality and representativeness of these samples become increasingly important. Existing literature on semi-supervised learning randomly samples a limited number of data points for labelling. All these labelled samples are then used along with the unlabelled data throughout the training proces…

    Submitted 27 November, 2022; originally announced November 2022.

  40. arXiv:2211.13929  [pdf, other]

    cs.CV

    XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning

    Authors: Pritam Sarkar, Ali Etemad

    Abstract: We present XKD, a novel self-supervised framework to learn meaningful representations from unlabelled videos. XKD is trained with two pseudo objectives. First, masked data reconstruction is performed to learn modality-specific representations from audio and visual streams. Next, self-supervised cross-modal knowledge distillation is performed between the two modalities through a teacher-student set…

    Submitted 24 December, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: AAAI 2024

  41. arXiv:2209.06322  [pdf, other]

    cs.CV

    FaceTopoNet: Facial Expression Recognition using Face Topology Learning

    Authors: Mojtaba Kolahdouzi, Alireza Sepas-Moghaddam, Ali Etemad

    Abstract: Prior work has shown that the order in which different components of the face are learned using a sequential learner can play an important role in the performance of facial expression recognition systems. We propose FaceTopoNet, an end-to-end deep model for facial expression recognition, which is capable of learning an effective tree topology of the face. Our model then traverses the learned tree…

    Submitted 13 September, 2022; originally announced September 2022.

  42. arXiv:2209.00990  [pdf, other]

    eess.SP cs.CV cs.LG

    Self-Supervised Human Activity Recognition with Localized Time-Frequency Contrastive Representation Learning

    Authors: Setareh Rahimi Taghanaki, Michael Rainbow, Ali Etemad

    Abstract: In this paper, we propose a self-supervised learning solution for human activity recognition with smartphone accelerometer data. We aim to develop a model that learns strong representations from accelerometer signals, in order to perform robust human activity classification, while reducing the model's reliance on class labels. Specifically, we intend to enable cross-dataset transfer learning such…

    Submitted 26 August, 2022; originally announced September 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  43. arXiv:2209.00760  [pdf, other]

    cs.CV

    Temporal Contrastive Learning with Curriculum

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy in contrastive training. More specifically, ConCur starts the contrastive training with easy positive samples (temporally close and semantically similar clips), and as the training progresses, it increases the temporal span effectively sampling hard positives (t…

    Submitted 1 September, 2022; originally announced September 2022.

  44. arXiv:2208.00544  [pdf, other]

    cs.CV

    Analysis of Semi-Supervised Methods for Facial Expression Recognition

    Authors: Shuvendu Roy, Ali Etemad

    Abstract: Training deep neural networks for image recognition often requires large-scale human annotated data. To reduce the reliance of deep neural solutions on labeled data, state-of-the-art semi-supervised methods have been proposed in the literature. Nonetheless, the use of such semi-supervised methods has been quite rare in the field of facial expression recognition (FER). In this paper, we present a c…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: Accepted at IEEE 10th International Conference on Affective Computing and Intelligent Interaction (ACII), 2022

  45. arXiv:2207.10006  [pdf, other]

    cs.SD eess.AS

    Fine-grained Early Frequency Attention for Deep Speaker Recognition

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Attention mechanisms have emerged as important tools that boost the performance of deep models by allowing them to focus on key parts of learned embeddings. However, current attention mechanisms used in speaker recognition tasks fail to consider fine-grained information items such as frequency bins in input spectral representations used by the deep networks. To address this issue, we propose the n…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted in IJCNN 2022

  46. arXiv:2207.06985  [pdf, other]

    cs.CV

    ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

    Authors: Mohsen Zand, Ali Etemad, Michael Greenspan

    Abstract: We present ObjectBox, a novel single-stage anchor-free and highly generalizable object detection approach. As opposed to both existing anchor-based and anchor-free detectors, which are more biased toward specific object scales in their label assignments, we use only object center locations as positive samples and treat all objects equally in different feature levels regardless of the objects' size…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 Oral

  47. Multistream Gaze Estimation with Anatomical Eye Region Isolation by Synthetic to Real Transfer Learning

    Authors: Zunayed Mahmud, Paul Hungler, Ali Etemad

    Abstract: We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by taking advantage of the eye anatomy information through a multistream framework. Our proposed solution comprises two components, first a network for isolating anatomical eye regions, and a second network for multistream gaze estimation. The eye region isolation is performed with a U-Net style network which we train…

    Submitted 12 February, 2024; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 15 pages, 7 figures, 14 tables. This work has been accepted to the IEEE Transactions on Artificial Intelligence © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Report number: 2691-4581

    Journal ref: IEEE Transactions on Artificial Intelligence, 2024

  48. arXiv:2206.07656  [pdf, other]

    eess.SP cs.AI cs.LG

    Analysis of Augmentations for Contrastive ECG Representation Learning

    Authors: Sahar Soltanieh, Ali Etemad, Javad Hashemi

    Abstract: This paper systematically investigates the effectiveness of various augmentations for contrastive self-supervised learning of electrocardiogram (ECG) signals and identifies the best parameters. The baseline of our proposed self-supervised framework consists of two main parts: the contrastive learning and the downstream task. In the first stage, we train an encoder using a number of augmentations t…

    Submitted 30 May, 2022; originally announced June 2022.

    Comments: This paper has been accepted to the IJCNN 2022 conference

  49. Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators

    Authors: Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

    Abstract: In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by re-training them on two pressure d…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: The version of record of this article, first published in Applied Intelligence, is available online at Publisher's website https://doi.org/10.1007/s10489-021-02418-y. arXiv admin note: substantial text overlap with arXiv:1908.08919

    Report number: 1573-7497

    Journal ref: Applied Intelligence (2021): 1-15

  50. arXiv:2206.04625  [pdf, other]

    cs.LG cs.CV eess.SP

    AttX: Attentive Cross-Connections for Fusion of Wearable Signals in Emotion Recognition

    Authors: Anubhav Bhatti, Behnam Behinaein, Paul Hungler, Ali Etemad

    Abstract: We propose cross-modal attentive connections, a new dynamic and effective technique for multimodal representation learning from wearable data. Our solution can be integrated into any stage of the pipeline, i.e., after any convolutional layer or block, to create intermediate connections between individual streams responsible for processing each modality. Additionally, our method benefits from two p…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 13 pages, 8 figures