Showing 1–10 of 10 results for author: Mortazavi, M

  1. arXiv:2406.08859  [pdf, other]

    cs.CV

    Fusion of regional and sparse attention in Vision Transformers

    Authors: Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara

    Abstract: Modern vision transformers leverage visually inspired local interaction between pixels through attention computed within window or grid regions, in contrast to the global attention employed in the original ViT. Regional attention restricts pixel interactions within specific regions, while sparse attention disperses them across sparse grids. These differing approaches pose a challenge between maint…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted as a Workshop Paper at T4V@CVPR2024. arXiv admin note: substantial text overlap with arXiv:2403.04200

  2. arXiv:2405.19359  [pdf, other]

    eess.SP cs.LG

    Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction

    Authors: Nabil Ibtehaz, Masood Mortazavi

    Abstract: Electrocardiogram (ECG) signals, profiling the electrical activities of the heart, are used for a plethora of diagnostic applications. However, ECG systems require multiple leads or channels of signals to capture the complete view of the cardiac system, which limits their application in smartwatches and wearables. In this work, we propose a modally reduced representation learning method for ECG si…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted as a Workshop Paper at TS4H@ICLR2024

    Journal ref: ICLR 2024 Workshop on Learning from Time Series For Health

  3. arXiv:2403.04200  [pdf, other]

    cs.CV

    ACC-ViT : Atrous Convolution's Comeback in Vision Transformers

    Authors: Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara

    Abstract: Transformers have elevated to the state-of-the-art vision architectures through innovations in attention mechanism inspired from visual perception. At present two classes of attentions prevail in vision transformers, regional and sparse attention. The former bounds the pixel interactions within a region; the latter spreads them across sparse grids. The opposing natures of them have resulted in a d…

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2310.15318  [pdf, other]

    cs.LG cs.AI

    HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks

    Authors: Yihong Ma, Ning Yan, Jiayu Li, Masood Mortazavi, Nitesh V. Chawla

    Abstract: Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing "pre-train, fine-tune" paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often exhibits…

    Submitted 23 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to WWW 2024 as research paper

  5. arXiv:2306.16541  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM

    Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering

    Authors: Chuanyue Shen, Letian Zhang, Zhangsihao Yang, Masood Mortazavi, Xiyun Song, Liang Peng, Heather Yu

    Abstract: Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023 ECV Workshop

  6. arXiv:2211.02052  [pdf, ps, other]

    cs.LG cs.AI

    Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration

    Authors: Masood S. Mortazavi, Tiancheng Qin, Ning Yan

    Abstract: Given an environment (e.g., a simulator) for evaluating samples in a specified design space and a set of weighted evaluation metrics -- one can use Theta-Resonance, a single-step Markov Decision Process (MDP), to train an intelligent agent producing progressively more optimal samples. In Theta-Resonance, a neural network consumes a constant input tensor and produces a policy as a set of conditiona…

    Submitted 17 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

    ACM Class: A.1; C.3; C.4; G.3; H.1; I.2; I.6; J.6

  7. arXiv:2103.16083  [pdf, other]

    cs.CV

    Fully Convolutional Scene Graph Generation

    Authors: Hengyue Liu, Ning Yan, Masood S. Mortazavi, Bir Bhanu

    Abstract: This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously. Most of the scene graph generation frameworks use a pre-trained two-stage object detector, like Faster R-CNN, and build scene graphs using bounding box features. Such pipeline usually has a large number of parameters and low inference speed. Unlike these approaches, FCS…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021 Oral

  8. arXiv:2010.15288  [pdf, other]

    cs.LG cs.CV cs.IT cs.MM

    Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks

    Authors: Masood S. Mortazavi

    Abstract: Semantically-aligned $(speech, image)$ datasets can be used to explore "visually-grounded speech". In a majority of existing investigations, features of an image signal are extracted using neural networks "pre-trained" on other tasks (e.g., classification on ImageNet). In still others, pre-trained networks are used to extract audio features prior to semantic embedding. Without "transfer learning"…

    Submitted 28 October, 2020; originally announced October 2020.

    MSC Class: 68T01; 68T05; 68T07; 68T10; 62P15

    ACM Class: I.2; I.2.0; I.2.6; I.2.7; I.2.11; I.5; I.5.1; I.5.2; I.5.4; I.4.10; H.5.1; H.5.2; H.3.3

    Journal ref: Proceedings of INTERSPEECH 2020

  9. arXiv:2003.03877  [pdf, other]

    cs.CV

    FoCL: Feature-Oriented Continual Learning for Generative Models

    Authors: Qicheng Lao, Mehrzad Mortazavi, Marzieh Tahaei, Francis Dutil, Thomas Fevens, Mohammad Havaei

    Abstract: In this paper, we propose a general framework in continual learning for generative models: Feature-oriented Continual Learning (FoCL). Unlike previous works that aim to solve the catastrophic forgetting problem by introducing regularization in the parameter space or image space, FoCL imposes regularization in the feature space. We show in our experiments that FoCL has faster adaptation to distribu…

    Submitted 8 March, 2020; originally announced March 2020.

  10. arXiv:2003.02314  [pdf, other]

    cs.CV cs.LG eess.IV

    The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study

    Authors: Masood S. Mortazavi, Ning Yan

    Abstract: In-painting networks use existing pixels to generate appropriate pixels to fill "holes" placed on parts of an image. A 2-D in-painting network's input usually consists of (1) a three-channel 2-D image, and (2) an additional channel for the "holes" to be in-painted in that image. In this paper, we study the robustness of a given in-painting neural network against variations in hole geometry distrib…

    Submitted 4 March, 2020; originally announced March 2020.