
Showing 1–16 of 16 results for author: Farazi, M

Searching in archive cs.
  1. arXiv:2406.14896

    eess.IV cs.CV

    SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important facto…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper to 2024 MICCAI

  2. arXiv:2302.03830

    cs.CV

    TetCNN: Convolutional Neural Networks on Tetrahedral Meshes

    Authors: Mohammad Farazi, Zhangsihao Yang, Wenhui Zhu, Peijie Qiu, Yalin Wang

    Abstract: Convolutional neural networks (CNN) have been broadly studied on images, videos, graphs, and triangular meshes. However, it has seldom been studied on tetrahedral meshes. Given the merits of using volumetric meshes in applications like brain image analysis, we introduce a novel interpretable graph CNN framework for the tetrahedral mesh structure. Inspired by ChebyNet, our model exploits the volume…

    Submitted 13 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to Information Processing in Medical Imaging (IPMI 2023) conference

    MSC Class: 68T07 ACM Class: I.3.5; I.4.0

  3. arXiv:2302.03003

    eess.IV cs.CV stat.ML

    OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing

    Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob M. Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang

    Abstract: Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, however, is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the Optimal Transport (OT) theory to propose an…

    Submitted 8 April, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)

  4. arXiv:2302.02991

    eess.IV cs.CV stat.ML

    Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images

    Authors: Wenhui Zhu, Peijie Qiu, Mohammad Farazi, Keshav Nandakumar, Oana M. Dumitrascu, Yalin Wang

    Abstract: Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections and low-quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging the optimal transport theory, we propo…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)

  5. arXiv:2210.09466

    cs.CV cs.LG

    Anisotropic Multi-Scale Graph Convolutional Network for Dense Shape Correspondence

    Authors: Mohammad Farazi, Wenhui Zhu, Zhangsihao Yang, Yalin Wang

    Abstract: This paper studies 3D dense shape correspondence, a key shape analysis application in computer vision and graphics. We introduce a novel hybrid geometric deep learning-based model that learns geometrically meaningful and discretization-independent features with a U-Net model as the primary node feature extraction module, followed by a successive spectral-based graph convolutional network. To creat…

    Submitted 10 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 10 pages, 8 figures, 2 tables. Accepted as a conference paper to WACV2023

    MSC Class: 68T07 ACM Class: I.3.5; I.4.0

  6. arXiv:2206.09111

    cs.CV cs.AI

    VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection

    Authors: Yu Cui, Moshiur Farazi

    Abstract: Visual Relationship Detection (VRD) impels a computer vision model to 'see' beyond an individual object instance and 'understand' how different objects in a scene are related. The traditional way of VRD is first to detect objects in an image and then separately predict the relationship between the detected object instances. Such a disjoint approach is prone to predict redundant relationship tags (…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Published at International Conference on Pattern Recognition (ICPR) 2022, Montreal, Quebec

  7. arXiv:2206.08558

    cs.LG

    How You Start Matters for Generalization

    Authors: Sameera Ramasinghe, Lachlan MacDonald, Moshiur Farazi, Hemanth Saratchandran, Simon Lucey

    Abstract: Characterizing the remarkable generalization properties of over-parameterized neural networks remains an open problem. In this paper, we promote a shift of focus towards initialization rather than neural architecture or (stochastic) gradient descent to explain this implicit regularization. Through a Fourier lens, we derive a general result for the spectral bias of neural networks and show that the…

    Submitted 10 July, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

  8. Recursive Training for Zero-Shot Semantic Segmentation

    Authors: Ce Wang, Moshiur Farazi, Nick Barnes

    Abstract: General purpose semantic segmentation relies on a backbone CNN network to extract discriminative features that help classify each image pixel into a 'seen' object class (i.e., the object classes available during training) or a background class. Zero-shot semantic segmentation is a challenging task that requires a computer vision model to identify image pixels belonging to an object class which it h…

    Submitted 26 February, 2021; originally announced March 2021.

    Journal ref: 2021 International Joint Conference on Neural Networks (IJCNN)

  9. Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM

    Authors: Zahidul Islam, Mohammad Rukonuzzaman, Raiyan Ahmed, Md. Hasanul Kabir, Moshiur Farazi

    Abstract: Automatically detecting violence from surveillance footage is a subset of activity recognition that deserves special attention because of its wide applicability in unmanned security monitoring systems, internet video filtration, etc. In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet where one…

    Submitted 20 April, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

    Comments: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)

    Journal ref: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-8

  10. Improving Action Quality Assessment using Weighted Aggregation

    Authors: Shafkat Farabi, Hasibul Himel, Fakhruddin Gazzali, Md. Bakhtiar Hasan, Md. Hasanul Kabir, Moshiur Farazi

    Abstract: Action quality assessment (AQA) aims at automatically judging human action based on a video of the said action and assigning a performance score to it. The majority of works in the existing literature on AQA divide RGB videos into short clips, transform these clips to higher-level representations using Convolutional 3D (C3D) networks, and aggregate them through averaging. These higher-level repres…

    Submitted 11 March, 2022; v1 submitted 21 February, 2021; originally announced February 2021.

    Comments: Accepted for presentation at IbPRIA 2022: 10th Iberian Conference on Pattern Recognition and Image Analysis, Aveiro, Portugal, May 4-6, 2022

  11. arXiv:2011.13055

    cs.CV

    Rethinking conditional GAN training: An approach using geometrically structured latent manifolds

    Authors: Sameera Ramasinghe, Moshiur Farazi, Salman Khan, Nick Barnes, Stephen Gould

    Abstract: Conditional GANs (cGAN), in their rudimentary form, suffer from critical drawbacks such as the lack of diversity in generated outputs and distortion between the latent and output manifolds. Although efforts have been made to improve results, they can suffer from unpleasant side-effects such as the topology mismatch between latent and output spaces. In contrast, we tackle this problem from a geomet…

    Submitted 2 June, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

  12. arXiv:2010.01725

    cs.CV cs.AI

    Attention Guided Semantic Relationship Parsing for Visual Question Answering

    Authors: Moshiur Farazi, Salman Khan, Nick Barnes

    Abstract: Humans explain inter-object relationships with semantic labels that demonstrate a high-level understanding required to perform complex Vision-Language tasks such as Visual Question Answering (VQA). However, existing VQA models represent relationships as a combination of object-level visual features which constrain a model to express interactions between objects in a single domain, while the model…

    Submitted 4 October, 2020; originally announced October 2020.

  13. arXiv:2001.07059

    cs.CV cs.CC

    Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models

    Authors: Moshiur R. Farazi, Salman H. Khan, Nick Barnes

    Abstract: Visual Question Answering (VQA) has emerged as a Visual Turing Test to validate the reasoning ability of AI agents. The pivot to existing VQA models is the joint embedding that is learned by combining the visual features from an image and the semantic features from a given question. Consequently, a large body of literature has focused on developing complex joint embedding strategies coupled with v…

    Submitted 20 January, 2020; originally announced January 2020.

  14. Question-Agnostic Attention for Visual Question Answering

    Authors: Moshiur R Farazi, Salman H Khan, Nick Barnes

    Abstract: Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block). The resulting multimodal representations define an intermediate feature spa…

    Submitted 5 September, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

    Comments: To appear in the proceedings of International Conference on Pattern Recognition (ICPR) 2020

  15. arXiv:1811.12772

    cs.CV

    From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts

    Authors: Moshiur R Farazi, Salman H Khan, Nick Barnes

    Abstract: Current Visual Question Answering (VQA) systems can answer intelligent questions about 'Known' visual content. However, their performance drops significantly when questions about visually and linguistically 'Unknown' concepts are presented during inference ('Open-world' scenario). A practical VQA system should be able to deal with novel concepts in real world settings. To address this problem, we…

    Submitted 30 November, 2018; originally announced November 2018.

  16. arXiv:1805.04247

    cs.CV cs.AI cs.CL

    Reciprocal Attention Fusion for Visual Question Answering

    Authors: Moshiur R Farazi, Salman H Khan

    Abstract: Existing attention mechanisms either attend to local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual details. The bottom-up attention thus generated is furthe…

    Submitted 22 July, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: To appear in the British Machine Vision Conference (BMVC), September 2018

    Journal ref: Proceedings of the British Machine Vision Conference (250) 2018