Showing 1–50 of 192 results for author: Khan, H

Searching in archive cs.
  1. arXiv:2407.01440  [pdf, other]

    cs.LG

    GAT-Steiner: Rectilinear Steiner Minimal Tree Prediction Using GNNs

    Authors: Bugra Onal, Eren Dogan, Muhammad Hadir Khan, Matthew R. Guthaus

    Abstract: The Rectilinear Steiner Minimum Tree (RSMT) problem is a fundamental problem in VLSI placement and routing and is known to be NP-hard. Traditional RSMT algorithms spend a significant amount of time on finding Steiner points to reduce the total wire length or use heuristics to approximate producing sub-optimal results. We show that Graph Neural Networks (GNNs) can be used to predict optimal Steiner…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Preprint for The 2024 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  2. arXiv:2406.17190  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound Tagging in Infant-centric Home Soundscapes

    Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

    Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in IEEE/ACM CHASE 2024

  3. arXiv:2406.14498  [pdf, other]

    cs.CL

    LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

    Authors: Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, Bashima Islam

    Abstract: Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpret…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review at ARR (for EMNLP 2024)

  4. arXiv:2406.08775  [pdf, other]

    cs.CV

    ALINA: Advanced Line Identification and Notation Algorithm

    Authors: Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Siddhartha Bhattacharyya, Natasha Neogi, Raja Muthalagu

    Abstract: Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annot…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Paper has been accepted to The 3rd CVPR Workshop on Vision Datasets Understanding, 2024

  5. arXiv:2406.06533  [pdf, other]

    cs.AR cs.AI

    Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC)

    Authors: Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra

    Abstract: Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain…

    Submitted 20 April, 2024; originally announced June 2024.

    Comments: Published in DVCon Europe 2023

  6. arXiv:2405.19292  [pdf, other]

    cs.MA

    Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets

    Authors: Hamzah I. Khan, Adam J. Thorpe, David Fridovich-Keil

    Abstract: Autonomous agents operating around human actors must consider how their behaviors might affect those humans, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods to address this problem use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivation…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.14497  [pdf, other]

    cs.CV

    Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

    Authors: Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

    Abstract: In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set o…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.14323  [pdf, other]

    cs.CY

    SmartCS: Enabling the Creation of ML-Powered Computer Vision Mobile Apps for Citizen Science Applications without Coding

    Authors: Fahim Hasan Khan, Akila de Silva, Gregory Dusek, James Davis, Alex Pang

    Abstract: It is undeniable that citizen science contributes to the advancement of various fields of study. There are now software tools that facilitate the development of citizen science apps. However, apps developed with these tools rely on individual human skills to correctly collect useful data. Machine learning (ML)-aided apps provide on-field guidance to citizen scientists on data collection tasks. How…

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2405.13518  [pdf, other]

    cs.CV

    PerSense: Personalized Instance Segmentation in Dense Images

    Authors: Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan

    Abstract: Leveraging large-scale pre-training, vision foundational models showcase notable performance benefits. While recent years have witnessed significant advancements in segmentation algorithms, existing models still face challenges to automatically segment personalized instances in dense and crowded scenarios. The primary factor behind this limitation stems from bounding box-based detections, which ar…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Technical report of PerSense

  10. arXiv:2405.12986  [pdf]

    eess.IV cs.AI cs.CV

    A Novel Feature Map Enhancement Technique Integrating Residual CNN and Transformer for Alzheimer Diseases Diagnosis

    Authors: Saddam Hussain Khan

    Abstract: Alzheimer diseases (ADs) involves cognitive decline and abnormal brain protein accumulation, necessitating timely diagnosis for effective treatment. Therefore, CAD systems leveraging deep learning advancements have demonstrated success in AD detection but pose computational intricacies and the dataset minor contrast, structural, and texture variations. In this regard, a novel hybrid FME-Residual-H…

    Submitted 25 May, 2024; v1 submitted 30 March, 2024; originally announced May 2024.

    Comments: 28 Pages, 11 Figures, 3 Tables

  11. arXiv:2405.11829  [pdf, other]

    cs.LG cs.CV

    Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning

    Authors: Hikmat Khan, Ghulam Rasool, Nidhal Carla Bouaynaya

    Abstract: Continual learning focuses on learning non-stationary data distribution without forgetting previous knowledge. Rehearsal-based approaches are commonly used to combat catastrophic forgetting. However, these approaches suffer from a problem called "rehearsal memory overfitting," where the model becomes too specialized on limited memory samples and loses its ability to generalize effectively. As a r…

    Submitted 20 May, 2024; originally announced May 2024.

  12. arXiv:2405.07698  [pdf, other]

    cs.CV

    oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving

    Authors: Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Dheeraj Varma Chittari Macharavtu, Andreas Dengel

    Abstract: Autonomous driving systems require a quick and robust perception of the nearby environment to carry out their routines effectively. With the aim to avoid collisions and drive safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  13. arXiv:2405.06919  [pdf, other]

    cs.CY cs.CL

    Automating Thematic Analysis: How LLMs Analyse Controversial Topics

    Authors: Awais Hameed Khan, Hiruni Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam Magee

    Abstract: Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support 'sensemaking', making sense of a complex environment or subject by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support t…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures

    ACM Class: K.4.2

  14. arXiv:2404.14588  [pdf]

    cs.LG cs.CV

    Brain-Inspired Continual Learning-Robust Feature Distillation and Re-Consolidation for Class Incremental Learning

    Authors: Hikmat Khan, Nidhal Carla Bouaynaya, Ghulam Rasool

    Abstract: Artificial intelligence (AI) and neuroscience share a rich history, with advancements in neuroscience shaping the development of AI systems capable of human-like knowledge retention. Leveraging insights from neuroscience and existing research in adversarial and continual learning, we introduce a novel framework comprising two core concepts: feature distillation and re-consolidation. Our framework,…

    Submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  16. arXiv:2404.09342  [pdf, other]

    cs.CV cs.SD eess.AS

    Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

    Abstract: The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACM Multimedia Conference - Grand Challenge

  17. arXiv:2404.01352  [pdf, other]

    physics.flu-dyn cs.AI cs.CV cs.GR

    VortexViz: Finding Vortex Boundaries by Learning from Particle Trajectories

    Authors: Akila de Silva, Nicholas Tee, Omkar Ghanekar, Fahim Hasan Khan, Gregory Dusek, James Davis, Alex Pang

    Abstract: Vortices are studied in various scientific disciplines, offering insights into fluid flow behavior. Visualizing the boundary of vortices is crucial for understanding flow phenomena and detecting flow irregularities. This paper addresses the challenge of accurately extracting vortex boundaries using deep learning techniques. While existing methods primarily train on velocity components, we propose…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Under review

  18. arXiv:2403.16194  [pdf, other]

    cs.CV

    Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

    Authors: Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood, Muhammad Haris Khan

    Abstract: Unsupervised landmarks discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of d…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024

  19. arXiv:2403.11674  [pdf, other]

    cs.CV

    Towards Generalizing to Unseen Domains with Few Labels

    Authors: Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana, Muhammad Haris Khan

    Abstract: We approach the challenge of addressing semi-supervised domain generalization (SSDG). Specifically, our aim is to obtain a model that learns domain-generalizable features by leveraging a limited subset of labelled data alongside a substantially larger pool of unlabeled data. Existing domain generalization (DG) methods which are unable to exploit unlabeled data perform poorly compared to semi-super…

    Submitted 7 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  20. arXiv:2403.02782  [pdf, other]

    cs.CV

    Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

    Authors: Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan

    Abstract: In this paper, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. Existing works have attained partial success by extensively leveraging various sources of information av…

    Submitted 15 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures, (supplementary material: 9 pages, 5 figures), accepted to CVPR 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, Pages 18816-18826

  21. arXiv:2402.01781  [pdf, other]

    cs.CL cs.AI cs.LG

    When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

    Authors: Norah Alzahrani, Hisham Abdullah Alyahya, Yazeed Alnumay, Sultan Alrashed, Shaykhah Alsubaie, Yusef Almushaykeh, Faisal Mirza, Nouf Alotaibi, Nora Altwairesh, Areeb Alowisheq, M Saiful Bari, Haidar Khan

    Abstract: Large Language Model (LLM) leaderboards based on benchmark rankings are regularly used to guide practitioners in model selection. Often, the published leaderboard rankings are taken at face value - we show this is a (potentially costly) mistake. Under existing leaderboards, the relative performance of LLMs is highly sensitive to (often minute) details. We show that for popular multiple-choice ques…

    Submitted 3 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: updated with ACL 2024 camera ready version

  22. arXiv:2402.00128  [pdf, other]

    cs.CV

    Real-time Traffic Object Detection for Autonomous Driving

    Authors: Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Andreas Dengel

    Abstract: With recent advances in computer vision, it appears that autonomous driving will be part of modern society sooner rather than later. However, there are still a significant number of concerns to address. Although modern computer vision techniques demonstrate superior performance, they tend to prioritize accuracy over efficiency, which is a crucial aspect of real-time applications. Large object dete…

    Submitted 29 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  23. arXiv:2401.13965  [pdf, other]

    cs.CV

    Improving Pseudo-labelling and Enhancing Robustness for Semi-Supervised Domain Generalization

    Authors: Adnan Khan, Mai A. Shaaban, Muhammad Haris Khan

    Abstract: Beyond attaining domain generalization (DG), visual recognition models should also be data-efficient during learning by leveraging limited labels. We study the problem of Semi-Supervised Domain Generalization (SSDG) which is crucial for real-world applications like automated healthcare. SSDG requires learning a cross-domain generalizable model when the given training data is only partially labelle…

    Submitted 25 January, 2024; originally announced January 2024.

  24. arXiv:2401.13785  [pdf, other]

    cs.CV

    Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction

    Authors: Sathira Silva, Savindu Bhashitha Wannigama, Gihan Jayatilaka, Muhammad Haris Khan, Roshan Ragel

    Abstract: Holistic understanding and reasoning in 3D scenes play a vital role in the success of autonomous driving systems. The evolution of 3D semantic occupancy prediction as a pretraining task for autonomous driving and robotic downstream tasks capture finer 3D details compared to methods like 3D detection. Existing approaches predominantly focus on spatial cues such as tri-perspective view embeddings (T…

    Submitted 4 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  25. arXiv:2401.11621  [pdf]

    q-fin.ST cs.CE cs.LG

    A Novel Decision Ensemble Framework: Customized Attention-BiLSTM and XGBoost for Speculative Stock Price Forecasting

    Authors: Riaz Ud Din, Salman Ahmed, Saddam Hussain Khan

    Abstract: Forecasting speculative stock prices is essential for effective investment risk management that drives the need for the development of innovative algorithms. However, the speculative nature, volatility, and complex sequential dependencies within financial markets present inherent challenges which necessitate advanced techniques. This paper proposes a novel framework, CAB-XDE (customized attention…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 30 pages, 16 Figures, 4 Tables

  26. arXiv:2401.11358  [pdf, other]

    cs.CV

    ANNA: A Deep Learning Based Dataset in Heterogeneous Traffic for Autonomous Vehicles

    Authors: Mahedi Kamal, Tasnim Fariha, Afrina Kabir Zinia, Md. Abu Syed, Fahim Hasan Khan, Md. Mahbubur Rahman

    Abstract: Recent breakthroughs in artificial intelligence offer tremendous promise for the development of self-driving applications. Deep Neural Networks, in particular, are being utilized to support the operation of semi-autonomous cars through object identification and semantic segmentation. To assess the inadequacy of the current dataset in the context of autonomous and semi-autonomous cars, we created a…

    Submitted 20 January, 2024; originally announced January 2024.

  27. arXiv:2401.09354  [pdf]

    eess.AS cs.AI cs.SD

    Transcending Controlled Environments Assessing the Transferability of ASRRobust NLU Models to Real-World Applications

    Authors: Hania Khan, Aleena Fatima Khalid, Zaryab Hassan

    Abstract: This research investigates the transferability of Automatic Speech Recognition (ASR)-robust Natural Language Understanding (NLU) models from controlled experimental conditions to practical, real-world applications. Focused on smart home automation commands in Urdu, the study assesses model performance under diverse noise profiles, linguistic variations, and ASR error scenarios. Leveraging the Urdu…

    Submitted 12 January, 2024; originally announced January 2024.

  28. arXiv:2312.00634  [pdf]

    eess.IV cs.CV

    A Recent Survey of Vision Transformers for Medical Image Segmentation

    Authors: Asifullah Khan, Zunaira Rauf, Abdul Rehman Khan, Saima Rathore, Saddam Hussain Khan, Najmus Saher Shah, Umair Farooq, Hifsa Asif, Aqsa Asif, Umme Zahoora, Rafi Ullah Khalil, Suleman Qamar, Umme Hani Asif, Faiza Babar Khan, Abdul Majid, Jeonghwan Gwak

    Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. Traditionally, convolutional neural networks (CNNs) dominated this domain, excelling at local feature extraction. However, their limitations in capturing long-range dependencies across image regions pose challenges for segmenting complex, inte…

    Submitted 18 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

  29. arXiv:2311.10754  [pdf]

    eess.IV cs.CV

    A Recent Survey of the Advancements in Deep Learning Techniques for Monkeypox Disease Detection

    Authors: Saddam Hussain Khan, Rashid Iqbal, Saeeda Naz

    Abstract: Monkeypox (MPox) is a zoonotic infectious disease induced by the MPox Virus, part of the poxviridae orthopoxvirus group initially discovered in Africa and gained global attention in mid-2022 with cases reported outside endemic areas. Symptoms include headaches, chills, fever, smallpox, measles, and chickenpox-like skin manifestations and the WHO officially announced MPox as a global public health…

    Submitted 23 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 53 pages, 16 figures, 7 tables

  30. Smell of Fire Increases Behavioural Realism in Virtual Reality: A Case Study on a Recreated MGM Grand Hotel Fire

    Authors: Humayun Khan, Daniel Nilsson

    Abstract: Virtual reality allows creating highly immersive visual and auditory experiences, making users feel physically present in the environment. This makes it an ideal platform to simulate dangerous scenarios, including fire evacuation, and study human behaviour without exposing users to harmful elements. However, human perception of the surroundings is based on the integration of multiple sensory cues…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted at IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2023, 9 pages

  31. arXiv:2311.09086  [pdf, other]

    cs.CL cs.AI cs.SI

    The Uli Dataset: An Exercise in Experience Led Annotation of oGBV

    Authors: Arnav Arora, Maha Jinadoss, Cheshta Arora, Denny George, Brindaalakshmi, Haseena Dawood Khan, Kirti Rawat, Div, Ritash, Seema Mathur, Shivani Yadav, Shehla Rashid Shora, Rie Raut, Sumit Pawar, Apurva Paithane, Sonia, Vivek, Dharini Priscilla, Khairunnisha, Grace Banu, Ambika Tandon, Rishav Thakker, Rahul Dev Korra, Aatman Vaidya, Tarunima Prabhakar

    Abstract: Online gender based violence has grown concomitantly with adoption of the internet and social media. Its effects are worse in the Global majority where many users use social media in languages other than English. The scale and volume of conversations on the internet has necessitated the need for automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of…

    Submitted 24 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  32. arXiv:2311.04815  [pdf, other]

    cs.CV

    Domain Adaptive Object Detection via Balancing Between Self-Training and Adversarial Learning

    Authors: Muhammad Akhtar Munir, Muhammad Haris Khan, M. Saquib Sarfraz, Mohsen Ali

    Abstract: Deep learning based object detectors struggle generalizing to a new target domain bearing significant variations in object and background. Most current methods align domains by using image or instance-level adversarial feature alignment. This often suffers due to unwanted background and lacks class-specific alignment. A straightforward approach to promote class-level alignment is to use high confi…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 45, Issue: 12, December 2023); Extended version of our conference paper, arXiv link: arXiv:2110.00249

  33. arXiv:2311.03570  [pdf, other]

    cs.CV

    Cal-DETR: Calibrated Detection Transformer

    Authors: Muhammad Akhtar Munir, Salman Khan, Muhammad Haris Khan, Mohsen Ali, Fahad Shahbaz Khan

    Abstract: Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little at…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS 2023

  34. Leadership Inference for Multi-Agent Interactions

    Authors: Hamzah Khan, David Fridovich-Keil

    Abstract: Effectively predicting intent and behavior requires inferring leadership in multi-agent interactions. Dynamic games provide an expressive theoretical framework for modeling these interactions. Employing this framework, we propose a novel method to infer the leader in a two-agent game by observing the agents' behavior in complex, long-horizon interactions. We make two contributions. First, we intro…

    Submitted 8 April, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures, accepted to IEEE Robotics and Automation Letters

  35. arXiv:2310.17729  [pdf]

    cs.LG cs.AI cs.CV

    Improving Traffic Density Forecasting in Intelligent Transportation Systems Using Gated Graph Neural Networks

    Authors: Razib Hayat Khan, Jonayet Miah, S M Yasir Arafat, M M Mahbubul Syeed, Duc M Ca

    Abstract: This study delves into the application of graph neural networks in the realm of traffic forecasting, a crucial facet of intelligent transportation systems. Accurate traffic predictions are vital for functions like trip planning, traffic control, and vehicle routing in such systems. Three prominent GNN architectures Graph Convolutional Networks (Graph Sample and Aggregation) and Gated Graph Neural…

    Submitted 26 October, 2023; originally announced October 2023.

  36. arXiv:2310.17255  [pdf, other]

    cs.CV

    Generalizing to Unseen Domains in Diabetic Retinopathy Classification

    Authors: Chamuditha Jayanga Galappaththige, Gayal Kuruppu, Muhammad Haris Khan

    Abstract: Diabetic retinopathy (DR) is caused by long-standing diabetes and is among the fifth leading cause for visual impairments. The process of early diagnosis and treatments could be helpful in curing the disease, however, the detection procedure is rather challenging and mostly tedious. Therefore, automated diabetic retinopathy classification using deep learning techniques has gained interest in the m…

    Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted at WACV 2024

  37. arXiv:2310.17032  [pdf, other]

    quant-ph cs.LG

    Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting

    Authors: Saad Zafar Khan, Nazeefa Muzammil, Salman Ghafoor, Haibat Khan, Syed Mohammad Hasan Zaidi, Abdulah Jeza Aljohani, Imran Aziz

    Abstract: Accurate solar power forecasting is pivotal for the global transition towards sustainable energy systems. This study conducts a meticulous comparison between Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. The primary objective is to evaluate the potential advantages of QLSTMs, leveraging their exponential representa…

    Submitted 9 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 33 pages, 9 figures

  38. arXiv:2310.10935  [pdf, other]

    cs.CL cs.LG

    Intent Detection and Slot Filling for Home Assistants: Dataset and Analysis for Bangla and Sylheti

    Authors: Fardin Ahsan Sakib, A H M Rezaul Karim, Saadat Hasan Khan, Md Mushfiqur Rahman

    Abstract: As voice assistants cement their place in our technologically advanced society, there remains a need to cater to the diverse linguistic landscape, including colloquial forms of low-resource languages. Our study introduces the first-ever comprehensive dataset for intent detection and slot filling in formal Bangla, colloquial Bangla, and Sylheti languages, totaling 984 samples across 10 unique inten…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted at the First Workshop on Bangla Language Processing, 2023

  39. arXiv:2309.11301  [pdf, other]

    cs.CV

    Generalizing Across Domains in Diabetic Retinopathy via Variational Autoencoders

    Authors: Sharon Chokuwa, Muhammad H. Khan

    Abstract: Domain generalization for Diabetic Retinopathy (DR) classification allows a model to adeptly classify retinal images from previously unseen domains with various imaging conditions and patient demographics, thereby enhancing its applicability in a wide range of clinical environments. In this study, we explore the inherent capacity of variational autoencoders to disentangle the latent space of fundu…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at MICCAI 2023 1st International Workshop on Foundation Models for General Medical AI (MedAGI)

  40. arXiv:2309.10518  [pdf, other]

    cs.CV

    Unsupervised Landmark Discovery Using Consistency Guided Bottleneck

    Authors: Mamona Awan, Muhammad Haris Khan, Sanoojan Baliah, Muhammad Ahmad Waseem, Salman Khan, Fahad Shahbaz Khan, Arif Mahmood

    Abstract: We study a challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps however, these are limited in generating informed heatmaps while training, presumably due to the lack of effective structural cues. Also, it is assumed that all predicted landmarks are semantically relevant despite having no ground truth supervision…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted ORAL at BMVC 2023; Code: https://github.com/MamonaAwan/CGB_ULD

    ACM Class: I.4

  41. arXiv:2309.02636  [pdf, other]

    cs.CV cs.LG

    Multiclass Alignment of Confidence and Certainty for Network Calibration

    Authors: Vinith Kugathasan, Muhammad Haris Khan

    Abstract: Deep neural networks (DNNs) have made great strides in pushing the state-of-the-art in several challenging domains. Recent studies reveal that they are prone to making overconfident predictions. This greatly reduces the overall trust in model predictions, especially in safety-critical applications. Early work in improving model calibration employs post-processing techniques which rely on limited p…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted at GCPR 2023

  42. arXiv:2308.14212  [pdf, other]

    cs.CV

    Exploring the Transfer Learning Capabilities of CLIP in Domain Generalization for Diabetic Retinopathy

    Authors: Sanoojan Baliah, Fadillah A. Maani, Santosh Sanjeev, Muhammad Haris Khan

    Abstract: Diabetic Retinopathy (DR), a leading cause of vision impairment, requires early detection and treatment. Developing robust AI models for DR classification holds substantial potential, but a key challenge is ensuring their generalization in unfamiliar domains with varying data distributions. To address this, our paper investigates cross-domain generalization, also known as domain generalization (DG…

    Submitted 27 August, 2023; originally announced August 2023.

  43. arXiv:2308.10192  [pdf, ps, other]

    eess.IV cs.CV

    EDDense-Net: Fully Dense Encoder Decoder Network for Joint Segmentation of Optic Cup and Disc

    Authors: Mehwish Mehmood, Khuram Naveed, Khursheed Aurangzeb, Haroon Ahmed Khan, Musaed Alhussein, Syed Saud Naqvi

    Abstract: Glaucoma is an eye disease that causes damage to the optic nerve, which can lead to visual loss and permanent blindness. Early glaucoma detection is therefore critical in order to avoid permanent blindness. The estimation of the cup-to-disc ratio (CDR) during an examination of the optical disc (OD) is used for the diagnosis of glaucoma. In this paper, we present the EDDense-Net segmentation networ…

    Submitted 23 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  44. arXiv:2307.13386  [pdf, other]

    cs.SE cs.LG

    BotHawk: An Approach for Bots Detection in Open Source Software Projects

    Authors: Fenglin Bi, Zhiwei Zhu, Wei Wang, Xiaoya Xia, Hassan Ali Khan, Peng Pu

    Abstract: Social coding platforms have revolutionized collaboration in software development, leading to using software bots for streamlining operations. However, the presence of open-source software (OSS) bots gives rise to problems including impersonation, spamming, bias, and security risks. Identifying bot accounts and behavior is a challenging task in the OSS project. This research aims to investigate bo…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Dataset, Bots Detection, Classification. Open-source Software Bots

  45. arXiv:2307.08930  [pdf, other]

    cs.CV cs.AI

    Unsupervised Deep Graph Matching Based on Cycle Consistency

    Authors: Siddharth Tourani, Carsten Rother, Muhammad Haris Khan, Bogdan Savchynskyy

    Abstract: We contribute to the sparsely populated area of unsupervised deep graph matching with application to keypoint matching in images. Contrary to the standard supervised approach, our method does not require ground truth correspondences between keypoint pairs. Instead, it is self-supervised by enforcing consistency of matchings between images of the same object category. As the matching and the…

    Submitted 11 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: 12 pages, 5 figures, 3 papers

  46. arXiv:2307.08260  [pdf, other]

    cs.SE cs.CL

    Extending the Frontier of ChatGPT: Code Generation and Debugging

    Authors: Fardin Ahsan Sakib, Saadat Hasan Khan, A. H. M. Rezaul Karim

    Abstract: Large-scale language models (LLMs) have emerged as a groundbreaking innovation in the realm of question-answering and conversational agents. These models, leveraging different deep learning architectures such as Transformers, are trained on vast corpora to predict sentences based on given queries. Among these LLMs, ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial inte…

    Submitted 17 July, 2023; originally announced July 2023.

  47. arXiv:2306.17104  [pdf, other]

    cs.CV

    Deep Ensemble for Rotorcraft Attitude Prediction

    Authors: Hikmat Khan, Nidhal Carla Bouaynaya, Ghulam Rasool, Tyler Travis, Lacey Thompson, Charles C. Johnson

    Abstract: Historically, the rotorcraft community has experienced a higher fatal accident rate than other aviation segments, including commercial and general aviation. Recent advancements in artificial intelligence (AI) and the application of these technologies in different areas of our lives are both intriguing and encouraging. When developed appropriately for the aviation domain, AI techniques provide an o…

    Submitted 29 June, 2023; originally announced June 2023.

  48. arXiv:2306.17091  [pdf, other]

    cs.LG cs.CV

    The Importance of Robust Features in Mitigating Catastrophic Forgetting

    Authors: Hikmat Khan, Nidhal C. Bouaynaya, Ghulam Rasool

    Abstract: Continual learning (CL) is an approach to address catastrophic forgetting, which refers to forgetting previously learned knowledge by neural networks when trained on new tasks or data distributions. The adversarial robustness has decomposed features into robust and non-robust types and demonstrated that models trained on robust features significantly enhance adversarial robustness. However, no stu…

    Submitted 29 June, 2023; originally announced June 2023.

  49. arXiv:2306.08271  [pdf, other]

    cs.CV

    Multiclass Confidence and Localization Calibration for Object Detection

    Authors: Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan

    Abstract: Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make overconfident predictions, rendering them poorly calibrated. Most of the existing attempts for improving DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very little to…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Project page - https://bimsarapathiraja.github.io/mccl-project-page/

  50. arXiv:2306.06494  [pdf, other]

    cs.CV cs.AI

    Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark

    Authors: Li Xu, Bo Liu, Ameer Hamza Khan, Lu Fan, Xiao-Ming Wu

    Abstract: With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datasets such as MSCOCO, vision-language pre-training (VLP) has become an active area of research and proven to be effective for various VL tasks such as visual-question answering. However, studies on VLP in the medical domain have so far been scanty. To provide a comprehensive perspective on VLP for medi…

    Submitted 24 August, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: Published as oral paper in CHIL 2023