-
PathoWAve: A Deep Learning-based Weight Averaging Method for Improving Domain Generalization in Histopathology Images
Authors:
Parastoo Sotoudeh Sharifi,
M. Omair Ahmad,
M. N. S. Swamy
Abstract:
Recent advances in deep learning (DL) have significantly improved medical image analysis. In medical image processing, and particularly in histopathology image analysis, variation in staining protocols and differences between scanners cause significant domain shift, undermining the ability of models to generalize to data from unseen domains. This prompts the need for effective domain generalization (DG) strategies to improve the consistency and reliability of automated cancer detection tools in diagnostic decision-making. In this paper, we introduce Pathology Weight Averaging (PathoWAve), a multi-source DG strategy for addressing the domain shift phenomenon of DL models in histopathology image analysis. By integrating a specific weight averaging technique with parallel training trajectories and a strategic combination of regular augmentations with histopathology-specific data augmentation methods, PathoWAve enables comprehensive exploration of, and precise convergence within, the loss landscape. This method significantly enhances the generalization capabilities of DL models across new, unseen histopathology domains. To the best of our knowledge, PathoWAve is the first weight averaging method proposed for DG in histopathology image analysis. Our quantitative results on the Camelyon17 WILDS dataset demonstrate PathoWAve's superiority over previously proposed methods for tackling domain shift in histopathology image processing. Our code is available at https://github.com/ParastooSotoudeh/PathoWAve.
Submitted 21 June, 2024;
originally announced June 2024.
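As a rough illustration of the weight-averaging idea mentioned in the abstract above, the sketch below uniformly averages parameter dictionaries taken from several training trajectories. It is a generic example and not the PathoWAve implementation: the helper name `average_weights`, the toy parameter names, and the use of plain lists of floats instead of tensors are all assumptions made for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def average_weights(checkpoints: List[Weights]) -> Weights:
    """Uniformly average same-shaped parameter dicts (hypothetical helper)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Weights = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / len(checkpoints) for col in columns]
    return averaged


# Toy checkpoints standing in for models from two parallel training runs.
run_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
run_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([run_a, run_b])
```

The averaged parameters would then be loaded into a single model for evaluation on the held-out domain; which checkpoints to include and how often to sample them are design choices this sketch leaves open.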
-
Neural Active Learning Meets the Partial Monitoring Framework
Authors:
Maxime Heuillet,
Ola Ahmad,
Audrey Durand
Abstract:
We focus on the online-based active learning (OAL) setting, where an agent operates over a stream of observations and trades off the costly acquisition of information (labelled observations) against the cost of prediction errors. We propose a novel foundation for OAL tasks based on partial monitoring (PM), a theoretical framework specialized in online learning from partially informative actions. We show that previously studied binary and multi-class OAL tasks are instances of partial monitoring. We expand the real-world potential of OAL by introducing a new class of cost-sensitive OAL tasks. We propose NeuralCBP, the first PM strategy that accounts for predictive uncertainty with deep neural networks. Our extensive empirical evaluation on open source datasets shows that NeuralCBP has favorable performance against state-of-the-art baselines on multiple binary, multi-class and cost-sensitive OAL tasks.
Submitted 14 May, 2024;
originally announced May 2024.
-
Improving Graph Machine Learning Performance Through Feature Augmentation Based on Network Control Theory
Authors:
Anwar Said,
Obaid Ullah Ahmad,
Waseem Abbas,
Mudassir Shabbir,
Xenofon Koutsoukos
Abstract:
Network control theory (NCT) offers a robust analytical framework for understanding the influence of network topology on dynamic behaviors, enabling researchers to decipher how certain patterns of external control measures can steer system dynamics towards desired states. Distinguished from other structure-function methodologies, NCT's predictive capabilities can be coupled with deploying Graph Neural Networks (GNNs), which have demonstrated exceptional utility in various network-based learning tasks. However, the performance of GNNs heavily relies on the expressiveness of node features, and the lack of node features can greatly degrade their performance. Furthermore, many real-world systems may lack node-level information, posing a challenge for GNNs. To tackle this challenge, we introduce a novel approach, NCT-based Enhanced Feature Augmentation (NCT-EFA), that assimilates average controllability, along with other centrality indices, into the feature augmentation pipeline to enhance GNN performance. Our evaluation of NCT-EFA on six benchmark GNN models across two experimental settings (solely employing average controllability, and in combination with additional centrality metrics) shows a performance improvement of up to 11%. Our results demonstrate that incorporating NCT into feature enrichment can substantively extend the applicability and heighten the performance of GNNs in scenarios where node-level information is unavailable.
Submitted 3 May, 2024;
originally announced May 2024.
-
Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL
Authors:
Osama Ahmad,
Zawar Hussain,
Hammad Naeem
Abstract:
This study implements a reinforcement learning algorithm for the trajectory planning of manipulators. A 7-DOF robotic arm must pick a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle hinders picking the object. The objective of the robot is to avoid the obstacle and pick the block within a fixed time limit. In this work, we apply a deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency with dense and sparse rewards.
Submitted 25 March, 2024;
originally announced March 2024.
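The dense versus sparse reward comparison mentioned in the abstract above can be pictured with two toy reward functions. This is only a sketch of the general distinction, not the reward design used in the paper: the function names, the 5 cm tolerance, and the -1/0 sparse values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def dense_reward(gripper: Sequence[float], target: Sequence[float]) -> float:
    """Dense signal: negative Euclidean distance at every step."""
    return -math.dist(gripper, target)


def sparse_reward(gripper: Sequence[float], target: Sequence[float],
                  tolerance: float = 0.05) -> float:
    """Sparse signal: 0 when within tolerance of the target, -1 otherwise."""
    return 0.0 if math.dist(gripper, target) <= tolerance else -1.0


# Hypothetical gripper and target positions in metres.
far = dense_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
missed = sparse_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
reached = sparse_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.01))
```

A dense reward gives the agent a gradient toward the goal on every step, while a sparse reward only signals success, which typically makes exploration harder.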
-
Control-based Graph Embeddings with Data Augmentation for Contrastive Learning
Authors:
Obaid Ullah Ahmad,
Anwar Said,
Mudassir Shabbir,
Waseem Abbas,
Xenofon Koutsoukos
Abstract:
In this paper, we study the problem of unsupervised graph representation learning by harnessing the control properties of dynamical networks defined on graphs. Our approach introduces a novel framework for contrastive learning, a widely prevalent technique for unsupervised representation learning. A crucial step in contrastive learning is the creation of 'augmented' graphs from the input graphs. Though different from the original graphs, these augmented graphs retain the original graph's structural characteristics. Here, we propose a unique method for generating these augmented graphs by leveraging the control properties of networks. The core concept revolves around perturbing the original graph to create a new one while preserving the controllability properties specific to networks and graphs. Compared to the existing methods, we demonstrate that this innovative approach enhances the effectiveness of contrastive learning frameworks, leading to superior results regarding the accuracy of the classification tasks. The key innovation lies in our ability to decode the network structure using these control properties, opening new avenues for unsupervised graph representation learning.
Submitted 17 April, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Improvising Age Verification Technologies in Canada: Technical, Regulatory and Social Dynamics
Authors:
Azfar Adib,
Wei-** Zhu,
M. Omair Ahmad
Abstract:
Age verification, which is a mandatory legal requirement for delivering certain age-appropriate services or products, has recently been emphasized around the globe to ensure online safety for children. The rapid advancement of artificial intelligence has facilitated the recent development of some cutting-edge age-verification technologies, particularly using biometrics. However, successful deployment and mass acceptance of these technologies are significantly dependent on the corresponding socio-economic and regulatory context. This paper reviews such key dynamics for improvising age-verification technologies in Canada. It is particularly essential for such technologies to be inclusive, transparent, adaptable, privacy-preserving, and secure. Effective collaboration between academia, government, and industry entities can help to meet the growing demands for age-verification services in Canada while maintaining a user-centric approach.
Submitted 15 February, 2024;
originally announced February 2024.
-
Randomized Confidence Bounds for Stochastic Partial Monitoring
Authors:
Maxime Heuillet,
Ola Ahmad,
Audrey Durand
Abstract:
The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPsidestar strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.
Submitted 15 May, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Successive Data Injection in Conditional Quantum GAN Applied to Time Series Anomaly Detection
Authors:
Benjamin Kalfon,
Soumaya Cherkaoui,
Jean-Frédéric Laprade,
Ola Ahmad,
Shengrui Wang
Abstract:
Classical GAN architectures have shown interesting results for solving anomaly detection problems in general and for time series anomalies in particular, such as those arising in communication networks. In recent years, several quantum GAN (QGAN) architectures have been proposed in the literature. When detecting anomalies in time series using QGANs, major challenges arise due to the limited number of qubits compared to the size of the data. To address these challenges, we propose a new high-dimensional encoding approach, named Successive Data Injection (SuDaI). In this approach, we explore a larger portion of the quantum state than conventional angle encoding, the method used predominantly in the literature, through repeated data injections into the quantum state. SuDaI encoding allows us to adapt the QGAN for anomaly detection with network data of a much higher dimensionality than is possible with known QGAN implementations. In addition, SuDaI encoding applies to other types of high-dimensional time series and can be used in contexts beyond anomaly detection and QGANs, thereby opening up multiple fields of application.
Submitted 8 October, 2023;
originally announced October 2023.
-
Learning adjacency matrix for dynamic graph neural network
Authors:
Osama Ahmad,
Omer Abdul Jalil,
Usman Nazir,
Murtaza Taj
Abstract:
In recent work, [1] introduced the concept of using a Block Adjacency Matrix (BA) for the representation of spatio-temporal data. While their method successfully concatenated adjacency matrices to encapsulate spatio-temporal relationships in a single graph, it formed a disconnected graph. This limitation hampered the ability of Graph Convolutional Networks (GCNs) to perform message passing across nodes belonging to different time steps, as no temporal links were present. To overcome this challenge, we introduce an encoder block specifically designed to learn these missing temporal links. The encoder block processes the BA and predicts connections between previously unconnected subgraphs, resulting in a Spatio-Temporal Block Adjacency Matrix (STBAM). This enriched matrix is then fed into a Graph Neural Network (GNN) to capture the complex spatio-temporal topology of the network. Our evaluations on benchmark datasets, surgVisDom and C2D2, demonstrate that our method, with slightly higher complexity, achieves superior results compared to state-of-the-art methods. Our approach's computational overhead remains significantly lower than that of conventional non-graph-based methodologies for spatio-temporal data.
Submitted 4 October, 2023;
originally announced October 2023.
-
Controllability Backbone in Networks
Authors:
Obaid Ullah Ahmad,
Waseem Abbas,
Mudassir Shabbir
Abstract:
This paper studies the controllability backbone problem in dynamical networks defined over graphs. The main idea of the controllability backbone is to identify a small subset of edges in a given network such that any subnetwork containing those edges/links has at least the same network controllability as the original network while assuming the same set of input/leader vertices. We consider the strong structural controllability (SSC) in our work, which is useful but computationally challenging. Thus, we utilize two lower bounds on the network's SSC based on the zero forcing notion and graph distances. We provide algorithms to compute controllability backbones while preserving these lower bounds. We thoroughly analyze the proposed algorithms and compute the number of edges in the controllability backbones. Finally, we compare and numerically evaluate our methods on random graphs.
Submitted 5 September, 2023;
originally announced September 2023.
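The zero forcing notion mentioned in the abstract above rests on a simple color-change rule: a colored vertex with exactly one uncolored neighbor forces that neighbor to become colored. The sketch below applies that rule until nothing changes. It illustrates only the standard rule on toy graphs; the function name and the examples are hypothetical, and it does not reproduce the backbone algorithms or bounds from the paper.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def zero_forcing_closure(adj: Graph, leaders: Set[int]) -> Set[int]:
    """Repeatedly apply the color-change rule starting from the leaders."""
    colored = set(leaders)
    changed = True
    while changed:
        changed = False
        for vertex in list(colored):
            uncolored = adj[vertex] - colored
            # A colored vertex forces its unique uncolored neighbor.
            if len(uncolored) == 1:
                colored |= uncolored
                changed = True
    return colored


# Path 0-1-2-3: one endpoint as leader colors the whole path.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
derived_path = zero_forcing_closure(path, {0})

# Star with center 0: the center alone cannot force any leaf.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
derived_star = zero_forcing_closure(star, {0})
```

Comparing the size of the derived set before and after removing edges is one way to check, on small examples, whether an edge removal changes what the leaders can force.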
-
MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning
Authors:
Julien Nicolas,
Florent Chiaroni,
Imtiaz Ziko,
Ola Ahmad,
Christian Desrosiers,
Jose Dolz
Abstract:
Despite the recent progress in incremental learning, addressing catastrophic forgetting under distributional drift is still an open and important problem. Indeed, while state-of-the-art domain incremental learning (DIL) methods perform satisfactorily within known domains, their performance largely degrades in the presence of novel domains. This limitation hampers their generalizability and restricts their scalability to more realistic settings where train and test data are drawn from different distributions. To address these limitations, we present a novel DIL approach based on a mixture of prompt-tuned CLIP models (MoP-CLIP), which generalizes the paradigm of S-Prompting to handle both in-distribution and out-of-distribution (OOD) data at inference. In particular, at the training stage we model the feature distribution of every class in each domain, learning individual text and visual prompts to adapt to a given domain. At inference, the learned distributions allow us to identify whether a given test sample belongs to a known domain, in which case we select the correct prompt for the classification task, or comes from an unseen domain, in which case we leverage a mixture of the prompt-tuned CLIP models. Our empirical evaluation reveals the poor performance of existing DIL methods under domain shift, and suggests that the proposed MoP-CLIP performs competitively in the standard DIL settings while outperforming state-of-the-art methods in OOD scenarios. These results demonstrate the superiority of MoP-CLIP, offering a robust and general solution to the problem of domain incremental learning.
Submitted 11 July, 2023;
originally announced July 2023.
-
Causal Analysis for Robust Interpretability of Neural Networks
Authors:
Ola Ahmad,
Nicolas Bereux,
Loïc Baret,
Vahid Hashemi,
Freddy Lecue
Abstract:
Interpreting the inner function of neural networks is crucial for the trustworthy development and deployment of these black-box models. Prior interpretability methods focus on correlation-based measures to attribute model decisions to individual examples. However, these measures are susceptible to noise and spurious correlations encoded in the model during the training phase (e.g., biased inputs, model overfitting, or misspecification). Moreover, this process has proven to result in noisy and unstable attributions that prevent any transparent understanding of the model's behavior. In this paper, we develop a robust intervention-based method grounded in causal analysis to capture cause-effect mechanisms in pre-trained neural networks and their relation to the prediction. Our novel approach relies on path interventions to infer the causal mechanisms within hidden layers and isolate the information that is relevant and necessary to the model prediction, while avoiding noisy information. The result is task-specific causal explanatory graphs that can audit model behavior and express the actual causes underlying its performance. We apply our method to vision models trained on classification tasks. On image classification tasks, we provide extensive quantitative experiments to show that our approach can capture more stable and faithful explanations than standard attribution-based methods. Furthermore, the underlying causal graphs reveal the neural interactions in the model, making it a valuable tool in other applications (e.g., model repair).
Submitted 20 June, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
DarSwin: Distortion Aware Radial Swin Transformer
Authors:
Akshaya Athwale,
Ichrak Shili,
Émile Bergeron,
Arman Afrasiyabi,
Justin Lagüe,
Ola Ahmad,
Jean-François Lalonde
Abstract:
Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses, analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets, with significant gains when trained on bounded levels of distortion (very low, low, medium, and high) and tested on all levels, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/
Submitted 7 January, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
An initial Theory to Understand and Manage Requirements Engineering Debt in Practice
Authors:
Julian Frattini,
Davide Fucci,
Daniel Mendez,
Rodrigo Spinola,
Vladimir Mandic,
Nebojsa Tausan,
Muhammad Ovais Ahmad,
Javier Gonzalez-Huerta
Abstract:
Context: Advances in technical debt research demonstrate the benefits of applying the financial debt metaphor to support decision-making in software development activities. Although decision-making during requirements engineering has significant consequences, the debt metaphor in requirements engineering is inadequately explored. Objective: We aim to conceptualize how the debt metaphor applies to requirements engineering by organizing concepts related to practitioners' understanding and managing of requirements engineering debt (RED). Method: We conducted two in-depth expert interviews to identify key requirements engineering debt concepts and construct a survey instrument. We surveyed 69 practitioners worldwide regarding their perception of the concepts and developed an initial analytical theory. Results: We propose a RED theory that aligns key concepts from technical debt research but emphasizes the specific nature of requirements engineering. In particular, the theory consists of 23 falsifiable propositions derived from the literature, the interviews, and survey results. Conclusions: The concepts of requirements engineering debt are perceived to be similar to their technical debt counterpart. Nevertheless, measuring and tracking requirements engineering debt are immature in practice. Our proposed theory serves as the first guide toward further research in this area.
Submitted 8 March, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Team performance and large scale agile software development
Authors:
Muhammad Ovais Ahmad,
Hadi Ghanbari,
Tomas Gustavsson
Abstract:
Software development is teamwork and largely dependent on open social interaction and continuous learning of individuals. Drawing on well-established theoretical concepts proposed by the social psychology and organizational science disciplines, we develop a theoretical framework proposing that team climate has a significant influence on team learning and ultimately affects team performance. Our study has two goals: first, to understand the preconditions of team learning, and second, to investigate the relationship between team learning, psychological safety, and team performance in large-scale agile software development projects. We plan to conduct a survey with software professionals in Sweden from three partner companies in our large-scale agile research project.
Submitted 16 September, 2022;
originally announced September 2022.
-
FisheyeHDK: Hyperbolic Deformable Kernel Learning for Ultra-Wide Field-of-View Image Recognition
Authors:
Ola Ahmad,
Freddy Lecue
Abstract:
Conventional convolutional neural networks (CNNs) trained on narrow Field-of-View (FoV) images are the state-of-the-art approaches for object recognition tasks. Some methods proposed the adaptation of CNNs to ultra-wide FoV images by learning deformable kernels. However, they are limited by Euclidean geometry, and their accuracy degrades under strong distortions caused by fisheye projections. In this work, we demonstrate that learning the shape of convolution kernels in non-Euclidean spaces is better than existing deformable kernel methods. In particular, we propose a new approach that learns deformable kernel parameters (positions) in hyperbolic space. FisheyeHDK is a hybrid CNN architecture combining hyperbolic and Euclidean convolution layers for position and feature learning. First, we provide an intuition of hyperbolic space for wide FoV images. Using synthetic distortion profiles, we demonstrate the effectiveness of our approach. We select two datasets of perspective images, Cityscapes and BDD100K 2020, which we transform to fisheye equivalents at different scaling factors (analogous to focal lengths). Finally, we provide an experiment on data collected by a real fisheye camera. Validations and experiments show that our approach improves on existing deformable kernel methods for CNN adaptation on fisheye images.
Submitted 14 March, 2022;
originally announced March 2022.
-
SEMOUR: A Scripted Emotional Speech Repository for Urdu
Authors:
Nimra Zaheer,
Obaid Ullah Ahmad,
Ammar Ahmed,
Muhammad Shehryar Khan,
Mudassir Shabbir
Abstract:
Designing reliable Speech Emotion Recognition systems is a complex task that inevitably requires sufficient data for training purposes. Such extensive datasets are currently available in only a few languages, including English, German, and Italian. In this paper, we present SEMOUR, the first scripted database of emotion-tagged speech in the Urdu language, to design an Urdu Speech Recognition System. Our gender-balanced dataset contains 15,040 unique instances recorded by eight professional actors eliciting a syntactically complex script. The dataset is phonetically balanced and reliably exhibits a varied set of emotions, as marked by the high agreement scores among human raters in experiments. We also provide various baseline speech emotion prediction scores on the database, which could be used for applications such as personalized robot assistants, diagnosis of psychological disorders, and getting feedback from a low-tech-enabled population. On a random test sample, our model correctly predicts an emotion with a state-of-the-art 92% accuracy.
Submitted 19 May, 2021;
originally announced May 2021.
-
Adaptable Deformable Convolutions for Semantic Segmentation of Fisheye Images in Autonomous Driving Systems
Authors:
Clément Playout,
Ola Ahmad,
Freddy Lecue,
Farida Cheriet
Abstract:
Advanced Driver-Assistance Systems rely heavily on perception tasks such as semantic segmentation where images are captured from large field of view (FoV) cameras. State-of-the-art works have made considerable progress toward applying Convolutional Neural Network (CNN) to standard (rectilinear) images. However, the large FoV cameras used in autonomous vehicles produce fisheye images characterized by strong geometric distortion. This work demonstrates that a CNN trained on standard images can be readily adapted to fisheye images, which is crucial in real-world applications where time-consuming real-time data transformation must be avoided. Our adaptation protocol mainly relies on modifying the support of the convolutions by using their deformable equivalents on top of pre-existing layers. We prove that tuning an optimal support only requires a limited amount of labeled fisheye images, as a small number of training samples is sufficient to significantly improve an existing model's performance on wide-angle images. Furthermore, we show that finetuning the weights of the network is not necessary to achieve high performance once the deformable components are learned. Finally, we provide an in-depth analysis of the effect of the deformable convolutions, bringing elements of discussion on the behavior of CNN models.
Submitted 19 February, 2021;
originally announced February 2021.
-
Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge
Authors:
Yue Sun,
Kun Gao,
Zhengwang Wu,
Zhihao Lei,
Ying Wei,
Jun Ma,
** Yang,
Xue Feng,
Li Zhao,
Trung Le Phan,
Jitae Shin,
Tao Zhong,
Yu Zhang,
Lequan Yu,
Caizi Li,
Ramesh Basnet,
M. Omair Ahmad,
M. N. S. Swamy,
Wenao Ma,
Qi Dou,
Toan Duc Bui,
Camilo Bermudez Noguera,
Bennett Landman,
Ian H. Gotlib,
Kathryn L. Humphreys
, et al. (8 additional authors not shown)
Abstract:
To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of their major limitations is the multi-site issue: models trained on a dataset from one site may not be applicable to datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, the iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. At the time of writing, 30 automatic segmentation methods have participated in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results, and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers to the multi-site issue.
Submitted 11 July, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Design and Hardware Implementation of a Separable Image Steganographic Scheme Using Public-key Cryptosystem
Authors:
Salah Harb,
M. Omair Ahmad,
M. N. S Swamy
Abstract:
In this paper, a novel and efficient hardware implementation of a steganographic cryptosystem based on public-key cryptography is proposed. Digital images are utilized as carriers of secret data between sender and receiver parties in the communication channel. The proposed public-key cryptosystem offers a separable framework in which embedding or extracting secret data and encrypting or decrypting the carrier can be performed independently using the public-private key pair. The Paillier cryptosystem is adopted to encrypt and decrypt pixels of the digital image. To achieve efficiency, an efficient parallel Montgomery exponentiation core is designed and implemented for performing the underlying field operations in the Paillier cryptosystem. The hardware implementation results of the proposed steganographic cryptosystem show efficiency in terms of area (resources), performance (speed) and power consumption. Our steganographic cryptosystem has a small footprint, making it well-suited for embedded systems and real-time processing engines in applications such as medical scanning devices, autopilot cars and drones.
Submitted 4 June, 2020;
originally announced June 2020.
-
Applying r-spatiogram in object tracking for occlusion handling
Authors:
Niloufar Salehi Dastjerdi,
M. Omair Ahmad
Abstract:
Object tracking is one of the most important problems in computer vision. The aim of video tracking is to extract the trajectories of a target or object of interest, i.e., to accurately locate a moving target in a video sequence and discriminate the target from non-targets in the feature space of the sequence. Feature descriptors can therefore have significant effects on such discrimination. In this paper, we use the basic idea of many trackers, which consists of three main components of the reference model, i.e., object modeling, object detection and localization, and model updating. However, there are major improvements in our system. Our fourth component, occlusion handling, utilizes the r-spatiogram to detect the best target candidate. While a spatiogram contains some moments of the pixel coordinates, the r-spatiogram computes region-based compactness on the distribution of the given feature in the image, which captures richer features to represent the objects. The proposed research develops an efficient and robust way to keep tracking the object throughout video sequences in the presence of significant appearance variations and severe occlusions. The proposed method is evaluated on the Princeton RGBD tracking dataset considering sequences with different challenges, and the obtained results demonstrate the effectiveness of the proposed method.
Submitted 17 March, 2020;
originally announced March 2020.
-
Interpretable Fully Convolutional Classification of Intrapapillary Capillary Loops for Real-Time Detection of Early Squamous Neoplasia
Authors:
Luis C. Garcia-Peraza-Herrera,
Martin Everson,
Wenqi Li,
Inmanol Luengo,
Lorenz Berger,
Omer Ahmad,
Laurence Lovat,
Hsiu-Po Wang,
Wen-Lun Wang,
Rehan Haidry,
Danail Stoyanov,
Tom Vercauteren,
Sebastien Ourselin
Abstract:
In this work, we have concentrated our efforts on the interpretability of classification results coming from a fully convolutional neural network. Motivated by the classification of oesophageal tissue for real-time detection of early squamous neoplasia, the most frequent kind of oesophageal cancer in Asia, we present a new dataset and a novel deep learning method that by means of deep supervision and a newly introduced concept, the embedded Class Activation Map (eCAM), focuses on the interpretability of results as a design constraint of a convolutional network. We present a new approach to visualise attention that aims to give some insights on those areas of the oesophageal tissue that lead a network to conclude that the images belong to a particular class and compare them with those visual features employed by clinicians to produce a clinical diagnosis. In comparison to a baseline method which does not feature deep supervision but provides attention by grafting Class Activation Maps, we improve the F1-score from 87.3% to 92.7% and provide more detailed attention maps.
Submitted 2 May, 2018;
originally announced May 2018.
-
Speech Enhancement in Adverse Environments Based on Non-stationary Noise-driven Spectral Subtraction and SNR-dependent Phase Compensation
Authors:
Md Tauhidul Islam,
Asaduzzaman,
Celia Shahnaz,
Wei-Ping Zhu,
M. Omair Ahmad
Abstract:
A two-step enhancement method based on spectral subtraction and phase spectrum compensation is presented in this paper for noisy speech in adverse environments involving non-stationary noise and medium to low levels of SNR. The magnitude of the noisy speech spectrum is modified in the first step of the proposed method by a spectral subtraction approach, where a new noise estimation method based on the low frequency information of the noisy speech is introduced. We argue that this method of noise estimation is capable of estimating the non-stationary noise accurately. The phase spectrum of the noisy speech is modified in the second step, consisting of phase spectrum compensation, where an SNR-dependent approach is incorporated to determine the amount of compensation to be imposed on the phase spectrum. A modified complex spectrum is obtained by aggregating the magnitude from the spectral subtraction step and the modified phase spectrum from the phase compensation step, which is found to be a better representation of the enhanced speech spectrum. Speech files available in the NOIZEUS database are used to carry out extensive simulations for evaluation of the proposed method.
Submitted 18 February, 2018;
originally announced March 2018.
-
Enhancement of Noisy Speech with Low Speech Distortion Based on Probabilistic Geometric Spectral Subtraction
Authors:
Md Tauhidul Islam,
Celia Shahnaz,
Wei-Ping Zhu,
M. Omair Ahmad
Abstract:
A speech enhancement method based on a probabilistic geometric approach to spectral subtraction (PGA) performed on the short time magnitude spectrum is presented in this paper. A confidence parameter of noise estimation is introduced in the gain function of the proposed method to prevent subtraction of the overestimated and underestimated noise, which not only removes the noise efficiently but also prevents speech distortion. The noise compensated magnitude spectrum is then recombined with the unchanged phase spectrum to produce a modified complex spectrum prior to synthesizing an enhanced frame. Extensive simulations are carried out using the speech files available in the NOIZEUS database in order to evaluate the performance of the proposed method.
Submitted 12 February, 2018;
originally announced February 2018.
-
Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients with an Erlang-2 PDF for Real Time Enhancement of Noisy Speech
Authors:
Md Tauhidul Islam,
Celia Shahnaz,
Wei-Ping Zhu,
M. Omair Ahmad
Abstract:
In this paper, for real time enhancement of noisy speech, a method of threshold determination based on modeling of Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech and noise by an Erlang-2 PDF is presented. The proposed method is computationally much faster than the existing wavelet packet based thresholding methods. A custom thresholding function based on a combination of mu-law and semisoft thresholding functions is designed and exploited to apply the statistically derived threshold upon the PWP coefficients. The proposed custom thresholding function works as a mu-law or a semisoft thresholding function or their combination based on the probability of speech presence and absence in a subband of the PWP transformed noisy speech. By using the speech files available in NOIZEUS database, a number of simulations are performed to evaluate the performance of the proposed method for speech signals in the presence of Gaussian white and street noises. The proposed method outperforms some of the state-of-the-art speech enhancement methods both at high and low levels of SNRs in terms of standard objective measures and subjective evaluations including formal listening tests.
Submitted 9 February, 2018;
originally announced February 2018.
-
A Divide and Conquer Strategy for Musical Noise-free Speech Enhancement in Adverse Environments
Authors:
Md Tauhidul Islam,
Celia Shahnaz,
Wei-Ping Zhu,
M. Omair Ahmad
Abstract:
A divide and conquer strategy for enhancement of noisy speech in adverse environments involving lower levels of SNR is presented in this paper, where the total system of speech enhancement is divided into two separate steps. The first step is based on noise compensation on the short time magnitude and the second step is based on phase compensation. The magnitude spectrum is compensated based on a modified spectral subtraction method where the cross-terms containing spectra of noise and clean speech, which are neglected in the traditional spectral subtraction methods, are taken into consideration. By employing the modified magnitude and unchanged phase, a procedure is formulated to compensate for the overestimation or underestimation of noise by a phase compensation method based on the probability of speech presence. A modified complex spectrum based on these two steps is obtained to synthesize a musical noise free enhanced speech. Extensive simulations are carried out using the speech files available in the NOIZEUS database in order to evaluate the performance of the proposed method. It is shown in terms of objective measures, spectrogram analysis and formal subjective listening tests that the proposed method consistently outperforms some of the state-of-the-art methods of speech enhancement for noisy speech corrupted by street or babble noise at very low as well as medium levels of SNR.
Submitted 7 February, 2018;
originally announced February 2018.
-
Generalised Object Detection and Semantic Analysis: Casino Example using Matlab
Authors:
Othman Ahmad
Abstract:
Matlab version 7.1 was used to detect playing cards on a casino table and to identify the suits and ranks of these cards. The process gives an example of an application of computer vision to a problem where rectangular objects are to be detected and the information content of the objects is extracted. In the case of playing cards, it is the suit and rank of each card. The image processing is done in two passes. Pass 1 detects rectangular shapes, which are matched against a template of the left and right edges of the cards. Pass 2 extracts the suit and rank of the cards by matching the top left portion of the card, which contains both rank and suit information, with stored templates of ranks and suits of the playing cards using a series of if-then statements.
Submitted 17 September, 2011;
originally announced September 2011.
-
Software Requirements Specification of the IUfA's UUIS -- a Team 2 COMP5541-W10 Project Approach
Authors:
Omer Shahid Ahmad,
Faisal Alrashdi,
Jason,
Chen,
Najah Ilham,
Jianhai Lu,
Yiwei Sun,
Tong Wang,
Yongxin Zhu
Abstract:
In the 52-page document, we describe our approach to the Software Requirements Specification of the IUfA's UUIS prototype. This includes the overall system description, functional requirements, non-functional requirements, use cases, the corresponding data dictionary for all entities involved, mock user interface (UI) design, and the overall projected cost estimate. The design specification of UUIS can be found in arXiv:1005.0665.
Submitted 7 May, 2010; v1 submitted 5 May, 2010;
originally announced May 2010.
-
Software Design Document, Testing, Deployment and Configuration Management of the UUIS--a Team 2 COMP5541-W10 Project Approach
Authors:
Omer Shahid Ahmad,
Faisal Alrashdi,
Jason,
Chen,
Najah Ilham,
Jianhai Lu,
Yiwei Sun,
Tong Wang,
Yongxin Zhu
Abstract:
The Software Design Document of UUIS describes the prototype design details of the system architecture, database layer, deployment and configuration details as well as test cases produced while working on the design and implementation of the prototype. The requirements specification of UUIS is detailed in arXiv:1005.0783.
Submitted 7 May, 2010; v1 submitted 5 May, 2010;
originally announced May 2010.