Search | arXiv e-print repository

Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion

Abstract: The Transformer architecture has dominated machine learning in a wide range of tasks. The specific characteristic of this architecture is an expensive scaled dot-product attention mechanism that models the inter-token interactions, which is known to be the reason behind its success. However, such a mechanism does not have a direct parallel to the human brain which brings the question if the scaled… ▽ More The Transformer architecture has dominated machine learning in a wide range of tasks. The specific characteristic of this architecture is an expensive scaled dot-product attention mechanism that models the inter-token interactions, which is known to be the reason behind its success. However, such a mechanism does not have a direct parallel to the human brain which brings the question if the scaled-dot product is necessary for intelligence with strong expressive power. Inspired by the lateralization of the human brain, we propose a new simple but effective architecture called the Lateralization MLP (L-MLP). Stacking L-MLP blocks can generate complex architectures. Each L-MLP block is based on a multi-layer perceptron (MLP) that permutes data dimensions, processes each dimension in parallel, merges them, and finally passes through a joint MLP. We discover that this specific design outperforms other MLP variants and performs comparably to a transformer-based architecture in the challenging diffusion task while being highly efficient. We conduct experiments using text-to-image generation tasks to demonstrate the effectiveness and efficiency of L-MLP. Further, we look into the model behavior and discover a connection to the function of the human brain. Our code is publicly available: \url{https://github.com/zizhao-hu/L-MLP} △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2404.06856 [pdf, other]

Beyond Random Inputs: A Novel ML-Based Hardware Fuzzing

Authors: Mohamadreza Rostami, Marco Chilese, Shaza Zeitouni, Rahul Kande, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi

Abstract: Modern computing systems heavily rely on hardware as the root of trust. However, their increasing complexity has given rise to security-critical vulnerabilities that cross-layer at-tacks can exploit. Traditional hardware vulnerability detection methods, such as random regression and formal verification, have limitations. Random regression, while scalable, is slow in exploring hardware, and formal… ▽ More Modern computing systems heavily rely on hardware as the root of trust. However, their increasing complexity has given rise to security-critical vulnerabilities that cross-layer at-tacks can exploit. Traditional hardware vulnerability detection methods, such as random regression and formal verification, have limitations. Random regression, while scalable, is slow in exploring hardware, and formal verification techniques are often concerned with manual effort and state explosions. Hardware fuzzing has emerged as an effective approach to exploring and detecting security vulnerabilities in large-scale designs like modern processors. They outperform traditional methods regarding coverage, scalability, and efficiency. However, state-of-the-art fuzzers struggle to achieve comprehensive coverage of intricate hardware designs within a practical timeframe, often falling short of a 70% coverage threshold. We propose a novel ML-based hardware fuzzer, ChatFuzz, to address this challenge. Ourapproach leverages LLMs like ChatGPT to understand processor language, focusing on machine codes and generating assembly code sequences. RL is integrated to guide the input generation process by rewarding the inputs using code coverage metrics. We use the open-source RISCV-based RocketCore processor as our testbed. ChatFuzz achieves condition coverage rate of 75% in just 52 minutes compared to a state-of-the-art fuzzer, which requires a lengthy 30-hour window to reach a similar condition coverage. Furthermore, our fuzzer can attain 80% coverage when provided with a limited pool of 10 simulation instances/licenses within a 130-hour window. During this time, it conducted a total of 199K test cases, of which 6K produced discrepancies with the processor's golden model. Our analysis identified more than 10 unique mismatches, including two new bugs in the RocketCore and discrepancies from the RISC-V ISA Simulator. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.16530 [pdf, other]

An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models

Authors: Zizhao Hu, Shaochong Jia, Mohammad Rostami

Abstract: Diffusion models have been widely used for conditional data cross-modal generation tasks such as text-to-image and text-to-video. However, state-of-the-art models still fail to align the generated visual concepts with high-level semantics in a language such as object count, spatial relationship, etc. We approach this problem from a multimodal data fusion perspective and investigate how different f… ▽ More Diffusion models have been widely used for conditional data cross-modal generation tasks such as text-to-image and text-to-video. However, state-of-the-art models still fail to align the generated visual concepts with high-level semantics in a language such as object count, spatial relationship, etc. We approach this problem from a multimodal data fusion perspective and investigate how different fusion strategies can affect vision-language alignment. We discover that compared to the widely used early fusion of conditioning text in a pretrained image feature space, a specially designed intermediate fusion can: (i) boost text-to-image alignment with improved generation quality and (ii) improve training and inference efficiency by reducing low-rank text-to-image attention calculations. We perform experiments using a text-to-image generation task on the MS-COCO dataset. We compare our intermediate fusion mechanism with the classic early fusion mechanism on two common conditioning methods on a U-shaped ViT backbone. Our intermediate fusion model achieves a higher CLIP Score and lower FID, with 20% reduced FLOPs, and 50% increased training speed compared to a strong U-ViT baseline with an early fusion. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16188 [pdf, other]

Cross-domain Multi-modal Few-shot Object Detection via Rich Text

Authors: Zeyu Shangguan, Daniel Seita, Mohammad Rostami

Abstract: Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks due to generating richer features. However, existing multi-modal object detection (MM-OD) methods degrade when facing significant domain-shift and are sample insufficient. We hypothesize that rich text information could more effectively help the model to build a knowledge relations… ▽ More Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks due to generating richer features. However, existing multi-modal object detection (MM-OD) methods degrade when facing significant domain-shift and are sample insufficient. We hypothesize that rich text information could more effectively help the model to build a knowledge relationship between the vision instance and its language description and can help mitigate domain shift. Specifically, we study the Cross-Domain few-shot generalization of MM-OD (CDMM-FSOD) and propose a meta-learning based multi-modal few-shot object detection method that utilizes rich text semantic information as an auxiliary modality to achieve domain adaptation in the context of FSOD. Our proposed network contains (i) a multi-modal feature aggregation module that aligns the vision and language support feature embeddings and (ii) a rich text semantic rectify module that utilizes bidirectional text feature generation to reinforce multi-modal feature alignment and thus to enhance the model's language understanding capability. We evaluate our model on common standard cross-domain object detection datasets and demonstrate that our approach considerably outperforms existing FSOD methods. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.12511 [pdf, other]

Forward Gradient-Based Frank-Wolfe Optimization for Memory Efficient Deep Neural Network Training

Authors: M. Rostami, S. S. Kia

Abstract: Training a deep neural network using gradient-based methods necessitates the calculation of gradients at each level. However, using backpropagation or reverse mode differentiation, to calculate the gradients necessities significant memory consumption, rendering backpropagation an inefficient method for computing gradients. This paper focuses on analyzing the performance of the well-known Frank-Wol… ▽ More Training a deep neural network using gradient-based methods necessitates the calculation of gradients at each level. However, using backpropagation or reverse mode differentiation, to calculate the gradients necessities significant memory consumption, rendering backpropagation an inefficient method for computing gradients. This paper focuses on analyzing the performance of the well-known Frank-Wolfe algorithm, a.k.a. conditional gradient algorithm by having access to the forward mode of automatic differentiation to compute gradients. We provide in-depth technical details that show the proposed Algorithm does converge to the optimal solution with a sub-linear rate of convergence by having access to the noisy estimate of the true gradient obtained in the forward mode of automated differentiation, referred to as the Projected Forward Gradient. In contrast, the standard Frank-Wolfe algorithm, when provided with access to the Projected Forward Gradient, fails to converge to the optimal solution. We demonstrate the convergence attributes of our proposed algorithms using a numerical example. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.03018 [pdf, other]

CRISPR: Ensemble Model

Authors: Mohammad Rostami, Amin Ghariyazi, Hamed Dashti, Mohammad Hossein Rohban, Hamid R. Rabiee

Abstract: Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a gene editing technology that has revolutionized the fields of biology and medicine. However, one of the challenges of using CRISPR is predicting the on-target efficacy and off-target sensitivity of single-guide RNAs (sgRNAs). This is because most existing methods are trained on separate datasets with different genes and cells,… ▽ More Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a gene editing technology that has revolutionized the fields of biology and medicine. However, one of the challenges of using CRISPR is predicting the on-target efficacy and off-target sensitivity of single-guide RNAs (sgRNAs). This is because most existing methods are trained on separate datasets with different genes and cells, which limits their generalizability. In this paper, we propose a novel ensemble learning method for sgRNA design that is accurate and generalizable. Our method combines the predictions of multiple machine learning models to produce a single, more robust prediction. This approach allows us to learn from a wider range of data, which improves the generalizability of our model. We evaluated our method on a benchmark dataset of sgRNA designs and found that it outperformed existing methods in terms of both accuracy and generalizability. Our results suggest that our method can be used to design sgRNAs with high sensitivity and specificity, even for new genes or cells. This could have important implications for the clinical use of CRISPR, as it would allow researchers to design more effective and safer treatments for a variety of diseases. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.18599 [pdf, other]

Meta-Tasks: An alternative view on Meta-Learning Regularization

Authors: Mohammad Rostami, Atik Faysal, Huaxia Wang, Avimanyu Sahoo, Ryan Antle

Abstract: Few-shot learning (FSL) is a challenging machine learning problem due to a scarcity of labeled data. The ability to generalize effectively on both novel and training tasks is a significant barrier to FSL. This paper proposes a novel solution that can generalize to both training and novel tasks while also utilizing unlabeled samples. The method refines the embedding model before updating the outer… ▽ More Few-shot learning (FSL) is a challenging machine learning problem due to a scarcity of labeled data. The ability to generalize effectively on both novel and training tasks is a significant barrier to FSL. This paper proposes a novel solution that can generalize to both training and novel tasks while also utilizing unlabeled samples. The method refines the embedding model before updating the outer loop using unsupervised techniques as ``meta-tasks''. The experimental results show that our proposed method performs well on novel and training tasks, with faster and better convergence, lower generalization, and standard deviation error, indicating its potential for practical applications in FSL. The experimental results show that the proposed method outperforms prototypical networks by 3.9%. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.03704 [pdf, other]

WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors

Authors: Pallavi Borkar, Chen Chen, Mohamadreza Rostami, Nikhilesh Singh, Rahul Kande, Ahmad-Reza Sadeghi, Chester Rebeiro, Jeyavijayan Rajendran

Abstract: Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-b… ▽ More Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-box or grey-box fuzzing to detect timing vulnerabilities in processors. However, they cannot identify the locations or root causes of these timing vulnerabilities, nor do they provide coverage feedback to enable the designer's confidence in the processor's security. To address the deficiencies of the existing fuzzers, we present WhisperFuzz--the first white-box fuzzer with static analysis--aiming to detect and locate timing vulnerabilities in processors and evaluate the coverage of microarchitectural timing behaviors. WhisperFuzz uses the fundamental nature of processors' timing behaviors, microarchitectural state transitions, to localize timing vulnerabilities. WhisperFuzz automatically extracts microarchitectural state transitions from a processor design at the register-transfer level (RTL) and instruments the design to monitor the state transitions as coverage. Moreover, WhisperFuzz measures the time a design-under-test (DUT) takes to process tests, identifying any minor, abnormal variations that may hint at a timing vulnerability. WhisperFuzz detects 12 new timing vulnerabilities across advanced open-sourced RISC-V processors: BOOM, Rocket Core, and CVA6. Eight of these violate the zero latency requirements of the Zkt extension and are considered serious security vulnerabilities. Moreover, WhisperFuzz also pinpoints the locations of the new and the existing vulnerabilities. △ Less

Submitted 14 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted to USENIX Sec'24

arXiv:2402.00580 [pdf, other]

Continuous Unsupervised Domain Adaptation Using Stabilized Representations and Experience Replay

Authors: Mohammad Rostami

Abstract: We introduce an algorithm for tackling the problem of unsupervised domain adaptation (UDA) in continual learning (CL) scenarios. The primary objective is to maintain model generalization under domain shift when new domains arrive continually through updating a base model when only unlabeled data is accessible in subsequent tasks. While there are many existing UDA algorithms, they typically require… ▽ More We introduce an algorithm for tackling the problem of unsupervised domain adaptation (UDA) in continual learning (CL) scenarios. The primary objective is to maintain model generalization under domain shift when new domains arrive continually through updating a base model when only unlabeled data is accessible in subsequent tasks. While there are many existing UDA algorithms, they typically require access to both the source and target domain datasets simultaneously. Conversely, existing CL approaches can handle tasks that all have labeled data. Our solution is based on stabilizing the learned internal distribution to enhances the model generalization on new domains. The internal distribution is modeled by network responses in hidden layer. We model this internal distribution using a Gaussian mixture model (GMM ) and update the model by matching the internally learned distribution of new domains to the estimated GMM. Additionally, we leverage experience replay to overcome the problem of catastrophic forgetting, where the model loses previously acquired knowledge when learning new tasks. We offer theoretical analysis to explain why our algorithm would work. We also offer extensive comparative and analytic experiments to demonstrate that our method is effective. We perform experiments on four benchmark datasets to demonstrate that our approach is effective. △ Less

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.15275 [pdf, other]

Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks

Authors: Yuliang Cai, Mohammad Rostami

Abstract: Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural networks pose significant challenges for the widespread adoption of these models for applications that demand on-edge computing. To tackle this challenge, continual… ▽ More Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural networks pose significant challenges for the widespread adoption of these models for applications that demand on-edge computing. To tackle this challenge, continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent. However, current CL methods mainly focus on learning tasks that are exclusively vision-based or language-based. We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language, known as Vision-and-Language (VaL) tasks. Due to the success of transformers in other modalities, our architecture has the potential to be used in multimodal learning settings. In our framework, we benefit from introducing extra parameters to a base transformer to specialize the network for each task. As a result, we enable dynamic model expansion to learn several tasks in a sequence. We also use knowledge distillation to benefit from relevant past experiences to learn the current task more efficiently. Our proposed method, Task Attentive Multimodal Continual Learning (TAM-CL), allows for the exchange of information between tasks while mitigating the problem of catastrophic forgetting. Notably, our approach is scalable, incurring minimal memory and time overhead. TAM-CL achieves state-of-the-art (SOTA) performance on challenging multimodal tasks △ Less

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.07207 [pdf, other]

Unsupervised Domain Adaptation Using Compact Internal Representations

Authors: Mohammad Rostami

Abstract: A major technique for tackling unsupervised domain adaptation involves map** data points from both the source and target domains into a shared embedding space. The map** encoder to the embedding space is trained such that the embedding space becomes domain agnostic, allowing a classifier trained on the source domain to generalize well on the target domain. To further enhance the performance of… ▽ More A major technique for tackling unsupervised domain adaptation involves map** data points from both the source and target domains into a shared embedding space. The map** encoder to the embedding space is trained such that the embedding space becomes domain agnostic, allowing a classifier trained on the source domain to generalize well on the target domain. To further enhance the performance of unsupervised domain adaptation (UDA), we develop an additional technique which makes the internal distribution of the source domain more compact, thereby improving the model's ability to generalize in the target domain.We demonstrate that by increasing the margins between data representations for different classes in the embedding space, we can improve the model performance for UDA. To make the internal representation more compact, we estimate the internally learned multi-modal distribution of the source domain as Gaussian mixture model (GMM). Utilizing the estimated GMM, we enhance the separation between different classes in the source domain, thereby mitigating the effects of domain shift. We offer theoretical analysis to support outperofrmance of our method. To evaluate the effectiveness of our approach, we conduct experiments on widely used UDA benchmark UDA datasets. The results indicate that our method enhances model generalizability and outperforms existing techniques. △ Less

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.02941 [pdf, other]

Unsupervised Federated Domain Adaptation for Segmentation of MRI Images

Authors: Navapat Nananukul, Hamid Soltanian-zadeh, Mohammad Rostami

Abstract: Automatic semantic segmentation of magnetic resonance imaging (MRI) images using deep neural networks greatly assists in evaluating and planning treatments for various clinical applications. However, training these models is conditioned on the availability of abundant annotated data to implement the end-to-end supervised learning procedure. Even if we annotate enough data, MRI images display consi… ▽ More Automatic semantic segmentation of magnetic resonance imaging (MRI) images using deep neural networks greatly assists in evaluating and planning treatments for various clinical applications. However, training these models is conditioned on the availability of abundant annotated data to implement the end-to-end supervised learning procedure. Even if we annotate enough data, MRI images display considerable variability due to factors such as differences in patients, MRI scanners, and imaging protocols. This variability necessitates retraining neural networks for each specific application domain, which, in turn, requires manual annotation by expert radiologists for all new domains. To relax the need for persistent data annotation, we develop a method for unsupervised federated domain adaptation using multiple annotated source domains. Our approach enables the transfer of knowledge from several annotated source domains to adapt a model for effective use in an unannotated target domain. Initially, we ensure that the target domain data shares similar representations with each source domain in a latent embedding space, modeled as the output of a deep encoder, by minimizing the pair-wise distances of the distributions for the target domain and the source domains. We then employ an ensemble approach to leverage the knowledge obtained from all domains. We provide theoretical analysis and perform experiments on the MICCAI 2016 multi-site dataset to demonstrate our method is effective. △ Less

Submitted 13 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.01042 [pdf, other]

Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation

Authors: Mohammad Rostami, Dayuan Jian

Abstract: Event-based cameras provide accurate and high temporal resolution measurements for performing computer vision tasks in challenging scenarios, such as high-dynamic range environments and fast-motion maneuvers. Despite their advantages, utilizing deep learning for event-based vision encounters a significant obstacle due to the scarcity of annotated data caused by the relatively recent emergence of e… ▽ More Event-based cameras provide accurate and high temporal resolution measurements for performing computer vision tasks in challenging scenarios, such as high-dynamic range environments and fast-motion maneuvers. Despite their advantages, utilizing deep learning for event-based vision encounters a significant obstacle due to the scarcity of annotated data caused by the relatively recent emergence of event-based cameras. To overcome this limitation, leveraging the knowledge available from annotated data obtained with conventional frame-based cameras presents an effective solution based on unsupervised domain adaptation. We propose a new algorithm tailored for adapting a deep neural network trained on annotated frame-based data to generalize well on event-based unannotated data. Our approach incorporates uncorrelated conditioning and self-supervised learning in an adversarial learning scheme to close the gap between the two source and target domains. By applying self-supervised learning, the algorithm learns to align the representations of event-based data with those from frame-based camera data, thereby facilitating knowledge transfer.Furthermore, the inclusion of uncorrelated conditioning ensures that the adapted model effectively distinguishes between event-based and conventional data, enhancing its ability to classify event-based images accurately.Through empirical experimentation and evaluation, we demonstrate that our algorithm surpasses existing approaches designed for the same purpose using two benchmarks. The superior performance of our solution is attributed to its ability to effectively utilize annotated data from frame-based cameras and transfer the acquired knowledge to the event-based vision domain. △ Less

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.01035 [pdf, other]

Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations

Authors: Serban Stan, Mohammad Rostami

Abstract: Semantic segmentation models trained on annotated data fail to generalize well when the input data distribution changes over extended time period, leading to requiring re-training to maintain performance. Classic Unsupervised domain adaptation (UDA) attempts to address a similar problem when there is target domain with no annotated data points through transferring knowledge from a source domain wi… ▽ More Semantic segmentation models trained on annotated data fail to generalize well when the input data distribution changes over extended time period, leading to requiring re-training to maintain performance. Classic Unsupervised domain adaptation (UDA) attempts to address a similar problem when there is target domain with no annotated data points through transferring knowledge from a source domain with annotated data. We develop an online UDA algorithm for semantic segmentation of images that improves model generalization on unannotated domains in scenarios where source data access is restricted during adaptation. We perform model adaptation is by minimizing the distributional distance between the source latent features and the target features in a shared embedding space. Our solution promotes a shared domain-agnostic latent feature space between the two domains, which allows for classifier generalization on the target dataset. To alleviate the need of access to source samples during adaptation, we approximate the source latent feature distribution via an appropriate surrogate distribution, in this case a Gassian mixture model (GMM). We evaluate our approach on well established semantic segmentation datasets and demonstrate it compares favorably against state-of-the-art (SOTA) UDA semantic segmentation methods. △ Less

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2311.16488 [pdf, other]

Efficient Multimodal Diffusion Models Using Joint Data Infilling with Partially Shared U-Net

Authors: Zizhao Hu, Shaochong Jia, Mohammad Rostami

Abstract: Recently, diffusion models have been used successfully to fit distributions for cross-modal data translation and multimodal data generation. However, these methods rely on extensive scaling, overlooking the inefficiency and interference between modalities. We develop Partially Shared U-Net (PS-U-Net) architecture which is an efficient multimodal diffusion model that allows text and image inputs to… ▽ More Recently, diffusion models have been used successfully to fit distributions for cross-modal data translation and multimodal data generation. However, these methods rely on extensive scaling, overlooking the inefficiency and interference between modalities. We develop Partially Shared U-Net (PS-U-Net) architecture which is an efficient multimodal diffusion model that allows text and image inputs to pass through dedicated layers and skip-connections for preserving modality-specific fine-grained details. Inspired by image inpainting, we also propose a new efficient multimodal sampling method that introduces new scenarios for conditional generation while only requiring a simple joint distribution to be learned. Our empirical exploration of the MS-COCO dataset demonstrates that our method generates multimodal text and image data with higher quality compared to existing multimodal diffusion models while having a comparable size, faster training, faster multimodal sampling, and more flexible generation. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.13085 [pdf, other]

Unsupervised Representation Learning to Aid Semi-Supervised Meta Learning

Authors: Atik Faysal, Mohammad Rostami, Huaxia Wang, Avimanyu Sahoo, Ryan Antle

Abstract: Few-shot learning or meta-learning leverages the data scarcity problem in machine learning. Traditionally, training data requires a multitude of samples and labeling for supervised learning. To address this issue, we propose a one-shot unsupervised meta-learning to learn the latent representation of the training samples. We use augmented samples as the query set during the training phase of the un… ▽ More Few-shot learning or meta-learning leverages the data scarcity problem in machine learning. Traditionally, training data requires a multitude of samples and labeling for supervised learning. To address this issue, we propose a one-shot unsupervised meta-learning to learn the latent representation of the training samples. We use augmented samples as the query set during the training phase of the unsupervised meta-learning. A temperature-scaled cross-entropy loss is used in the inner loop of meta-learning to prevent overfitting during unsupervised learning. The learned parameters from this step are applied to the targeted supervised meta-learning in a transfer-learning fashion for initialization and fast adaptation with improved accuracy. The proposed method is model agnostic and can aid any meta-learning model to improve accuracy. We use model agnostic meta-learning (MAML) and relation network (RN) on Omniglot and mini-Imagenet datasets to demonstrate the performance of the proposed method. Furthermore, a meta-learning model with the proposed initialization can achieve satisfactory accuracy with significantly fewer training samples. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.07925 [pdf, other]

First-Order Dynamic Optimization for Streaming Convex Costs

Authors: M. Rostami, H. Moradian, S. S. Kia

Abstract: This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimiza… ▽ More This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimization with time-varying cost function. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying cost. Several examples including solving a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function demonstrate our results. △ Less

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.03898 [pdf, other]

Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization

Authors: Zizhao Hu, Mohammad Rostami

Abstract: Learning new tasks accumulatively without forgetting remains a critical challenge in continual learning. Generative experience replay addresses this challenge by synthesizing pseudo-data points for past learned tasks and later replaying them for concurrent training along with the new tasks' data. Generative replay is the best strategy for continual learning under a strict class-incremental setting… ▽ More Learning new tasks accumulatively without forgetting remains a critical challenge in continual learning. Generative experience replay addresses this challenge by synthesizing pseudo-data points for past learned tasks and later replaying them for concurrent training along with the new tasks' data. Generative replay is the best strategy for continual learning under a strict class-incremental setting when certain constraints need to be met: (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data. Inspired by the biological nervous system mechanisms, we introduce a time-aware regularization method to dynamically fine-tune the three training objective terms used for generative replay: supervised learning, latent regularization, and data reconstruction. Experimental results on major benchmarks indicate that our method pushes the limit of brain-inspired continual learners under such strict settings, improves memory retention, and increases the average performance over continually arriving tasks. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.15522 [pdf, ps, other]

Robust Internal Representations for Domain Generalization

Authors: Mohammad Rostami

Abstract: This paper which is part of the New Faculty Highlights Invited Speaker Program of AAAI'23, serves as a comprehensive survey of my research in transfer learning by utilizing embedding spaces. The work reviewed in this paper specifically revolves around the inherent challenges associated with continual learning and limited availability of labeled data. By providing an overview of my past and ongoing… ▽ More This paper which is part of the New Faculty Highlights Invited Speaker Program of AAAI'23, serves as a comprehensive survey of my research in transfer learning by utilizing embedding spaces. The work reviewed in this paper specifically revolves around the inherent challenges associated with continual learning and limited availability of labeled data. By providing an overview of my past and ongoing contributions, this paper aims to present a holistic understanding of my research, paving the way for future explorations and advancements in the field. My research delves into the various settings of transfer learning, including, few-shot learning, zero-shot learning, continual learning, domain adaptation, and distributed learning. I hope this survey provides a forward-looking perspective for researchers who would like to focus on similar research directions. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: to appear in AI Magazine Winter 2023 Issue

arXiv:2308.07535 [pdf, other]

Improved Region Proposal Network for Enhanced Few-Shot Object Detection

Authors: Zeyu Shangguan, Mohammad Rostami

Abstract: Despite significant success of deep learning in object detection tasks, the standard training of deep neural networks requires access to a substantial quantity of annotated images across all classes. Data annotation is an arduous and time-consuming endeavor, particularly when dealing with infrequent objects. Few-shot object detection (FSOD) methods have emerged as a solution to the limitations of… ▽ More Despite significant success of deep learning in object detection tasks, the standard training of deep neural networks requires access to a substantial quantity of annotated images across all classes. Data annotation is an arduous and time-consuming endeavor, particularly when dealing with infrequent objects. Few-shot object detection (FSOD) methods have emerged as a solution to the limitations of classic object detection approaches based on deep learning. FSOD methods demonstrate remarkable performance by achieving robust object detection using a significantly smaller amount of training data. A challenge for FSOD is that instances from novel classes that do not belong to the fixed set of training classes appear in the background and the base model may pick them up as potential objects. These objects behave similarly to label noise because they are classified as one of the training dataset classes, leading to FSOD performance degradation. We develop a semi-supervised algorithm to detect and then utilize these unlabeled novel objects as positive samples during the FSOD training stage to improve FSOD performance. Specifically, we develop a hierarchical ternary classification region proposal network (HTRPN) to localize the potential unlabeled novel objects and assign them new objectness labels to distinguish these objects from the base training dataset classes. Our improved hierarchical sampling strategy for the region proposal network (RPN) also boosts the perception ability of the object detection model for large objects. We test our approach and COCO and PASCAL VOC baselines that are commonly used in FSOD literature. Our experimental results indicate that our method is effective and outperforms the existing state-of-the-art (SOTA) FSOD methods. Our implementation is provided as a supplement to support reproducibility of the results. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.10422

arXiv:2305.18675 [pdf, other]

History Repeats: Overcoming Catastrophic Forgetting For Event-Centric Temporal Knowledge Graph Completion

Authors: Mehrnoosh Mirtaheri, Mohammad Rostami, Aram Galstyan

Abstract: Temporal knowledge graph (TKG) completion models typically rely on having access to the entire graph during training. However, in real-world scenarios, TKG data is often received incrementally as events unfold, leading to a dynamic non-stationary data distribution over time. While one could incorporate fine-tuning to existing methods to allow them to adapt to evolving TKG data, this can lead to fo… ▽ More Temporal knowledge graph (TKG) completion models typically rely on having access to the entire graph during training. However, in real-world scenarios, TKG data is often received incrementally as events unfold, leading to a dynamic non-stationary data distribution over time. While one could incorporate fine-tuning to existing methods to allow them to adapt to evolving TKG data, this can lead to forgetting previously learned patterns. Alternatively, retraining the model with the entire updated TKG can mitigate forgetting but is computationally burdensome. To address these challenges, we propose a general continual training framework that is applicable to any TKG completion method, and leverages two key ideas: (i) a temporal regularization that encourages repurposing of less important model parameters for learning new knowledge, and (ii) a clustering-based experience replay that reinforces the past knowledge by selectively preserving only a small portion of the past data. Our experimental results on widely used event-centric TKG datasets demonstrate the effectiveness of our proposed continual training framework in adapting to new events while reducing catastrophic forgetting. Further, we perform ablation studies to show the effectiveness of each component of our proposed framework. Finally, we investigate the relation between the memory dedicated to experience replay and the benefit gained from our clustering-based sampling strategy. △ Less

Submitted 29 May, 2023; originally announced May 2023.

Comments: 14 pages, 6 figures

ACM Class: I.2.6; I.2.7

arXiv:2305.18433 [pdf, other]

Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models

Authors: Zizhao Hu, Mohammad Rostami

Abstract: Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately-trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional… ▽ More Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately-trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities. △ Less

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2304.02769 [pdf, other]

Low-Shot Learning for Fictional Claim Verification

Authors: Viswanath Chadalapaka, Derek Nguyen, JoonWon Choi, Shaunak Joshi, Mohammad Rostami

Abstract: In this paper, we study the problem of claim verification in the context of claims about fictional stories in a low-shot learning setting. To this end, we generate two synthetic datasets and then develop an end-to-end pipeline and model that is tested on both benchmarks. To test the efficacy of our pipeline and the difficulty of benchmarks, we compare our models' results against human and random a… ▽ More In this paper, we study the problem of claim verification in the context of claims about fictional stories in a low-shot learning setting. To this end, we generate two synthetic datasets and then develop an end-to-end pipeline and model that is tested on both benchmarks. To test the efficacy of our pipeline and the difficulty of benchmarks, we compare our models' results against human and random assignment results. Our code is available at https://github.com/Derposoft/plot_hole_detection. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: 6 pages

arXiv:2304.02168 [pdf, other]

I2I: Initializing Adapters with Improvised Knowledge

Authors: Tejas Srinivasan, Furong Jia, Mohammad Rostami, Jesse Thomason

Abstract: Adapters present a promising solution to the catastrophic forgetting problem in continual learning. However, training independent Adapter modules for every new task misses an opportunity for cross-task knowledge transfer. We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapt… ▽ More Adapters present a promising solution to the catastrophic forgetting problem in continual learning. However, training independent Adapter modules for every new task misses an opportunity for cross-task knowledge transfer. We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters. We evaluate I2I on CLiMB, a multimodal continual learning benchmark, by conducting experiments on sequences of visual question answering tasks. Adapters trained with I2I consistently achieve better task accuracy than independently-trained Adapters, demonstrating that our algorithm facilitates knowledge transfer between task Adapters. I2I also results in better cross-task knowledge transfer than the state-of-the-art AdapterFusion without incurring the associated parametric cost. △ Less

Submitted 10 July, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: Accepted at 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2303.14615 [pdf, other]

Explainable Artificial Intelligence Architecture for Melanoma Diagnosis Using Indicator Localization and Self-Supervised Learning

Authors: Ruitong Sun, Mohammad Rostami

Abstract: Melanoma is a prevalent lethal type of cancer that is treatable if diagnosed at early stages of development. Skin lesions are a typical indicator for diagnosing melanoma but they often led to delayed diagnosis due to high similarities of cancerous and benign lesions at early stages of melanoma. Deep learning (DL) can be used as a solution to classify skin lesion pictures with a high accuracy, but… ▽ More Melanoma is a prevalent lethal type of cancer that is treatable if diagnosed at early stages of development. Skin lesions are a typical indicator for diagnosing melanoma but they often led to delayed diagnosis due to high similarities of cancerous and benign lesions at early stages of melanoma. Deep learning (DL) can be used as a solution to classify skin lesion pictures with a high accuracy, but clinical adoption of deep learning faces a significant challenge. The reason is that the decision processes of deep learning models are often uninterpretable which makes them black boxes that are challenging to trust. We develop an explainable deep learning architecture for melanoma diagnosis which generates clinically interpretable visual explanations for its decisions. Our experiments demonstrate that our proposed architectures matches clinical explanations significantly better than existing architectures. △ Less

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.14423 [pdf, other]

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

Authors: Yuliang Cai, Jesse Thomason, Mohammad Rostami

Abstract: The size and the computational load of fine-tuning large-scale pre-trained neural network are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling knowledge-transfer across sequentially arriving tasks which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primari… ▽ More The size and the computational load of fine-tuning large-scale pre-trained neural network are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling knowledge-transfer across sequentially arriving tasks which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primarily consider learning unimodal vision-only or language-only tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks based on increasing the number of the learnable parameters dynamically and using knowledge distillation. The new additional parameters are used to specialize the network for each task. Our approach enables sharing information between the tasks while addressing the challenge of catastrophic forgetting. Our approach is scalable learning to a large number of tasks because it requires little memory and time overhead. Our model reaches state-of-the-art performance on challenging vision-and-language tasks. △ Less

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.12424 [pdf, other]

Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning

Authors: Dayuan Jian, Mohammad Rostami

Abstract: Event-based cameras offer reliable measurements for preforming computer vision tasks in high-dynamic range environments and during fast motion maneuvers. However, adopting deep learning in event-based vision faces the challenge of annotated data scarcity due to recency of event cameras. Transferring the knowledge that can be obtained from conventional camera annotated data offers a practical solut… ▽ More Event-based cameras offer reliable measurements for preforming computer vision tasks in high-dynamic range environments and during fast motion maneuvers. However, adopting deep learning in event-based vision faces the challenge of annotated data scarcity due to recency of event cameras. Transferring the knowledge that can be obtained from conventional camera annotated data offers a practical solution to this challenge. We develop an unsupervised domain adaptation algorithm for training a deep network for event-based data image classification using contrastive learning and uncorrelated conditioning of data. Our solution outperforms the existing algorithms for this purpose. △ Less

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.12255 [pdf, other]

Encoding Binary Concepts in the Latent Space of Generative Models for Enhancing Data Representation

Authors: Zizhao Hu, Mohammad Rostami

Abstract: Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs small" and "a neuron is active or inactive". Binary concepts are ubiquitous features and can be used to transfer knowledge to improve model generalization. We p… ▽ More Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs small" and "a neuron is active or inactive". Binary concepts are ubiquitous features and can be used to transfer knowledge to improve model generalization. We propose a novel binarized regularization to facilitate learning of binary concepts to improve the quality of data generation in autoencoders. We introduce a binarizing hyperparameter $r$ in data generation process to disentangle the latent space symmetrically. We demonstrate that this method can be applied easily to existing variational autoencoder (VAE) variants to encourage symmetric disentanglement, improve reconstruction quality, and prevent posterior collapse without computation overhead. We also demonstrate that this method can boost existing models to learn more transferable representations and generate more representative samples for the input distribution which can alleviate catastrophic forgetting using generative replay under continual learning settings. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.10422 [pdf, other]

Identification of Novel Classes for Improving Few-Shot Object Detection

Authors: Zeyu Shangguan, Mohammad Rostami

Abstract: Conventional training of deep neural networks requires a large number of the annotated image which is a laborious and time-consuming task, particularly for rare objects. Few-shot object detection (FSOD) methods offer a remedy by realizing robust object detection using only a few training samples per class. An unexplored challenge for FSOD is that instances from unlabeled novel classes that do not… ▽ More Conventional training of deep neural networks requires a large number of the annotated image which is a laborious and time-consuming task, particularly for rare objects. Few-shot object detection (FSOD) methods offer a remedy by realizing robust object detection using only a few training samples per class. An unexplored challenge for FSOD is that instances from unlabeled novel classes that do not belong to the fixed set of training classes appear in the background. These objects behave similarly to label noise, leading to FSOD performance degradation. We develop a semi-supervised algorithm to detect and then utilize these unlabeled novel objects as positive samples during training to improve FSOD performance. Specifically, we propose a hierarchical ternary classification region proposal network (HTRPN) to localize the potential unlabeled novel objects and assign them new objectness labels. Our improved hierarchical sampling strategy for the region proposal network (RPN) also boosts the perception ability of the object detection model for large objects. Our experimental results indicate that our method is effective and outperforms the existing state-of-the-art (SOTA) FSOD methods. △ Less

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2301.12369 [pdf, other]

Preserving Fairness in AI under Domain Shift

Authors: Serban Stan, Mohammad Rostami

Abstract: Existing algorithms for ensuring fairness in AI use a single-shot training strategy, where an AI model is trained on an annotated training dataset with sensitive attributes and then fielded for utilization. This training strategy is effective in problems with stationary distributions, where both training and testing data are drawn from the same distribution. However, it is vulnerable with respect… ▽ More Existing algorithms for ensuring fairness in AI use a single-shot training strategy, where an AI model is trained on an annotated training dataset with sensitive attributes and then fielded for utilization. This training strategy is effective in problems with stationary distributions, where both training and testing data are drawn from the same distribution. However, it is vulnerable with respect to distributional shifts in the input space that may occur after the initial training phase. As a result, the time-dependent nature of data can introduce biases into the model predictions. Model retraining from scratch using a new annotated dataset is a naive solution that is expensive and time-consuming. We develop an algorithm to adapt a fair model to remain fair under domain shift using solely new unannotated data points. We recast this learning setting as an unsupervised domain adaptation problem. Our algorithm is based on updating the model such that the internal representation of data remains unbiased despite distributional shifts in the input space. We provide extensive empirical validation on three widely employed fairness datasets to demonstrate the effectiveness of our algorithm. △ Less

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2301.12361 [pdf, other]

Graph Harmony: Denoising and Nuclear-Norm Wasserstein Adaptation for Enhanced Domain Transfer in Graph-Structured Data

Authors: Mengxi Wu, Mohammad Rostami

Abstract: Graph-structured data can be found in numerous domains, yet the scarcity of labeled instances hinders its effective utilization of deep learning in many scenarios. Traditional unsupervised domain adaptation (UDA) strategies for graphs primarily hinge on adversarial learning and pseudo-labeling. These approaches fail to effectively leverage graph discriminative features, leading to class mismatchin… ▽ More Graph-structured data can be found in numerous domains, yet the scarcity of labeled instances hinders its effective utilization of deep learning in many scenarios. Traditional unsupervised domain adaptation (UDA) strategies for graphs primarily hinge on adversarial learning and pseudo-labeling. These approaches fail to effectively leverage graph discriminative features, leading to class mismatching and unreliable label quality. To navigate these obstacles, we develop the Denoising and Nuclear-Norm Wasserstein Adaptation Network (DNAN). DNAN employs the Nuclear-norm Wasserstein discrepancy (NWD), which can simultaneously achieve domain alignment and class distinguishment. DANA also integrates a denoising mechanism via a variational graph autoencoder that mitigates data noise. This denoising mechanism helps capture essential features of both source and target domains, improving the robustness of the domain adaptation process. Our comprehensive experiments demonstrate that DNAN outperforms state-of-the-art methods on standard UDA benchmarks for graph classification. △ Less

Submitted 12 December, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

arXiv:2211.00807 [pdf, other]

Unsupervised Model Adaptation for Source-free Segmentation of Medical Images

Authors: Serban Stan, Mohammad Rostami

Abstract: The recent prevalence of deep neural networks has lead semantic segmentation networks to achieve human-level performance in the medical field when sufficient training data is provided. Such networks however fail to generalize when tasked with predicting semantic maps for out-of-distribution images, requiring model re-training on the new distributions. This expensive process necessitates expert kno… ▽ More The recent prevalence of deep neural networks has lead semantic segmentation networks to achieve human-level performance in the medical field when sufficient training data is provided. Such networks however fail to generalize when tasked with predicting semantic maps for out-of-distribution images, requiring model re-training on the new distributions. This expensive process necessitates expert knowledge in order to generate training labels. Distribution shifts can arise naturally in the medical field via the choice of imaging device, i.e. MRI or CT scanners. To combat the need for labeling images in a target domain after a model is successfully trained in a fully annotated \textit{source domain} with a different data distribution, unsupervised domain adaptation (UDA) can be used. Most UDA approaches ensure target generalization by creating a shared source/target latent feature space. This allows a source trained classifier to maintain performance on the target domain. However most UDA approaches require joint source and target data access, which may create privacy leaks with respect to patient information. We propose an UDA algorithm for medical image segmentation that does not require access to source data during adaptation, and is thus capable in maintaining patient data privacy. We rely on an approximation of the source latent features at adaptation time, and create a joint source/target embedding space by minimizing a distributional distance metric based on optimal transport. We demonstrate our approach is competitive to recent UDA medical segmentation works even with the added privacy requisite. △ Less

Submitted 29 July, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.14362 [pdf, other]

Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents

Authors: M. R. Rostami, S. S. Kia

Abstract: This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration. In distributed machine learning, when privacy matters, FL is a functional tool. Placing FL in an environment where it has some i… ▽ More This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration. In distributed machine learning, when privacy matters, FL is a functional tool. Placing FL in an environment where it has some irregular connections of agents (devices), reaching a trained model in both an economical and quick way can be a demanding job. The first layer of our algorithm corresponds to the model parameter propagation across agents done by the server. In the second layer, each agent does its local update with a stochastic and variance-reduced technique called Stochastic Variance Reduced Gradient (SVRG). We leverage the concept of variance reduction from stochastic optimization when the agents want to do their local update step to reduce the variance caused by stochastic gradient descent (SGD). We provide a convergence bound for our algorithm which improves the rate from $O(\frac{1}{\sqrt{K}})$ to $O(\frac{1}{K})$ by using a constant step-size. We demonstrate the performance of our algorithm using numerical examples. △ Less

Submitted 1 April, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2209.14644 [pdf, other]

Increasing Model Generalizability for Unsupervised Domain Adaptation

Authors: Mohammad Rostami

Abstract: A dominant approach for addressing unsupervised domain adaptation is to map data points for the source and the target domains into an embedding space which is modeled as the output-space of a shared deep encoder. The encoder is trained to make the embedding space domain-agnostic to make a source-trained classifier generalizable on the target domain. A secondary mechanism to improve UDA performance… ▽ More A dominant approach for addressing unsupervised domain adaptation is to map data points for the source and the target domains into an embedding space which is modeled as the output-space of a shared deep encoder. The encoder is trained to make the embedding space domain-agnostic to make a source-trained classifier generalizable on the target domain. A secondary mechanism to improve UDA performance further is to make the source domain distribution more compact to improve model generalizability. We demonstrate that increasing the interclass margins in the embedding space can help to develop a UDA algorithm with improved performance. We estimate the internally learned multi-modal distribution for the source domain, learned as a result of pretraining, and use it to increase the interclass class separation in the source domain to reduce the effect of domain shift. We demonstrate that using our approach leads to improved model generalizability on four standard benchmark UDA image classification datasets and compares favorably against exiting methods. △ Less

Submitted 29 September, 2022; originally announced September 2022.

Comments: Presented 2022 Conference on Lifelong Learning Agents

arXiv:2208.07825 [pdf, other]

doi 10.22111/IJFS.2023.7876

An Adaptive Image Encryption Scheme Guided by Fuzzy Models

Authors: Mahdi Shariatzadeh, Mohammad Javad Rostami, Mahdi Eftekhari

Abstract: A new image encryption scheme using the advanced encryption standard (AES), a chaotic map, a genetic operator, and a fuzzy inference system is proposed in this paper. In this work, plain images were used as input, and the required security level was achieved. Security criteria were computed after running a proposed encryption process. Then an adaptive fuzzy system decided whether to repeat the enc… ▽ More A new image encryption scheme using the advanced encryption standard (AES), a chaotic map, a genetic operator, and a fuzzy inference system is proposed in this paper. In this work, plain images were used as input, and the required security level was achieved. Security criteria were computed after running a proposed encryption process. Then an adaptive fuzzy system decided whether to repeat the encryption process, terminate it, or run the next stage based on the achieved results and user demand. The SHA-512 hash function was employed to increase key sensitivity. Security analysis was conducted to evaluate the security of the proposed scheme, which showed it had high security and all the criteria necessary for a good and efficient encryption algorithm were met. Simulation results and the comparison of similar works showed the proposed encryptor had a pseudo-noise output and was strongly dependent upon the changing key and plain image. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: Iranian Journal of Fuzzy Systems (2023)

arXiv:2208.07635 [pdf]

A New Scheme for Image Compression and Encryption Using ECIES, Henon Map, and AEGAN

Authors: Mahdi Shariatzadeh, Mahdi Eftekhari, Mohammad Javad Rostami

Abstract: Providing security in the transmission of images and other multimedia data has become one of the most important scientific and practical issues. In this paper, a method for compressing and encryption images is proposed, which can safely transmit images in low-bandwidth data transmission channels. At first, using the autoencoding generative adversarial network (AEGAN) model, the images are mapped t… ▽ More Providing security in the transmission of images and other multimedia data has become one of the most important scientific and practical issues. In this paper, a method for compressing and encryption images is proposed, which can safely transmit images in low-bandwidth data transmission channels. At first, using the autoencoding generative adversarial network (AEGAN) model, the images are mapped to a vector in the latent space with low dimensions. In the next step, the obtained vector is encrypted using public key encryption methods. In the proposed method, Henon chaotic map is used for permutation, which makes information transfer more secure. To evaluate the results of the proposed scheme, three criteria SSIM, PSNR, and execution time have been used. △ Less

Submitted 24 August, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

arXiv:2207.04565 [pdf, other]

Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning

Authors: Kleanthis Avramidis, Mohammad Rostami, Melinda Chang, Shrikanth Narayanan

Abstract: Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especi… ▽ More Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especially in the presence of challenges posed by pseudopapilledema that has similar fundus appearance but distinct clinical implications. We present a deep learning-based algorithm for the automatic detection of pediatric papilledema. Our approach is based on optic disc localization and detection of explainable papilledema indicators through data augmentation. Experiments on real-world clinical data demonstrate that our proposed method is effective with a diagnostic accuracy comparable to expert ophthalmologists. △ Less

Submitted 10 July, 2022; originally announced July 2022.

Comments: 5 pages, 4 figures, 2 tables, 2022 IEEE International Conference on Image Processing (ICIP)

arXiv:2206.09059 [pdf, other]

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Authors: Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason

Abstract: Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to… ▽ More Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to study the challenge of learning multimodal tasks in a CL setting, and to systematically evaluate how upstream continual learning can rapidly generalize to new multimodal and unimodal tasks. CLiMB includes implementations of several CL algorithms and a modified Vision-Language Transformer (ViLT) model that can be deployed on both multimodal and unimodal tasks. We find that common CL methods can help mitigate forgetting during multimodal task learning, but do not enable cross-task knowledge transfer. We envision that CLiMB will facilitate research on a new class of CL algorithms for this challenging multimodal setting. △ Less

Submitted 24 November, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted to NeurIPS 2022 Datasets and Benchmarks track

arXiv:2110.04662 [pdf, other]

Cognitively Inspired Learning of Incremental Drifting Concepts

Authors: Mohammad Rostami, Aram Galstyan

Abstract: Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to… ▽ More Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference. △ Less

Submitted 21 April, 2023; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: 2023 International Joint Conference on Artificial Intelligence

arXiv:2109.02051 [pdf, other]

Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Authors: Amir Mohammad Rostami, Mohammad Mehdi Homayounpour, Ahmad Nickabadi

Abstract: Many endeavors have sought to develop countermeasure techniques as enhancements on Automatic Speaker Verification (ASV) systems, in order to make them more robust against spoof attacks. As evidenced by the latest ASVspoof 2019 countermeasure challenge, models currently deployed for the task of ASV are, at their best, devoid of suitable degrees of generalization to unseen attacks. Upon further inve… ▽ More Many endeavors have sought to develop countermeasure techniques as enhancements on Automatic Speaker Verification (ASV) systems, in order to make them more robust against spoof attacks. As evidenced by the latest ASVspoof 2019 countermeasure challenge, models currently deployed for the task of ASV are, at their best, devoid of suitable degrees of generalization to unseen attacks. Upon further investigation of the proposed methods, it appears that a broader three-tiered view of the proposed systems. comprised of the classifier, feature extraction phase, and model loss function, may to some extent lessen the problem. Accordingly, the present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem... △ Less

Submitted 19 September, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2108.12081 [pdf, other]

Detection and Continual Learning of Novel Face Presentation Attacks

Authors: Mohammad Rostami, Leonidas Spinoulas, Mohamed Hussein, Joe Mathai, Wael Abd-Almageed

Abstract: Advances in deep learning, combined with availability of large datasets, have led to impressive improvements in face presentation attack detection research. However, state-of-the-art face antispoofing systems are still vulnerable to novel types of attacks that are never seen during training. Moreover, even if such attacks are correctly detected, these systems lack the ability to adapt to newly enc… ▽ More Advances in deep learning, combined with availability of large datasets, have led to impressive improvements in face presentation attack detection research. However, state-of-the-art face antispoofing systems are still vulnerable to novel types of attacks that are never seen during training. Moreover, even if such attacks are correctly detected, these systems lack the ability to adapt to newly encountered attacks. The post-training ability of continually detecting new types of attacks and self-adaptation to identify these attack types, after the initial detection phase, is highly appealing. In this paper, we enable a deep neural network to detect anomalies in the observed input data points as potential new types of attacks by suppressing the confidence-level of the network outside the training samples' distribution. We then use experience replay to update the model to incorporate knowledge about new types of attacks without forgetting the past learned attack types. Experimental results are provided to demonstrate the effectiveness of the proposed method on two benchmark datasets as well as a newly introduced dataset which exhibits a large variety of attack types. △ Less

Submitted 26 August, 2021; originally announced August 2021.

Journal ref: 2021 International Conference on Computer Vision

arXiv:2107.01598 [pdf, other]

Domain Adaptation for Sentiment Analysis Using Increased Intraclass Separation

Authors: Mohammad Rostami, Aram Galstyan

Abstract: Sentiment analysis is a costly yet necessary task for enterprises to study the opinions of their customers to improve their products and to determine optimal marketing strategies. Due to the existence of a wide range of domains across different products and services, cross-domain sentiment analysis methods have received significant attention. These methods mitigate the domain gap between different… ▽ More Sentiment analysis is a costly yet necessary task for enterprises to study the opinions of their customers to improve their products and to determine optimal marketing strategies. Due to the existence of a wide range of domains across different products and services, cross-domain sentiment analysis methods have received significant attention. These methods mitigate the domain gap between different applications by training cross-domain generalizable classifiers which help to relax the need for data annotation for each domain. Most existing methods focus on learning domain-agnostic representations that are invariant with respect to both the source and the target domains. As a result, a classifier that is trained using the source domain annotated data would generalize well in a related target domain. We introduce a new domain adaptation method which induces large margins between different classes in an embedding space. This embedding space is trained to be domain-agnostic by matching the data distributions across the domains. Large intraclass margins in the source domain help to reduce the effect of "domain shift" on the classifier performance in the target domain. Theoretical and empirical analysis are provided to demonstrate that the proposed method is effective. △ Less

Submitted 4 July, 2021; originally announced July 2021.

arXiv:2106.12124 [pdf, other]

Secure Domain Adaptation with Multiple Sources

Authors: Serban Stan, Mohammad Rostami

Abstract: Multi-source unsupervised domain adaptation (MUDA) is a framework to address the challenge of annotated data scarcity in a target domain via transferring knowledge from multiple annotated source domains. When the source domains are distributed, data privacy and security can become significant concerns and protocols may limit data sharing, yet existing MUDA methods overlook these constraints. We de… ▽ More Multi-source unsupervised domain adaptation (MUDA) is a framework to address the challenge of annotated data scarcity in a target domain via transferring knowledge from multiple annotated source domains. When the source domains are distributed, data privacy and security can become significant concerns and protocols may limit data sharing, yet existing MUDA methods overlook these constraints. We develop an algorithm to address MUDA when source domain data cannot be shared with the target or across the source domains. Our method is based on aligning the distributions of source and target domains indirectly via estimating the source feature embeddings and predicting over a confidence based combination of domain specific model predictions. We provide theoretical analysis to support our approach and conduct empirical experiments to demonstrate that our algorithm is effective. △ Less

Submitted 14 November, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

arXiv:2104.08808 [pdf, other]

Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning

Authors: Xisen **, Bill Yuchen Lin, Mohammad Rostami, Xiang Ren

Abstract: The ability to continuously expand knowledge over time and utilize it to rapidly generalize to new tasks is a key feature of human linguistic intelligence. Existing models that pursue rapid generalization to new tasks (e.g., few-shot learning methods), however, are mostly trained in a single shot on fixed datasets, unable to dynamically expand their knowledge; while continual learning algorithms a… ▽ More The ability to continuously expand knowledge over time and utilize it to rapidly generalize to new tasks is a key feature of human linguistic intelligence. Existing models that pursue rapid generalization to new tasks (e.g., few-shot learning methods), however, are mostly trained in a single shot on fixed datasets, unable to dynamically expand their knowledge; while continual learning algorithms are not specifically designed for rapid generalization. We present a new learning setup, Continual Learning of Few-Shot Learners (CLIF), to address the challenges of both learning settings in a unified setup. CLIF assumes a model learns from a sequence of diverse NLP tasks arriving sequentially, accumulating knowledge for improved generalization to new tasks, while also retaining performance on the tasks learned earlier. We examine how the generalization ability is affected in the continual learning setup, evaluate a number of continual learning algorithms, and propose a novel regularized adapter generation approach. We find that catastrophic forgetting affects generalization ability to a less degree than performance on seen tasks; while continual learning algorithms can still bring considerable benefit to the generalization ability. △ Less

Submitted 20 August, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted at Findings of EMNLP 2021; Fixed an error in Table 3 (see footnote 4); Updated Q3 in Sec. 4.2

arXiv:2101.00522 [pdf, other]

Domain Adaptation for the Segmentation of Confidential Medical Images

Authors: Serban Stan, Mohammad Rostami

Abstract: Convolutional neural networks (CNNs) have led to significant improvements in the semantic segmentation of images. When source and target datasets come from different modalities, CNN performance suffers due to domain shift. In such cases data annotation in the target domain becomes necessary to maintain model performance. To circumvent the re-annotation process, unsupervised domain adaptation (UDA)… ▽ More Convolutional neural networks (CNNs) have led to significant improvements in the semantic segmentation of images. When source and target datasets come from different modalities, CNN performance suffers due to domain shift. In such cases data annotation in the target domain becomes necessary to maintain model performance. To circumvent the re-annotation process, unsupervised domain adaptation (UDA) is proposed to adapt a model to new modalities using solely unlabeled target data. Common UDA algorithms require access to source domain data during adaptation, which may not be feasible in medical imaging due to data sharing restrictions. In this work, we develop an algorithm for UDA where the source domain data is inaccessible during target adaptation. Our approach is based on encoding the source domain information into an internal distribution that is used to guide adaptation in the absence of source samples. We demonstrate the effectiveness of our algorithm by comparing it to state-of-the-art medical image semantic segmentation approaches on two medical image semantic segmentation datasets. △ Less

Submitted 7 October, 2022; v1 submitted 2 January, 2021; originally announced January 2021.

arXiv:2012.15695 [pdf, other]

EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting

Authors: Amir Mohammad Rostami, Ali Karimi, Mohammad Ali Akhaee

Abstract: Keyword spotting is a process of finding some specific words or phrases in recorded speeches by computers. Deep neural network algorithms, as a powerful engine, can handle this problem if they are trained over an appropriate dataset. To this end, the football keyword dataset (FKD), as a new keyword spotting dataset in Persian, is collected with crowdsourcing. This dataset contains nearly 31000 sam… ▽ More Keyword spotting is a process of finding some specific words or phrases in recorded speeches by computers. Deep neural network algorithms, as a powerful engine, can handle this problem if they are trained over an appropriate dataset. To this end, the football keyword dataset (FKD), as a new keyword spotting dataset in Persian, is collected with crowdsourcing. This dataset contains nearly 31000 samples in 18 classes. The continuous speech synthesis method proposed to made FKD usable in the practical application which works with continuous speeches. Besides, we proposed a lightweight architecture called EfficientNet-A0 (absolute zero) by applying the compound scaling method on EfficientNet-B0 for keyword spotting task. Finally, the proposed architecture is evaluated with various models. It is realized that EfficientNet-A0 and Resnet models outperform other models on this dataset. △ Less

Submitted 31 December, 2020; originally announced December 2020.

Comments: 9 pages, 2 figures

arXiv:2010.12144 [pdf, other]

One-shot Learning for Temporal Knowledge Graphs

Authors: Mehrnoosh Mirtaheri, Mohammad Rostami, Xiang Ren, Fred Morstatter, Aram Galstyan

Abstract: Most real-world knowledge graphs are characterized by a long-tail relation frequency distribution where a significant fraction of relations occurs only a handful of times. This observation has given rise to recent interest in low-shot learning methods that are able to generalize from only a few examples. The existing approaches, however, are tailored to static knowledge graphs and not easily gener… ▽ More Most real-world knowledge graphs are characterized by a long-tail relation frequency distribution where a significant fraction of relations occurs only a handful of times. This observation has given rise to recent interest in low-shot learning methods that are able to generalize from only a few examples. The existing approaches, however, are tailored to static knowledge graphs and not easily generalized to temporal settings, where data scarcity poses even bigger problems, e.g., due to occurrence of new, previously unseen relations. We address this shortcoming by proposing a one-shot learning framework for link prediction in temporal knowledge graphs. Our proposed method employs a self-attention mechanism to effectively encode temporal interactions between entities, and a network to compute a similarity score between a given query and a (one-shot) example. Our experiments show that the proposed algorithm outperforms the state of the art baselines for two well-studied benchmarks while achieving significantly better performance for sparse relations. △ Less

Submitted 22 October, 2020; originally announced October 2020.

arXiv:2009.12518 [pdf, other]

Unsupervised Model Adaptation for Continual Semantic Segmentation

Authors: Serban Stan, Mohammad Rostami

Abstract: We develop an algorithm for adapting a semantic segmentation model that is trained using a labeled source domain to generalize well in an unlabeled target domain. A similar problem has been studied extensively in the unsupervised domain adaptation (UDA) literature, but existing UDA algorithms require access to both the source domain labeled data and the target domain unlabeled data for training a… ▽ More We develop an algorithm for adapting a semantic segmentation model that is trained using a labeled source domain to generalize well in an unlabeled target domain. A similar problem has been studied extensively in the unsupervised domain adaptation (UDA) literature, but existing UDA algorithms require access to both the source domain labeled data and the target domain unlabeled data for training a domain agnostic semantic segmentation model. Relaxing this constraint enables a user to adapt pretrained models to generalize in a target domain, without requiring access to source data. To this end, we learn a prototypical distribution for the source domain in an intermediate embedding space. This distribution encodes the abstract knowledge that is learned from the source domain. We then use this distribution for aligning the target domain distribution with the source domain distribution in the embedding space. We provide theoretical analysis and explain conditions under which our algorithm is effective. Experiments on benchmark adaptation task demonstrate our method achieves competitive performance even compared with joint UDA approaches. △ Less

Submitted 9 January, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: 12 pages, 5 figures

arXiv:2009.04825 [pdf]

Presentation a Trust Walker for rating prediction in Recommender System with Biased Random Walk: Effects of H-index Centrality, Similarity in Items and Friends

Authors: Saman Forouzandeh, Mehrdad Rostami, Kamal Berahmand

Abstract: The use of recommender systems has increased dramatically to assist online social network users in the decision-making process and selecting appropriate items. On the other hand, due to many different items, users cannot score a wide range of them, and usually, there is a scattering problem for the matrix created for users. To solve the problem, the trust-based recommender systems are applied to p… ▽ More The use of recommender systems has increased dramatically to assist online social network users in the decision-making process and selecting appropriate items. On the other hand, due to many different items, users cannot score a wide range of them, and usually, there is a scattering problem for the matrix created for users. To solve the problem, the trust-based recommender systems are applied to predict the score of the desired item for the user. Various criteria have been considered to define trust, and the degree of trust between users is usually calculated based on these criteria. In this regard, it is impossible to obtain the degree of trust for all users because of the large number of them in social networks. Also, for this problem, researchers use different modes of the Random Walk algorithm to randomly visit some users, study their behavior, and gain the degree of trust between them. In the present study, a trust-based recommender system is presented that predicts the score of items that the target user has not rated, and if the item is not found, it offers the user the items dependent on that item that are also part of the user's interests. In a trusted network, by weighting the edges between the nodes, the degree of trust is determined, and a TrustWalker is developed, which uses the Biased Random Walk (BRW) algorithm to move between the nodes. The weight of the edges is effective in the selection of random steps. The implementation and evaluation of the present research method have been carried out on three datasets named Epinions, Flixster, and FilmTrust; the results reveal the high efficiency of the proposed method. △ Less

Submitted 10 September, 2020; originally announced September 2020.

arXiv:2008.04103 [pdf]

Review of Swarm Intelligence-based Feature Selection Methods

Authors: Mehrdad Rostami, Kamal Berahmand, Saman Forouzandeh

Abstract: In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale datasets. On the other hand, data mining applications with high dimensional datasets that require high speed and accuracy are rapidly increasing. An important issue with these applications is the curse of dimensionality, where the number of features is much higher than the number… ▽ More In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale datasets. On the other hand, data mining applications with high dimensional datasets that require high speed and accuracy are rapidly increasing. An important issue with these applications is the curse of dimensionality, where the number of features is much higher than the number of patterns. One of the dimensionality reduction approaches is feature selection that can increase the accuracy of the data mining task and reduce its computational complexity. The feature selection method aims at selecting a subset of features with the lowest inner similarity and highest relevancy to the target class. It reduces the dimensionality of the data by eliminating irrelevant, redundant, or noisy data. In this paper, a comparative analysis of different feature selection methods is presented, and a general categorization of these methods is performed. Moreover, in this paper, state-of-the-art swarm intelligence are studied, and the recent feature selection methods based on these algorithms are reviewed. Furthermore, the strengths and weaknesses of the different studied swarm intelligence-based feature selection methods are evaluated. △ Less

Submitted 7 August, 2020; originally announced August 2020.

Showing 1–50 of 64 results for author: Rostami, M