Search | arXiv e-print repository

arXiv:2407.00116

Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges

Authors: Mahmoud Ibrahim, Yasmina Al Khalil, Sina Amirrajab, Chang Sun, Marcel Breeuwer, Josien Pluim, Bart Elen, Gokhan Ertaylan, Michel Dumontier

Abstract: This paper presents a comprehensive systematic review of generative models (GANs, VAEs, DMs, and LLMs) used to synthesize various medical data types, including imaging (dermoscopic, mammographic, ultrasound, CT, MRI, and X-ray), text, time-series, and tabular data (EHR). Unlike previous narrowly focused reviews, our study encompasses a broad array of medical data modalities and explores various generative models. Our search strategy queries databases such as Scopus, PubMed, and arXiv, focusing on recent works from January 2021 to November 2023, excluding reviews and perspectives. This period emphasizes recent advancements beyond GANs, which have been extensively covered previously. The survey reveals insights from three key aspects: (1) synthesis applications and purpose of synthesis, (2) generation techniques, and (3) evaluation methods. It highlights clinically valid synthesis applications, demonstrating the potential of synthetic data to tackle diverse clinical requirements. While conditional models incorporating class labels, segmentation masks and image translations are prevalent, there is a gap in utilizing prior clinical knowledge and patient-specific context, suggesting a need for more personalized synthesis approaches and emphasizing the importance of tailoring generative approaches to the unique characteristics of medical data. Additionally, there is a significant gap in using synthetic data beyond augmentation, such as for validation and evaluation of downstream medical AI models.
The survey uncovers that the lack of standardized evaluation methodologies tailored to medical images is a barrier to clinical application, underscoring the need for in-depth evaluation approaches, benchmarking, and comparative studies to promote openness and collaboration.

Submitted 2 July, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

arXiv:2406.10743

Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

Authors: Mark Ibrahim, David Klindt, Randall Balestriero

Abstract: Deep Learning is often depicted as a trio of data-architecture-loss. Yet, recent Self Supervised Learning (SSL) solutions have introduced numerous additional design choices, e.g., a projector network, positive views, or teacher-student networks. These additions pose two challenges. First, they limit the impact of theoretical studies that often fail to incorporate all those intertwined designs. Second, they slow down the deployment of SSL methods to new domains as numerous hyper-parameters need to be carefully tuned. In this study, we bring forward the surprising observation that, at least for pretraining datasets of up to a few hundred thousand samples, the additional designs introduced by SSL do not contribute to the quality of the learned representations. That finding not only provides legitimacy to existing theoretical studies, but also simplifies the practitioner's path to SSL deployment in numerous small and medium scale settings. Our finding answers a long-lasting question: the often-experienced sensitivity to training settings and hyper-parameters encountered in SSL comes from their design, rather than the absence of supervised guidance.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.05183

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Authors: Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

Abstract: Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2406.00360

L2R-CIPU: Efficient CNN Computation with Left-to-Right Composite Inner Product Units

Authors: Malik Zohaib Nisar, Mohammad Sohail Ibrahim, Muhammad Usman, Jeong-A Lee

Abstract: This paper proposes a composite inner-product computation unit based on left-to-right (LR) arithmetic for the acceleration of convolutional neural networks (CNNs) on hardware. The efficacy of the proposed L2R-CIPU method has been shown on the VGG-16 network, and assessment is done on various performance metrics. The L2R-CIPU design achieves 1.06x to 6.22x greater performance, 4.8x to 15x more TOPS/W, and 4.51x to 53.45x higher TOPS/mm² than prior architectures.

Submitted 10 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.17247

An Introduction to Vision-Language Modeling

Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.13356

Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings

Authors: Nurullah Sevim, Mostafa Ibrahim, Sabit Ekin

Abstract: The advent of Large Language Models (LLMs) has revolutionized language understanding and human-like text generation, drawing interest from many other fields with this question in mind: What else are the LLMs capable of? Despite their widespread adoption, ongoing research continues to explore new ways to integrate LLMs into diverse systems. This paper explores new techniques to harness the power of LLMs for 6G (6th Generation) wireless communication technologies, a domain where automation and intelligent systems are pivotal. The inherent adaptability of LLMs to domain-specific tasks positions them as prime candidates for enhancing wireless systems in the 6G landscape. We introduce a novel Reinforcement Learning (RL) based framework that leverages LLMs for network deployment in wireless communications. Our approach involves training an RL agent, utilizing LLMs as its core, in an urban setting to maximize coverage. The agent's objective is to navigate the complexities of urban environments and identify the network parameters for optimal area coverage. Additionally, we integrate LLMs with Convolutional Neural Networks (CNNs) to capitalize on their strengths while mitigating their limitations. The Deep Deterministic Policy Gradient (DDPG) algorithm is employed for training purposes. The results suggest that LLM-assisted models can outperform CNN-based models in some cases while performing at least as well in others.

Submitted 22 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.00740

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

Abstract: There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector, limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% across zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

Submitted 14 May, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

Comments: 14 pages, 8 figures, 7 tables, to be published at ICML 2024
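The sketch below illustrates how a text-conditioned image representation differs from a single fixed image vector. It mixes K visual feature vectors into one, using softmax weights computed from a text query. The function name `text_conditioned_embedding`, the dot-product scoring and the `temperature` parameter are assumptions made for illustration. This is not Llip's actual architecture, which the abstract does not describe at this level of detail.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def text_conditioned_embedding(visual_tokens, text_query, temperature=1.0):
    """Mix K visual feature vectors into one vector, weighted by relevance to a text query.

    Hypothetical sketch: visual_tokens has shape (K, d), text_query has shape (d,).
    Returns the mixed (d,) embedding and the (K,) mixing weights.
    """
    V = np.asarray(visual_tokens, dtype=float)
    q = np.asarray(text_query, dtype=float)
    weights = softmax(V @ q / temperature)  # one weight per visual token, summing to 1
    return weights @ V, weights


# Two toy visual tokens, e.g. one capturing "dog" content and one capturing "grass" content.
tokens = [[1.0, 0.0], [0.0, 1.0]]
dog_view, _ = text_conditioned_embedding(tokens, [10.0, 0.0])
grass_view, _ = text_conditioned_embedding(tokens, [0.0, 10.0])
print(dog_view.round(3), grass_view.round(3))
```

Here the same image yields a different embedding for each caption. A CLIP-style encoder, by contrast, returns one image vector whatever the caption.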

arXiv:2404.16717

doi 10.1145/3630106.3659039

Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

Authors: Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

Abstract: Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real world objects such as pears appear in a variety of forms (from diced to whole, on a table or in a bowl), yet standard VLM classifiers map all instances of a class to a single vector based on the class label. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing hierarchies, diverse object states, and real-world geographic diversity, as well as finer-grained datasets where intra-class diversity may be less prevalent. Importantly, our method is inherently interpretable, offering faithful explanations for each inference to facilitate model debugging and enhance transparency. We also find our method scales efficiently to a large number of attributes to account for diversity, leading to more accurate predictions for atypical instances. Finally, we characterize a principled trade-off between overall and worst class accuracy, which can be tuned via a hyperparameter of our method.
We hope this work spurs further research into the promise of zero-shot classification beyond a single class vector for capturing diversity in the world, and building transparent AI systems without compromising performance.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted to FAccT 2024
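The following minimal sketch contrasts one vector per class with several attribute vectors per class. The toy 2-D embeddings stand in for text-encoder outputs of attribute descriptions such as "whole pear" or "diced pear". Scoring each class by the mean of its top-`k` attribute similarities is an assumption for illustration and is not necessarily the paper's aggregation rule. The function names are hypothetical.

```python
import numpy as np


def normalize(x):
    """L2-normalize along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def classify(image_emb, class_attr_embs, k=1):
    """Return (predicted class index, per-class scores).

    class_attr_embs[c] is a list of embedding vectors for class c (one per attribute).
    Each class is scored by the mean of its top-k cosine similarities to the image.
    """
    img = normalize(image_emb)
    scores = []
    for attr_embs in class_attr_embs:
        sims = normalize(attr_embs) @ img
        scores.append(np.sort(sims)[-k:].mean())
    scores = np.array(scores)
    return int(np.argmax(scores)), scores


image = [0.0, 1.0]                     # an atypical instance, e.g. a diced pear
pear_attrs = [[1.0, 0.0], [0.0, 1.0]]  # "whole pear", "diced pear"
other_class = [[0.6, 0.8]]             # a competing class described by one vector

# One vector per class: collapse the pear attributes into their mean.
single, _ = classify(image, [[np.mean(pear_attrs, axis=0)], other_class])
# Several vectors per class: keep the attributes separate.
multi, _ = classify(image, [pear_attrs, other_class])
print(single, multi)
```

In this toy case, the averaged pear vector loses to the competing class. Keeping the attributes separate lets the "diced pear" attribute match the atypical image.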

arXiv:2404.10960

Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations

Authors: Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim

Abstract: A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability. Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety. In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering questions we don't know. Inspired by analogous approaches in classification, this study explores the feasibility and efficacy of abstaining while uncertain in the context of LLMs within the domain of question-answering. We investigate two kinds of uncertainties: statistical uncertainty metrics and a distinct verbalized measure, termed In-Dialogue Uncertainty (InDU). Using these uncertainty measures combined with models with and without Reinforcement Learning with Human Feedback (RLHF), we show that in all three situations, abstention based on the right kind of uncertainty measure can boost the reliability of LLMs. By sacrificing only a few highly uncertain samples, we can improve correctness by 2% to 8%, avoid 50% of hallucinations by correctly identifying unanswerable questions, and increase safety by 70% to 99% with almost no additional computational overhead.

Submitted 16 April, 2024; originally announced April 2024.
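The abstention rule reduces to "answer only when uncertainty is below a threshold". The sketch below uses the entropy of answers sampled from a model as the uncertainty measure. That choice, the helper names `answer_entropy` and `answer_or_abstain`, and the threshold value are all assumptions. The paper's own statistical and InDU measures are not reproduced here.

```python
import math
from collections import Counter


def answer_entropy(samples):
    """Entropy (in nats) of the empirical distribution over sampled answers."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def answer_or_abstain(samples, threshold):
    """Return the most common sampled answer, or None to abstain when too uncertain."""
    if not samples or answer_entropy(samples) > threshold:
        return None
    return Counter(samples).most_common(1)[0][0]


consistent = ["Paris"] * 5
scattered = ["1912", "1915", "1908", "1921", "1912"]
print(answer_or_abstain(consistent, threshold=0.5))  # answers
print(answer_or_abstain(scattered, threshold=0.5))   # abstains
```

Raising `threshold` answers more questions at the cost of more errors. Lowering it gives up coverage in exchange for reliability, in the spirit of sacrificing a few highly uncertain samples.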

arXiv:2404.04173

H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations

Authors: Zishen Wan, Che-Kai Liu, Mohamed Ibrahim, Hanchen Yang, Samuel Spetalnick, Tushar Krishna, Arijit Raychowdhury

Abstract: Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative computation with high-dimensional matrix-vector multiplications and suffers from non-convergence problems. In this paper, we present H3DFact, a heterogeneous 3D integrated in-memory compute engine capable of efficiently factorizing high-dimensional holographic representations. H3DFact exploits the computation-in-superposition capability of holographic vectors and the intrinsic stochasticity associated with memristive-based 3D compute-in-memory. Evaluated on large-scale factorization and perceptual problems, H3DFact demonstrates superior capability in factorization accuracy and operational capacity by up to five orders of magnitude, with 5.5x compute density, 1.2x energy efficiency improvements, and 5.9x less silicon footprint compared to iso-capacity 2D designs.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 2024 Design Automation and Test in Europe (DATE); The first two authors have equal contributions
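The abstract describes holographic factorization as iterative computation with high-dimensional matrix-vector multiplications. The sketch below makes that concrete in software, factorizing a bound bipolar vector with a resonator-network-style iteration. Three things are assumptions here: resonator-style iteration as the representative algorithm, binding as elementwise multiplication, and the function name and sizes. The sketch says nothing about H3DFact's 3D in-memory hardware.

```python
import numpy as np


def resonator_factorize(s, codebooks, iters=20):
    """Recover one codevector index per codebook from their elementwise product s.

    s: bipolar (+1/-1) vector of length D; codebooks: list of (M_i, D) bipolar arrays.
    Each step unbinds the other factors' current estimates, then cleans up the result
    by projecting onto the codebook (two matrix-vector products).
    """
    def bipolar(x):
        return np.where(x >= 0, 1, -1)

    # Start every factor estimate as the superposition of its whole codebook.
    estimates = [bipolar(cb.sum(axis=0)) for cb in codebooks]
    for _ in range(iters):
        for i, cb in enumerate(codebooks):
            others = np.ones_like(s)
            for j, est in enumerate(estimates):
                if j != i:
                    others = others * est
            unbound = s * others          # bipolar vectors are self-inverse under binding
            similarities = cb @ unbound   # matrix-vector product, shape (M_i,)
            estimates[i] = bipolar(cb.T @ similarities)
    return [int(np.argmax(cb @ est)) for cb, est in zip(codebooks, estimates)]


rng = np.random.default_rng(0)
D = 2000
colors = rng.choice([-1, 1], size=(5, D))
shapes = rng.choice([-1, 1], size=(5, D))
scene = colors[2] * shapes[4]  # bind "color 2" with "shape 4"
print(resonator_factorize(scene, [colors, shapes]))
```

With only two small codebooks this converges almost immediately. The number of possible factor combinations grows multiplicatively with codebook size and factor count, which is where such iterations become expensive and can fail to converge.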

arXiv:2403.20297

Balanced Data Placement for GEMV Acceleration with Processing-In-Memory

Authors: Mohamed Assem Ibrahim, Mahzabeen Islam, Shaizeen Aga

Abstract: With unprecedented demand for generative AI (GenAI) inference, acceleration of primitives that dominate GenAI, such as general matrix-vector multiplication (GEMV), is receiving considerable attention. A challenge with GEMVs is the high memory bandwidth this primitive demands. Multiple memory vendors have proposed commercially viable processing-in-memory (PIM) prototypes that attain a bandwidth boost over the processor by augmenting memory banks with compute capabilities and broadcasting the same command to all banks. While proposed PIM designs stand to accelerate GEMV, we observe in this work that a key impediment to truly harnessing PIM acceleration is deducing the optimal placement of the matrix in memory banks. To this end, we tease out several factors that impact data placement and propose the PIMnast methodology, which, like a gymnast, balances these factors to identify data placements that deliver GEMV acceleration. Across a spectrum of GenAI models, our proposed PIMnast methodology, along with additional orchestration knobs we identify, delivers up to 6.86× speedup for GEMVs (of the available 7× roofline speedup), leading to up to 5× speedup for per-token latencies.

Submitted 1 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.14124

Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling

Authors: Yong He, Hongshan Yu, Muhammad Ibrahim, Xiaoyan Liu, Tongjia Chen, Anwaar Ulhaq, Ajmal Mian

Abstract: Point cloud processing methods leverage local and global point features to cater to downstream tasks, yet they often overlook the task-level context inherent in point clouds during the encoding stage. We argue that integrating task-level information into the encoding stage significantly enhances performance. To that end, we propose SMTransformer, which incorporates task-level information into a vector-based transformer by utilizing a soft mask generated from task-level queries and keys to learn the attention weights. Additionally, to facilitate effective communication between features from the encoding and decoding layers in high-level tasks such as segmentation, we introduce a skip-attention-based up-sampling block. This block dynamically fuses features from various resolution points across the encoding and decoding layers. To mitigate the increase in network parameters and training time resulting from the complexity of the aforementioned blocks, we propose a novel shared position encoding strategy. This strategy allows various transformer blocks to share the same position information over the same resolution points, thereby reducing network parameters and training time without compromising accuracy. Experimental comparisons with existing methods on multiple datasets demonstrate the efficacy of SMTransformer and skip-attention-based up-sampling for point cloud processing tasks, including semantic segmentation and classification.
In particular, we achieve state-of-the-art semantic segmentation results of 73.4% mIoU on S3DIS Area 5 and 62.4% mIoU on the SWAN dataset.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures

arXiv:2402.07329

The Bias of Harmful Label Associations in Vision-Language Models

Authors: Caner Hazirbas, Alicia Sun, Yonathan Efroni, Mark Ibrahim

Abstract: Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness. While prior work has uncovered bias in vision-language models' (VLMs) classification performance across geography, work has been limited along the important axis of harmful label associations due to a lack of rich, labeled data. In this work, we investigate harmful label associations in the recently released Casual Conversations datasets containing more than 70,000 videos. We study bias in the frequency of harmful label associations across self-provided labels for age, gender, apparent skin tone, and physical adornments across several leading VLMs. We find that VLMs are 4-7x more likely to harmfully classify individuals with darker skin tones. We also find that scaling transformer encoder model size leads to higher confidence in harmful predictions. Finally, we find that improvements on standard vision tasks across VLMs do not address disparities in harmful label associations.

Submitted 15 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.01933

ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

Authors: Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the most accurate gold standard for dental disease detection. Unfortunately, regular X-ray exams are still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?" This paper presents ToMoBrush, a dental health sensing system that explores using off-the-shelf sonic toothbrushes for dental condition detection. Our solution leverages the fact that a sonic toothbrush produces rich acoustic signals when in contact with teeth, which contain important information about each tooth's status. ToMoBrush extracts tooth resonance signatures from the acoustic signals to characterize varied dental health conditions of the teeth. We evaluate ToMoBrush on 19 participants and dental-standard models for detecting common dental problems including caries, calculus, and food impaction, achieving a detection ROC-AUC of 0.90, 0.83, and 0.88 respectively. Interviews with dental experts validate ToMoBrush's potential in enhancing at-home dental healthcare.

Submitted 2 February, 2024; originally announced February 2024.

ACM Class: J.3; C.3; H.5.2
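The abstract does not detail ToMoBrush's signal processing. As a hedged illustration of what extracting a "resonance signature" from an acoustic recording can involve, the sketch below picks the strongest spectral peaks of a windowed signal using NumPy's FFT. The function name `dominant_frequencies`, the Hann window and the local-maximum peak rule are assumptions, not the paper's pipeline. The input is a synthetic two-tone signal rather than real toothbrush audio.

```python
import numpy as np


def dominant_frequencies(signal, sample_rate, n_peaks=2):
    """Return the frequencies (Hz) of the n_peaks strongest local maxima in the spectrum."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))  # window to reduce leakage
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    # Interior bins that are strictly larger than both neighbours are local maxima.
    inner = spectrum[1:-1]
    peaks = np.nonzero((inner > spectrum[:-2]) & (inner > spectrum[2:]))[0] + 1
    strongest = peaks[np.argsort(spectrum[peaks])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])


fs = 8000                   # samples per second
t = np.arange(fs) / fs      # one second of audio
# Synthetic stand-in for a tooth response: two pure tones at 440 Hz and 1200 Hz.
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(dominant_frequencies(tone, fs))
```

A real signature would likely also track peak shifts, damping or amplitude ratios over time. The sketch only shows the basic spectral step.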

arXiv:2401.14109

CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks

Authors: Andrei Tomut, Saeed S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrazin Alizadeh, David Montero, Pablo Martin-Ramiro, Muhammad Ibrahim, Oussama Tahiri Alaoui, John Malcolm, Samuel Mugel, Roman Orus

Abstract: Large Language Models (LLMs) such as ChatGPT and LlaMA are advancing rapidly in generative Artificial Intelligence (AI), but their immense size poses significant challenges, such as huge training and inference costs, substantial energy demands, and limitations for on-site deployment. Traditional compression methods such as pruning, distillation, and low-rank approximation focus on reducing the effective number of neurons in the network, while quantization focuses on reducing the numerical precision of individual weights to reduce the model size while keeping the number of neurons fixed. While these compression methods have been relatively successful in practice, there is no compelling reason to believe that truncating the number of neurons is an optimal strategy. In this context, this paper introduces CompactifAI, an innovative LLM compression approach using quantum-inspired Tensor Networks that focuses on the model's correlation space instead, allowing for a more controlled, refined and interpretable model compression. Our method is versatile and can be implemented with, or on top of, other compression techniques. As a benchmark, we demonstrate that a combination of CompactifAI with quantization reduces the memory size of LlaMA 7B by 93% and the number of parameters by 70%, speeds up training by 50% and inference by 25%, and incurs only a small accuracy drop of 2% to 3%, going well beyond what is achievable today with other compression techniques.
Our methods also allow refined layer-sensitivity profiling, showing that deeper layers tend to be more suitable for tensor network compression, which is compatible with recent observations on the ineffectiveness of those layers for LLM performance. Our results imply that standard LLMs are, in fact, heavily overparametrized, and do not need to be large at all.

Submitted 13 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 5 pages, 4 figures, 2 tables, and supplementary information of 2 pages and 1 figure. Revised version with new benchmarks for LlaMA2-7B
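The abstract does not give CompactifAI's decomposition details. As a generic illustration of tensor-network compression of a weight matrix, the sketch below splits a matrix into a two-core matrix product operator (a tensor-train matrix) using one truncated SVD with bond dimension `chi`. The function names, the shapes and the two-core layout are assumptions for illustration, not the paper's method.

```python
import numpy as np


def mpo_two_core(W, m1, m2, n1, n2, chi):
    """Factor W of shape (m1*m2, n1*n2) into cores of shapes (m1, n1, chi) and (chi, m2, n2)."""
    # Regroup indices so that (row1, col1) and (row2, col2) become the two matrix axes.
    T = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    chi = min(chi, S.size)  # bond dimension: how many singular values are kept
    core1 = (U[:, :chi] * S[:chi]).reshape(m1, n1, chi)
    core2 = Vt[:chi].reshape(chi, m2, n2)
    return core1, core2


def mpo_to_matrix(core1, core2):
    """Contract the two cores back into a dense (m1*m2, n1*n2) matrix."""
    m1, n1, _ = core1.shape
    _, m2, n2 = core2.shape
    T = np.einsum("abk,kcd->acbd", core1, core2)  # axes: (row1, row2, col1, col2)
    return T.reshape(m1 * m2, n1 * n2)


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))  # stands in for a layer's weight matrix
c1, c2 = mpo_two_core(W, 4, 4, 4, 4, chi=4)
approx = mpo_to_matrix(c1, c2)
print(W.size, c1.size + c2.size, np.linalg.norm(W - approx) / np.linalg.norm(W))
```

A smaller `chi` means fewer parameters (here 128 instead of 256) and a larger reconstruction error. Choosing `chi` per layer is loosely analogous to the layer-sensitivity profiling the abstract mentions.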

arXiv:2401.13472

Segmenting Cardiac Muscle Z-disks with Deep Neural Networks

Authors: Mihaela Croitor Ibrahim, Nishant Ravikumar, Alistair Curd, Joanna Leng, Oliver Umney, Michelle Peckham

Abstract: Z-disks are complex structures that delineate repeating sarcomeres in striated muscle. They play significant roles in cardiomyocytes such as providing mechanical stability for the contracting sarcomere, cell signalling and autophagy. Changes in Z-disk architecture have been associated with impaired cardiac function. Hence, there is a strong need to create tools to segment Z-disks from microscopy images that overcome traditional limitations such as variability in image brightness and staining technique. In this study, we apply deep learning based segmentation models to extract Z-disks in images of striated muscle tissue. We leverage a novel Airyscan confocal dataset, which comprises high resolution images of Z-disks of healthy heart tissue, stained with Affimers for specific Z-disk proteins. We employed an interactive labelling tool, Ilastik, to obtain ground truth segmentation masks and used the resulting data set to train and evaluate the performance of several state-of-the-art segmentation networks. On the test set, UNet++ achieves the best segmentation performance for Z-disks in cardiomyocytes, with an average Dice score of 0.91, and outperforms other established segmentation methods including UNet, FPN, DeepLabv3+ and pix2pix. However, pix2pix demonstrates improved generalisation when tested on an additional dataset of cardiomyocytes with a titin mutation. This is the first study to demonstrate that automated machine learning-based segmentation approaches may be used effectively to segment Z-disks in confocal microscopy images.
Automated segmentation approaches and predicted segmentation masks could be used to derive morphological features of Z-disks (e.g. width and orientation), and subsequently, to quantify disease-related changes to cardiac microstructure.

Submitted 24 January, 2024; originally announced January 2024.
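The Dice score quoted above measures overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|). The following is a minimal pure-Python sketch for one pair of binary masks; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not taken from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = 0  # pixels labelled foreground in both masks
    total = 0         # foreground pixels in pred plus foreground pixels in truth
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            total += int(p_on) + int(t_on)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(round(dice_score(pred, truth), 4))  # 2 * 2 / (3 + 3)
```

An average Dice score over a test set would typically be the mean of this per-image value, though the abstract does not say exactly how the average was formed.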

arXiv:2401.10271 [pdf, other]

Querying Triadic Concepts through Partial or Complete Matching of Triples

Authors: Pedro Henrique B. Ruas, Rokia Missaoui, Mohamed Hamza Ibrahim

Abstract: In this paper, we introduce a new method for querying triadic concepts through partial or complete matching of triples using an inverted index, to retrieve already computed triadic concepts that contain a set of terms in their extent, intent, and/or modus. As opposed to the approximation approach described in Ananias, this method (i) does not need to keep the initial triadic context or its three dyadic counterparts, (ii) avoids applying derivation operators to the triple components through context exploration, and (iii) eliminates the need for a factorization phase to obtain triadic concepts as the answer to one-dimensional queries. Additionally, our solution introduces a novel metric for ranking the retrieved triadic concepts based on their similarity to a given query. Lastly, we conduct an empirical study to illustrate the effectiveness and scalability of our approach against the approximation one. Our solution not only shows superior efficiency but also scales better, making it suitable for big data scenarios.

Submitted 4 January, 2024; originally announced January 2024.
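To illustrate the inverted-index idea described above, here is a hypothetical sketch: each (dimension, term) pair maps to the ids of precomputed triadic concepts containing that term, so queries are answered from postings alone without the original context or derivation operators. The class names, the `complete` flag, and the ranking by number of matched terms are all assumptions; the ranking in particular is a placeholder, not the paper's similarity metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class TriadicConcept:
    extent: FrozenSet[str]  # objects
    intent: FrozenSet[str]  # attributes
    modus: FrozenSet[str]   # conditions


DIMENSIONS = ("extent", "intent", "modus")


class ConceptIndex:
    """Hypothetical inverted index over precomputed triadic concepts:
    (dimension, term) -> ids of the concepts containing that term."""

    def __init__(self, concepts: Iterable[TriadicConcept]):
        self.concepts: List[TriadicConcept] = list(concepts)
        self.postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
        for cid, concept in enumerate(self.concepts):
            for dim in DIMENSIONS:
                for term in getattr(concept, dim):
                    self.postings[(dim, term)].add(cid)

    def query(self, complete: bool = True, **terms: Iterable[str]) -> List[TriadicConcept]:
        """Look up concepts by terms given per dimension, e.g. extent=["o1"].

        complete=True keeps concepts containing every query term; complete=False
        keeps those containing at least one. Results are ordered by the number of
        matched terms (a placeholder score, not the paper's ranking metric).
        """
        keys = set()
        for dim, ts in terms.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            keys.update((dim, t) for t in ts)
        hits: Dict[int, int] = defaultdict(int)
        for key in keys:
            for cid in self.postings.get(key, ()):
                hits[cid] += 1
        matched = [cid for cid, n in hits.items() if not complete or n == len(keys)]
        matched.sort(key=lambda cid: (-hits[cid], cid))
        return [self.concepts[cid] for cid in matched]


c0 = TriadicConcept(frozenset({"o1", "o2"}), frozenset({"a1"}), frozenset({"m1"}))
c1 = TriadicConcept(frozenset({"o1"}), frozenset({"a1", "a2"}), frozenset({"m1", "m2"}))
c2 = TriadicConcept(frozenset({"o3"}), frozenset({"a2"}), frozenset({"m2"}))
index = ConceptIndex([c0, c1, c2])
print(index.query(extent=["o1"], intent=["a1"]) == [c0, c1])  # complete matching
```

Partial matching is the same lookup with the "all terms matched" filter dropped, which is why both query modes can share one index.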

arXiv:2401.01764 [pdf, other]

Understanding the Detrimental Class-level Effects of Data Augmentation

Authors: Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson

Abstract: Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.

Submitted 7 December, 2023; originally announced January 2024.

Comments: Neural Information Processing Systems (NeurIPS), 2023

arXiv:2312.14421 [pdf, other]

Enhancing Actionable Formal Concept Identification with Base-Equivalent Conceptual-Relevance

Authors: Ayao Bobi, Rokia Missaoui, Mohamed Hamza Ibrahim

Abstract: In knowledge discovery applications, the pattern set generated from data can be tremendously large and hard for analysts to explore. In the Formal Concept Analysis (FCA) framework, there have been studies to identify important formal concepts through the stability index and other quality measures. In this paper, we introduce the Base-Equivalent Conceptual Relevance (BECR) score, a novel conceptual relevance interestingness measure for improving the identification of actionable concepts. From a conceptual perspective, base and equivalent attributes are considered meaningful information and are essential for maintaining the conceptual structure of concepts. Thus, the basic idea of BECR is that the more base and equivalent attributes and minimal generators a concept intent has, the more relevant it is. As such, BECR quantifies these attributes and minimal generators per concept intent. Our preliminary experiments on synthetic and real-world datasets show the efficiency of BECR compared to the well-known stability index.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.15930 [pdf, other]

WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

Authors: Youssef Benchekroun, Megi Dervishi, Mark Ibrahim, Jean-Baptiste Gaya, Xavier Martinet, Grégoire Mialon, Thomas Scialom, Emmanuel Dupoux, Dieuwke Hupkes, Pascal Vincent

Abstract: We propose WorldSense, a benchmark designed to assess the extent to which LLMs are consistently able to sustain tacit world models, by testing how they draw simple inferences from descriptions of simple arrangements of entities. WorldSense is a synthetic benchmark with three problem types, each with its own trivial control, which explicitly avoids bias by decorrelating the abstract structure of problems from the vocabulary and expressions, and by decorrelating all problem subparts from the correct response. We run our benchmark on three state-of-the-art chat-LLMs (GPT3.5, GPT4 and Llama2-chat) and show that these models make errors even with as few as three objects. Furthermore, they have quite heavy response biases, preferring certain responses irrespective of the question. Errors persist even with chain-of-thought prompting and in-context learning. Lastly, we show that while finetuning on similar problems does result in substantial improvements (within- and out-of-distribution), the finetuned models do not generalise beyond a constrained problem space.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.10025 [pdf, other]

A Novel Neural Network-Based Federated Learning System for Imbalanced and Non-IID Data

Authors: Mahfuzur Rahman Chowdhury, Muhammad Ibrahim

Abstract: With the growth of machine learning techniques, the privacy of user data has become a major concern. Most machine learning algorithms rely heavily on large amounts of data, which may be collected from various sources. Collecting these data while complying with privacy policies has become one of the most challenging tasks for researchers. To combat this issue, researchers have introduced federated learning, where a prediction model is learnt while preserving the privacy of clients' data. However, the prevalent federated learning algorithms exhibit a trade-off between accuracy and efficiency, especially for non-IID data. In this research, we propose a centralized, neural network-based federated learning system. The centralized algorithm incorporates micro-level parallel processing inspired by the traditional mini-batch algorithm, where the client devices and the server handle the forward and backward propagation, respectively. We also devise a semi-centralized version of our proposed algorithm. This algorithm takes advantage of edge computing to reduce the load on the central server, with clients handling both the forward and backward propagation at the cost of a somewhat longer overall training time. We evaluate our proposed systems on five well-known benchmark datasets and achieve satisfactory performance in a reasonable time across various data distribution settings compared to some existing benchmark algorithms.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 48 pages

arXiv:2311.08815 [pdf, other]

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

Authors: Cian Eastwood, Julius von Kügelgen, Linus Ericsson, Diane Bouchacourt, Pascal Vincent, Bernhard Schölkopf, Mark Ibrahim

Abstract: Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits of our approach on synthetic datasets and then present promising but limited results on ImageNet.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.05716 [pdf, other]

ML-based Real-Time Control at the Edge: An Approach Using hls4ml

Authors: R. Shi, S. Ogrenci, J. M. Arnold, J. R. Berlioz, P. Hanlet, K. J. Hazelwood, M. A. Ibrahim, H. Liu, V. P. Nagaslaev, A. Narayanan, D. J. Nicklaus, J. Mitrevski, G. Pradhan, A. L. Saewert, B. A. Schupbach, K. Seiya, M. Thieme, R. M. Thurman-Keup, N. V. Tran

Abstract: This study focuses on implementing a real-time control system for a particle accelerator facility that performs high energy physics experiments. A critical operating parameter in this facility is beam loss, which is the fraction of particles deviating from the accelerated proton beam into a cascade of secondary particles. Accelerators employ a large number of sensors to monitor beam loss. The data from these sensors is monitored by human operators who predict the relative contribution of different sub-systems to the beam loss. Using this information, they engage control interventions. In this paper, we present a controller that tracks this phenomenon in real time using edge Machine Learning (ML) and supports control with low latency and high accuracy. We implemented this system on an Intel Arria 10 SoC. Optimizations at the algorithm, high-level synthesis, and interface levels to improve latency and resource usage are presented. Our design implements a neural network that can predict the main source of beam loss (between two possible causes) at speeds up to 575 frames per second (fps) (average latency of 1.74 ms). The deployed system is required to operate at 320 fps with a 3 ms latency requirement, which our design meets.

Submitted 9 November, 2023; originally announced November 2023.
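A quick consistency check on the timing figures above, assuming frames are processed one at a time so that throughput is the reciprocal of per-frame latency (the abstract does not say whether the design pipelines frames):

$$
T_{575} = \frac{1}{575\ \text{s}^{-1}} \approx 1.74\ \text{ms}, \qquad
T_{320} = \frac{1}{320\ \text{s}^{-1}} = 3.125\ \text{ms}
$$

Under that assumption, the 1.74 ms average latency matches the 575 fps peak rate and sits well inside both the 3 ms latency requirement and the 3.125 ms frame period implied by the required 320 fps.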

arXiv:2311.05034 [pdf, other]

Just-in-time Quantization with Processing-In-Memory for Efficient ML Training

Authors: Mohamed Assem Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, Mahzabeen Islam

Abstract: Data format innovations have been critical for machine learning (ML) scaling, which in turn fuels ground-breaking ML capabilities. However, even in the presence of low-precision formats, model weights are often stored in both high precision and low precision during training. Furthermore, with emerging directional data formats (e.g., MX9, MX6, etc.), multiple low-precision weight copies can be required. To lower the memory capacity needs of weights, we explore just-in-time quantization (JIT-Q), where we store only high-precision weights in memory and generate low-precision weights only when needed. To perform JIT-Q efficiently, in this work, we evaluate emerging processing-in-memory (PIM) technology to execute quantization. With PIM, we can offload quantization to in-memory compute units, enabling quantization to be performed without incurring costly data movement while allowing it to run concurrently with accelerator computation. Our proposed PIM-offloaded quantization keeps up with GPU compute and delivers considerable capacity savings (up to 24%) at marginal throughput loss (up to 2.4%). These memory capacity savings can unlock several benefits, such as fitting larger models in the same system, reducing model parallelism requirements, and improving overall ML training efficiency.

Submitted 8 November, 2023; originally announced November 2023.
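The just-in-time idea above (keep only high-precision weights resident and derive the low-precision copy when it is needed) can be sketched in a few lines. This is a host-side illustration only: it uses plain symmetric per-tensor int8 quantization as a stand-in for MX formats, does not model PIM at all, and the names `quantize_int8` and `JITQuantizedLinear` are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


class JITQuantizedLinear:
    """Hypothetical single-output linear unit that keeps only full-precision
    weights resident and builds the low-precision copy on demand."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)  # the only persistent copy of the weights

    def forward(self, x: Sequence[float]) -> float:
        # Quantize just in time; the int8 copy is a local that is dropped after use,
        # so no second (low-precision) weight copy is stored between calls.
        q, scale = quantize_int8(self.weights)
        acc = sum(qi * xi for qi, xi in zip(q, x))
        return acc * scale  # rescale the low-precision dot product once


layer = JITQuantizedLinear([0.5, -1.0, 0.25])
print(layer.forward([1.0, 1.0, 1.0]))  # close to 0.5 - 1.0 + 0.25 = -0.25
```

The trade-off this makes visible is the one the abstract targets: memory holds one weight copy instead of two, at the price of recomputing the quantization on every use, which is the work the authors propose to offload to in-memory compute.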

arXiv:2310.19909 [pdf, other]

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Authors: Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

Abstract: Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider.
Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones

Submitted 19 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.13269 [pdf, other]

An Exploratory Study on Simulated Annealing for Feature Selection in Learning-to-Rank

Authors: Mohd. Sayemul Haque, Md. Fahim, Muhammad Ibrahim

Abstract: Learning-to-rank is an applied domain of supervised machine learning. As feature selection has been found to be effective for improving the accuracy of learning models in general, it is intriguing to investigate this process for the learning-to-rank domain. In this study, we investigate the use of a popular meta-heuristic approach called simulated annealing for this task. Under the general framework of simulated annealing, we explore various neighborhood selection strategies and temperature cooling schemes. We further introduce a new hyper-parameter called the progress parameter that can effectively be used to traverse the search space. Our algorithms are evaluated on five public learning-to-rank benchmark datasets. For better validation, we also compare the simulated annealing-based feature selection algorithm with another effective meta-heuristic algorithm, namely local beam search. Extensive experimental results show the efficacy of our proposed models.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 29 pages
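A generic simulated-annealing loop for feature selection, as described above, might look like the sketch below. It fixes one neighborhood (flip a single feature) and one cooling scheme (geometric); the paper's progress parameter and its alternative neighborhoods and schedules are not reproduced. The `score` callable stands in for a ranking metric such as validation NDCG, and `toy_score` is a hypothetical objective used only to make the example runnable.

```python
import math
import random
from typing import Callable, FrozenSet, Optional


def anneal_features(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    t0: float = 1.0,
    cooling: float = 0.95,
    steps: int = 500,
    seed: Optional[int] = None,
) -> FrozenSet[int]:
    """Maximise score(subset) over non-empty feature subsets by simulated annealing.

    Neighbourhood: flip one randomly chosen feature in or out of the subset.
    Cooling: geometric, T <- cooling * T after every step.
    """
    rng = random.Random(seed)
    current = frozenset(range(n_features))  # start from the full feature set
    current_score = score(current)
    best, best_score = current, current_score
    t = t0
    for _ in range(steps):
        candidate = current ^ {rng.randrange(n_features)}
        if candidate:  # never evaluate the empty subset
            cand_score = score(candidate)
            delta = cand_score - current_score
            # Always accept improvements; accept a worse subset with probability exp(delta / T).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, cand_score
                if current_score > best_score:
                    best, best_score = current, current_score
        t = max(t * cooling, 1e-12)  # floor keeps delta / t finite
    return best


USEFUL = {1, 3}


def toy_score(subset: FrozenSet[int]) -> float:
    # Hypothetical stand-in for validation NDCG: reward useful features, penalise the rest.
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


print(sorted(anneal_features(6, toy_score, seed=0)))  # the only subset scoring 2.0 is {1, 3}
```

Accepting some worse subsets early (high T) lets the search leave local optima; as T shrinks the loop degenerates into greedy hill climbing, which is why the cooling schedule matters.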

arXiv:2309.16748 [pdf, other]

Discovering environments with XRM

Authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz

Abstract: Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generalization. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods add hyper-parameters and early-stopping criteria that are impossible to tune without a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk-Minimization (XRM) to address this issue. XRM trains two twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Domain generalization algorithms built on top of XRM environments achieve oracle worst-group-accuracy, solving a long-standing problem in out-of-distribution generalization.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.15251 [pdf, other]

VPA: Fully Test-Time Visual Prompt Adaptation

Authors: Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton Ferrer, Caner Hazirbas

Abstract: Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without requiring source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation. Experimental results reveal that VPA improves OOD generalization by 3.3% across various models, surpassing previous test-time approaches. Furthermore, we show that VPA improves corruption robustness by 6.5% compared to strong baselines. Finally, we demonstrate that VPA also boosts domain adaptation performance by 5.2% (relative). VPA is also markedly effective in improving the robustness of zero-shot recognition for vision-language models.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.07984 [pdf, other]

Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures

Authors: Johnathan Alsop, Shaizeen Aga, Mohamed Ibrahim, Mahzabeen Islam, Andrew Mccrabb, Nuwan Jayasena

Abstract: Continual demand for memory bandwidth has made it worthwhile for memory vendors to reassess processing in memory (PIM), which enables higher bandwidth by placing compute units in/near memory. As such, memory vendors have recently proposed commercially viable PIM designs. However, these proposals are largely driven by the needs of (a narrow set of) machine learning (ML) primitives. While such proposals are reasonable given the growing importance of ML, as memory is a pervasive component, there is a case for a more inclusive PIM design that can accelerate primitives across domains. In this work, we ascertain the capabilities of commercial PIM proposals to accelerate various primitives across domains. We begin by outlining a set of characteristics, termed PIM-amenability-test, which aid in assessing whether a given primitive is likely to be accelerated by PIM. Next, we apply this test to the primitives under study to ascertain efficient data placement and orchestration to map the primitives to the underlying PIM architecture. We observe that, even though the primitives under study are largely PIM-amenable, existing commercial PIM proposals do not realize their performance potential for these primitives. To address this, we identify bottlenecks that arise in PIM execution and propose hardware and software optimizations which stand to broaden the acceleration reach of commercial PIM designs (improving average PIM speedups from 1.12x to 2.49x relative to a GPU baseline).
Overall, while we believe emerging commercial PIM proposals add a necessary and complementary design point in the application acceleration space, hardware-software co-design is necessary to deliver their benefits broadly.

Submitted 17 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.07610 [pdf, other]

Feature Engineering in Learning-to-Rank for Community Question Answering Task

Authors: Nafis Sajid, Md Rashidul Hasan, Muhammad Ibrahim

Abstract: Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions. Many CQA forums, such as Quora, Stackoverflow, Yahoo!Answer and StackExchange, hold large amounts of user-generated data. These data are leveraged in automated CQA ranking systems, where similar questions (and answers) are presented in response to the user's query. In this work, we empirically investigate a few aspects of this domain. Firstly, in addition to traditional features such as TF-IDF and BM25, we introduce a BERT-based feature that captures the semantic similarity between the question and answer. Secondly, most existing research has focused on features extracted only from the question part; features extracted from answers have not been explored extensively. We combine both types of features in a linear fashion. Thirdly, using our proposed concepts, we conduct an empirical investigation with different rank-learning algorithms, some of which have not been used so far in the CQA domain. On three standard CQA datasets, our proposed framework achieves state-of-the-art performance. We also analyze the importance of the features we use in our investigation. This work is expected to guide practitioners in selecting a better set of features for the CQA retrieval task.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 20 pages
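BM25, one of the traditional features named above, scores a document against a query from term frequencies, document frequencies and document length. The following is a sketch of standard Okapi BM25 over pre-tokenised text; the defaults k1 = 1.5 and b = 0.75, the non-negative IDF variant, and the toy documents are assumptions, and the paper's exact BM25 configuration and its BERT-based feature are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: List[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenised document for a tokenised query.

    Uses the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))  # docs containing each term
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(d) / avgdl) if avgdl > 0 else k1
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores


docs = [["how", "to", "sort", "list"], ["sort", "a", "dict"], ["cook", "pasta"]]
scores = bm25_scores(["sort", "list"], docs)
print(sorted(range(len(docs)), key=lambda i: -scores[i]))  # documents ranked by BM25
```

In a learning-to-rank setup, a value like this would be one column of the feature vector for each (question, candidate) pair, alongside the other features the ranker combines.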

arXiv:2309.06019 [pdf, other]

DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator

Authors: Muhammad Sohail Ibrahim, Muhammad Usman, Malik Zohaib Nisar, Jeong-A Lee

Abstract: We propose a Digit-Serial Left-tO-righT (DSLOT) arithmetic based processing technique called DSLOT-NN, with the aim of accelerating inference of the convolution operation in deep neural networks (DNNs). The proposed work can assess and terminate ineffective convolutions, which results in massive power and energy savings. The processing engine consists of low-latency most-significant-digit-first (MSDF) (also called online) multipliers and adders that process data from left to right, allowing the execution of subsequent operations in a digit-pipelined manner. Use of online operators eliminates the need for a complex mechanism to identify negative activations, as the output digit with the highest weight is generated first, and the sign of the result can be identified as soon as the first non-zero digit is generated. The precision of the online operators can be tuned at run time, making them extremely useful in situations where accuracy can be traded for power and energy savings. The proposed design has been implemented on a Xilinx Virtex-7 FPGA and is compared with the state-of-the-art Stripes on various performance metrics. The results show that the proposed design achieves power savings, has a shorter cycle time, and delivers approximately 50% higher OPS per watt.

Submitted 21 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: Presented at 2023 26th Euromicro Conference on Digital System Design (DSD)
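The early sign detection described above follows from how MSDF digits are weighted. In a radix-2 signed-digit representation with digits in {-1, 0, 1} (assumed here as the kind of redundant encoding online arithmetic uses), all digits after the first non-zero one are together worth strictly less than it, so that digit alone fixes the sign. For a pre-ReLU sum, a negative sign would mean the output gets zeroed, which is presumably what allows an ineffective convolution to be terminated early. The functions below are hypothetical illustrations, not the DSLOT-NN hardware.

```python
from fractions import Fraction
from typing import Iterable, List, Tuple


def sign_msdf(digits: Iterable[int]) -> Tuple[int, int]:
    """Sign of x = sum(d_i * 2**-(i+1)) for radix-2 signed digits d_i in {-1, 0, 1},
    read most significant digit first.

    Returns (sign, digits_read). The first non-zero digit fixes the sign, because
    all later digits together are worth strictly less than it, so reading stops there.
    """
    read = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError(f"invalid signed digit: {d}")
        read += 1
        if d != 0:
            return d, read
    return 0, read  # every digit was zero, so x == 0


def value(digits: List[int]) -> Fraction:
    """Exact value of a finite signed-digit fraction, used only to check sign_msdf."""
    return sum((Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits)), Fraction(0))


# 1/4 - 1/8 - 1/16 = 1/16 > 0: the sign is known after reading 2 of the 4 digits.
print(sign_msdf([0, 1, -1, -1]), value([0, 1, -1, -1]))
```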

arXiv:2309.00157 [pdf, other]

doi 10.1109/ACCESS.2023.3348270

Information Fusion for Assistance Systems in Production Assessment

Authors: Fernando Arévalo, Christian Alison M. Piolo, M. Tahasanul Ibrahim, Andreas Schwung

Abstract: We propose a novel methodology to define assistance systems that rely on information fusion to combine different sources of information while providing an assessment. The main contribution of this paper is a general framework for the fusion of n information sources using evidence theory. The fusion provides a more robust prediction and an associated uncertainty that can be used to assess the likelihood of the prediction. Moreover, we provide a methodology for the information fusion of two primary sources: an ensemble classifier based on machine data and an expert-centered model. We demonstrate the information fusion approach using data from an industrial setup, which rounds out the application part of this research. Furthermore, we address the problem of data drift by proposing a methodology to update the data-based models using an evidence theory approach. We validate the approach using the Tennessee Eastman benchmark while conducting an ablation study of the model update parameters.

Submitted 31 August, 2023; originally announced September 2023.

Comments: 21 Pages, 10 Figures

arXiv:2308.15395 [pdf, other]

The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data

Authors: Mathieu Chevalley, Jacob Sackett-Sanders, Yusuf Roohani, Pascal Notin, Artemy Bakulin, Dariusz Brzezinski, Kaiwen Deng, Yuanfang Guan, Justin Hong, Michael Ibrahim, Wojciech Kotlowski, Marcin Kowiel, Panagiotis Misiakos, Achille Nazaret, Markus Püschel, Chris Wendler, Arash Mehrjou, Patrick Schwab

Abstract: In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. This helps formulate hypotheses regarding molecular mechanisms that could potentially be targeted by future medicines. The CausalBench Challenge was an initiative to invite the machine learning community to advance the state of the art in constructing gene-gene interaction networks. These networks, derived from large-scale, real-world datasets of single cells under various perturbations, are crucial for understanding the causal mechanisms underlying disease biology. Using the framework provided by the CausalBench benchmark, participants were tasked with enhancing the capacity of state-of-the-art methods to leverage large-scale genetic perturbation data. This report analyzes and summarizes the methods submitted during the challenge to give a partial picture of the state of the art at the time of the challenge. The winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.13276 [pdf, ps, other]

Knowledge-Based Version Incompatibility Detection for Deep Learning

Authors: Zhongkai Zhao, Bonan Kou, Mohamed Yilmaz Ibrahim, Muhao Chen, Tianyi Zhang

Abstract: Version incompatibility issues are rampant when reusing or reproducing deep learning (DL) models and applications. Existing techniques are limited to library dependency specifications declared in PyPI. Therefore, these techniques cannot detect version issues caused by undocumented version constraints or issues involving hardware drivers or the OS. To address this challenge, we propose to leverage the abundant discussions of DL version issues on Stack Overflow to facilitate version incompatibility detection. We reformulate the problem of knowledge extraction as a Question-Answering (QA) problem and use a pre-trained QA model to extract version compatibility knowledge from online discussions. The extracted knowledge is further consolidated into a weighted knowledge graph to detect potential version incompatibilities when reusing a DL project. Our evaluation results show that (1) our approach extracts version knowledge with 84% accuracy, and (2) it identifies 65% of known version issues in 10 popular DL projects with high precision (92%), while two state-of-the-art approaches detect only 29% and 6% of these issues, with 33% and 17% precision, respectively.
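To make the idea of a weighted knowledge graph concrete, the sketch below aggregates compatibility evidence between (package, version) nodes and flags pinned pairs whose net evidence is strongly negative. It is a hypothetical illustration: the QA-based extraction is not reproduced, the evidence schema, the 0.5 threshold, and the package names `libA`, `libB`, and `driverX` are all invented, and the paper's actual graph and scoring may differ.

```python
from collections import defaultdict

def build_graph(triples):
    """Aggregate (pkg_a, ver_a, pkg_b, ver_b, compatible, confidence) evidence.

    Each edge is keyed by an order-independent pair of (package, version) nodes;
    its weight is the net evidence (positive = compatible, negative = incompatible).
    """
    graph = defaultdict(float)
    for a, va, b, vb, compatible, conf in triples:
        key = frozenset({(a, va), (b, vb)})
        graph[key] += conf if compatible else -conf
    return graph

def find_conflicts(graph, environment, threshold=0.5):
    """Flag pinned package pairs whose net evidence is below -threshold."""
    pins = sorted(environment.items())
    conflicts = []
    for i in range(len(pins)):
        for j in range(i + 1, len(pins)):
            if graph.get(frozenset({pins[i], pins[j]}), 0.0) < -threshold:
                conflicts.append((pins[i], pins[j]))
    return conflicts

# Hypothetical evidence, e.g. as if extracted from forum answers.
TRIPLES = [
    ("libA", "1.0", "libB", "2.0", False, 0.9),
    ("libA", "1.0", "libB", "2.0", True, 0.2),
    ("libA", "1.0", "driverX", "450", True, 0.8),
]
ENV = {"libA": "1.0", "libB": "2.0", "driverX": "450"}
```

Summing signed confidences lets contradictory posts partially cancel, so a single weak report does not by itself trigger a warning.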

Submitted 28 August, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: 12 pages, FSE 2023

arXiv:2308.12840 [pdf, other]

doi 10.1371/journal.pone.0288670

FaceTouch: Detecting hand-to-face touch with supervised contrastive learning to assist in tracing infectious disease

Authors: Mohamed R. Ibrahim, Terry Lyons

Abstract: Many viruses and diseases frequently spread from one person to another through the respiratory system. Covid-19 served as an example of how crucial it is to trace and reduce contacts to stop the spread. There is a clear lack of automatic methods that can detect hand-to-face contact in complex urban scenes or indoors. In this paper, we introduce a computer vision framework based on deep learning, called FaceTouch. It comprises deep sub-models to detect humans and analyse their actions. FaceTouch seeks to detect hand-to-face touches in the wild, such as in video chats, bus footage, or CCTV feeds. Despite partial occlusion of faces, the introduced system learns to detect face touches from the RGB representation of a given scene by utilising the representation of body gestures such as arm movement. This has been demonstrated to be useful in complex urban scenarios, beyond simply identifying hand movement and its proximity to faces. Relying on Supervised Contrastive Learning, the introduced model is trained on our collected dataset, given the absence of other benchmark datasets. The framework shows strong validation results on unseen datasets, which opens the door to potential deployment.
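For reference, the supervised contrastive (SupCon) loss of Khosla et al. (2020) pulls each anchor toward all samples sharing its label and away from the rest. The pure-Python sketch below computes that loss for a handful of toy embeddings; it averages over anchors rather than summing, and the 2-D embeddings and temperature 0.1 are illustrative assumptions, not FaceTouch's actual training setup.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have positives."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Scaled cosine similarity of anchor i to every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Mean negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Embeddings that cluster by class (for instance "touch" versus "no touch") give a lower loss than the same points with labels scrambled across clusters.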

Submitted 24 August, 2023; originally announced August 2023.

Comments: Set to be published in the PLoS ONE Journal

arXiv:2308.03977 [pdf, other]

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

Authors: Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos

Abstract: Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), and (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited, and often played down, mainly due to its lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet and may have issues with regard to privacy, bias, and copyright, while offering little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.

Submitted 12 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.03973 [pdf, other]

Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures

Authors: Mohamed Assem Ibrahim, Shaizeen Aga

Abstract: This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions in accelerating the fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations of FFT on modern GPUs are memory-bandwidth bound. As such, the memory bandwidth boost offered by commercial PIM solutions makes a case for using PIM to accelerate FFT. To this end, we first derive a mapping of FFT computation to a strawman PIM architecture representative of recent commercial designs. We observe that even with careful data mapping, PIM is not effective in accelerating FFT. To address this, we make a case for collaborative acceleration of FFT with PIM and GPU. Further, we propose software and hardware innovations that reduce the number of PIM operations needed for a given FFT. Overall, our optimized PIM FFT mapping, termed Pimacolaba, delivers performance and data movement savings of up to 1.38$\times$ and 2.76$\times$, respectively, over a range of FFT sizes.
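As background on the primitive itself, the radix-2 Cooley-Tukey FFT below performs log2(n) butterfly stages, each touching every element with only a complex multiply-add per pair, which is one intuition for why FFT tends to be limited by memory bandwidth rather than arithmetic. This is a textbook sketch checked against a direct DFT; it does not model the Pimacolaba PIM/GPU mapping.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t           # butterfly: top output
        out[k + n // 2] = even[k] - t  # butterfly: bottom output
    return out

def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

Both functions compute the same transform; the FFT needs O(n log n) operations instead of O(n^2), but each of its stages must stream the whole array.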

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.13136 [pdf, other]

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

Authors: Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim

Abstract: For more than a decade, researchers have measured progress in object recognition on ImageNet-based generalization benchmarks such as ImageNet-A, -C, and -R. Recent advances in foundation models, trained on orders of magnitude more data, have begun to saturate these standard benchmarks, but remain brittle in practice. This suggests that standard benchmarks, which tend to focus on predefined or synthetic changes, may not be sufficient for measuring real-world generalization. Consequently, we propose studying generalization across geography as a more realistic measure of progress, using two datasets of objects from households across the globe. We conduct an extensive empirical evaluation of progress across nearly 100 vision models, up to the most recent foundation models. We first identify a progress gap between standard benchmarks and real-world, geographical shifts: progress on ImageNet results in up to 2.5x more progress on standard generalization benchmarks than on real-world distribution shifts. Second, we study model generalization across geographies by measuring the disparities in performance across regions, a more fine-grained measure of real-world generalization. We observe that all models have large geographic disparities, even foundation CLIP models, with differences of 7-20% in accuracy between regions. Counter to modern intuition, we discover that progress on standard benchmarks fails to improve geographic disparities and often exacerbates them: geographic disparities between the least performant models and today's best models have more than tripled.
Our results suggest that scaling alone is insufficient for consistent robustness to real-world distribution shifts. Finally, we highlight in early experiments how simple last-layer retraining on more representative, curated data can complement scaling as a promising direction for future work, reducing geographic disparity on both benchmarks by over two-thirds.
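One simple way to express a geographic disparity is the gap between the best- and worst-performing regions' accuracies. The sketch below computes per-region accuracy and that gap; the paper's exact disparity definition may differ, and the region labels and counts are hypothetical.

```python
def accuracy_by_region(records):
    """records: iterable of (region, correct) pairs. Returns {region: accuracy}."""
    totals, hits = {}, {}
    for region, correct in records:
        totals[region] = totals.get(region, 0) + 1
        hits[region] = hits.get(region, 0) + int(correct)
    return {r: hits[r] / totals[r] for r in totals}

def geographic_disparity(acc):
    """Accuracy gap between the best- and worst-performing regions."""
    return max(acc.values()) - min(acc.values())

# Hypothetical predictions for two regions (labels are placeholders).
RECORDS = ([("region_A", True)] * 9 + [("region_A", False)]
           + [("region_B", True)] * 7 + [("region_B", False)] * 3)
```

Here region_A scores 90% and region_B 70%, a disparity of 20 percentage points; a model can raise average accuracy while leaving this gap unchanged or wider.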

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.00741 [pdf, other]

UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input

Authors: Muhammad Ibrahim, Naveed Akhtar, Saeed Anwar, Ajmal Mian

Abstract: Localization is a fundamental task in robotics for autonomous navigation. Existing localization methods rely on a single input data modality or train several computational models to process different modalities. This leads to stringent computational requirements and sub-optimal results that fail to capitalize on the complementary information in other data streams. This paper proposes UnLoc, a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions. Our multi-stream network can handle LiDAR, camera, and radar inputs for localization on demand, i.e., it can work with one or more input sensors, making it robust to sensor failure. UnLoc uses 3D sparse convolutions and cylindrical partitioning of the space to process LiDAR frames, and implements ResNet blocks with a slot attention-based feature filtering module for the radar and image modalities. We introduce a unique learnable modality encoding scheme to distinguish between the input sensor data. Our method is extensively evaluated on the Oxford Radar RobotCar, ApolloSouthBay, and Perth-WA datasets. The results confirm the efficacy of our technique.

Submitted 3 July, 2023; originally announced July 2023.

Comments: UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input has been accepted for publication in the Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2305.14362 [pdf, ps, other]

doi 10.1080/25765299.2023.2204672

On the Eight Levels theorem and applications towards Lucas-Lehmer primality test for Mersenne primes, I

Authors: Moustafa Ibrahim

Abstract: The Lucas-Lehmer test is the current standard algorithm for testing the primality of Mersenne numbers, but it may have limitations in terms of its efficiency and accuracy. Developing new algorithms or improving upon existing ones could potentially improve the search for Mersenne primes and the understanding of the distribution of Mersenne primes and composites. New versions of the primality test for Mersenne numbers could help speed up the search for new Mersenne primes by improving the efficiency of the algorithm. This could potentially lead to the discovery of new Mersenne primes that were previously beyond the reach of current computational resources. The current paper proves what the author calls the Eight Levels Theorem, then presents and proves three new versions of the Lucas-Lehmer primality test for Mersenne primes, and also gives a new criterion for Mersenne compositeness.
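For reference, the standard Lucas-Lehmer test that the paper builds on states that, for an odd prime p, M_p = 2^p - 1 is prime if and only if s_{p-2} ≡ 0 (mod M_p), where s_0 = 4 and s_{k+1} = s_k^2 - 2. The sketch below implements only this classic test; it does not reproduce the paper's three new versions or its compositeness criterion.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def lucas_lehmer(p):
    """Return True iff the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if not is_prime(p):
        raise ValueError("p must be prime")
    if p == 2:
        return True  # M_2 = 3; the s-sequence below is stated for odd primes
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # reduce modulo M_p to keep numbers small
    return s == 0
```

Running it over prime exponents below 62 recovers the known Mersenne prime exponents 2, 3, 5, 7, 13, 17, 19, 31, and 61, while, for example, p = 11 is rejected because 2047 = 23 × 89.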

Submitted 11 May, 2023; originally announced May 2023.

Comments: 33 pages, 4 tables, 3 new versions for the Lucas-Lehmer primality test, 4 new combinatorial identities. arXiv admin note: substantial text overlap with arXiv:2108.13792

MSC Class: 11Y16; 11Y55; 11Y11; 11A51; 11B37; 11B83; 11B75; 11B39; 11B37

ACM Class: G.2.1; K.2; K.7.3; J.7; G.1.0; G.4

Journal ref: Arab Journal of Basic and Applied Sciences, 30:1, 267-284

arXiv:2304.12210 [pdf, other]

A Cookbook of Self-Supervised Learning

Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

Abstract: Self-supervised learning (SSL), dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying out the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.05391 [pdf, other]

Pinpointing Why Object Recognition Performance Degrades Across Income Levels and Geographies

Authors: Laura Gustafson, Megan Richards, Melissa Hall, Caner Hazirbas, Diane Bouchacourt, Mark Ibrahim

Abstract: Despite impressive advances in object recognition, deep learning systems' performance degrades significantly across geographies and lower income levels, raising pressing concerns of inequity. Addressing such performance gaps remains a challenge, as little is understood about why performance degrades across incomes or geographies. We take a step in this direction by annotating images from Dollar Street, a popular benchmark of geographically and economically diverse images, labeling each image with factors such as color, shape, and background. These annotations unlock a new granular view into how objects differ across incomes and regions. We then use these object differences to pinpoint model vulnerabilities across incomes and regions. We study a range of modern vision models, finding that performance disparities are most associated with differences in texture, occlusion, and images with darker lighting. We illustrate how insights from our factor labels can surface mitigations that reduce models' performance disparities. As an example, we show that mitigating a model's vulnerability to texture can improve performance on the lower income levels. We release all the factor annotations along with an interactive dashboard to facilitate research into more equitable vision systems.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2304.04314 [pdf, ps, other]

RIS-aided Mixed RF-FSO Wireless Networks: Secrecy Performance Analysis with Simultaneous Eavesdropping

Authors: Md. Mijanur Rahman, A. S. M. Badrudduza, Noor Ahmad Sarker, Md. Ibrahim, Imran Shafique Ansari

Abstract: The emergence of sixth-generation networks has resulted in the proposal of several solutions to tackle signal loss. One of these solutions is the use of reconfigurable intelligent surfaces (RIS), which can reflect or refract signals as required. Integrating RIS offers significant potential to improve the coverage area from the sender to the receiver. In this paper, we present, for the first time, a comprehensive framework for analyzing the secrecy performance of an RIS-aided mixed radio frequency (RF)-free space optics (FSO) system. Our study assumes that a secure message is transmitted from an RF transmitter to an FSO receiver through an intermediate relay. The RF link experiences Rician fading, while the FSO link experiences Málaga-distributed turbulence with pointing errors. We examine three scenarios: 1) RF-link eavesdropping, 2) FSO-link eavesdropping, and 3) a simultaneous eavesdropping attack on both RF and FSO links. We evaluate the secrecy performance using analytical expressions for secrecy metrics such as the average secrecy capacity, secrecy outage probability, strictly positive secrecy capacity, effective secrecy throughput, and intercept probability. Our results are confirmed via Monte Carlo simulations and demonstrate that fading parameters, atmospheric turbulence conditions, pointing errors, and detection techniques play a crucial role in enhancing secrecy performance.
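The practice of confirming a closed-form secrecy outage probability (SOP) with Monte Carlo simulation can be illustrated on the simplest textbook wiretap model. The sketch below assumes Rayleigh fading on both the legitimate and eavesdropper links (exponentially distributed SNRs), not the paper's RIS-aided Rician/Málaga system. With secrecy capacity C_s = max(log2(1 + γ_B) - log2(1 + γ_E), 0), target rate R_s, and Θ = 2^R_s, the SOP is 1 - γ̄_B/(γ̄_B + Θγ̄_E) · exp(-(Θ - 1)/γ̄_B).

```python
import math
import random

def sop_closed_form(snr_b, snr_e, rate):
    """P(C_s < rate) when both links are Rayleigh faded (exponential SNRs)."""
    theta = 2.0 ** rate
    return 1.0 - snr_b / (snr_b + theta * snr_e) * math.exp(-(theta - 1.0) / snr_b)

def sop_monte_carlo(snr_b, snr_e, rate, trials=100_000, seed=1):
    """Estimate the same probability by sampling instantaneous SNRs."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        gb = rng.expovariate(1.0 / snr_b)  # legitimate-link SNR with mean snr_b
        ge = rng.expovariate(1.0 / snr_e)  # eavesdropper SNR with mean snr_e
        cs = max(math.log2(1 + gb) - math.log2(1 + ge), 0.0)
        outages += cs < rate
    return outages / trials
```

With average SNRs of 10 (legitimate) and 1 (eavesdropper) and R_s = 1 bit/s/Hz, the closed form gives about 0.246, and the simulated estimate should land close to it.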

Submitted 9 April, 2023; originally announced April 2023.

Comments: No comments

arXiv:2303.09063 [pdf, other]

Plant Disease Detection using Region-Based Convolutional Neural Network

Authors: Hasin Rehana, Muhammad Ibrahim, Md. Haider Ali

Abstract: Agriculture plays an important role in the food supply and economy of Bangladesh. The rapid growth of the population over the years has also increased the demand for food production. One of the major reasons behind low crop production is the numerous bacterial, viral, and fungal plant diseases. Early detection of plant diseases and proper use of pesticides and fertilizers are vital for preventing the diseases and boosting the yield. Most farmers apply generalized pesticides and fertilizers to entire fields without specifically knowing the condition of the plants. Thus the production cost often increases, and this sometimes even becomes detrimental to the yield. Deep learning models have been found to be very effective at automatically detecting plant diseases from images of plants, thereby reducing the need for human specialists. This paper aims at building a lightweight deep learning model for predicting leaf disease in tomato plants. By modifying the region-based convolutional neural network, we design an efficient and effective model that demonstrates satisfactory empirical performance on a benchmark dataset. Our proposed model can easily be deployed in a larger system where drones take images of leaves and these images are fed into our model to determine the plants' health condition.

Submitted 12 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 23 pages

arXiv:2303.00261 [pdf, other]

Speeding Up EfficientNet: Selecting Update Blocks of Convolutional Neural Networks using Genetic Algorithm in Transfer Learning

Authors: Md. Mehedi Hasana, Muhammad Ibrahim, Md. Sawkat Ali

Abstract: The performance of convolutional neural networks (CNNs) depends heavily on their architectures. The transfer learning performance of a CNN relies quite strongly on the selection of its trainable layers. Selecting the most effective update layers for a given target dataset often requires expert knowledge of CNN architecture, which many practitioners do not possess. General users prefer to use an available architecture (e.g., GoogleNet, ResNet, EfficientNet) developed by domain experts. With the ever-growing number of layers, it is becoming increasingly difficult and cumbersome to handpick the update layers. Therefore, in this paper we explore the application of a genetic algorithm to mitigate this problem. The convolutional layers of popular pretrained networks are often grouped into modules that constitute their building blocks. We devise a genetic algorithm to select blocks of layers for updating the parameters. By experimenting with EfficientNetB0 pre-trained on ImageNet and using Food-101, CIFAR-100, and MangoLeafBD as target datasets, we show that our algorithm yields similar or better accuracy than the baseline and requires less training and evaluation time because fewer parameters are learned. We also devise a metric called block importance to measure the efficacy of each block as an update block, and analyze the importance of the blocks selected by our algorithm.
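A genetic algorithm for this kind of selection can encode each candidate as a bit mask over blocks (1 = fine-tune that block). The sketch below is a generic GA with truncation selection, one-point crossover, bit-flip mutation, and elitism; the operators and hyperparameters are assumptions rather than the paper's exact design, and `toy_fitness` is a hypothetical stand-in for validation accuracy after fine-tuning, which in practice would require actually training the network.

```python
import random

def evolve_block_mask(fitness, n_blocks, pop_size=20, generations=30,
                      mutation_rate=0.05, seed=0):
    """Simple generational GA over bit masks (1 = update that block)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_blocks)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]     # truncation selection
        children = [scored[0][:]]             # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_blocks)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Hypothetical stand-in for "accuracy gain from fine-tuning each block",
# minus a cost per updated block to reward training fewer parameters.
TOY_GAINS = [0.0, 0.1, 0.0, 0.3, 0.0, 0.5, 0.6]

def toy_fitness(mask, cost_per_block=0.2):
    return sum(g - cost_per_block for g, m in zip(TOY_GAINS, mask) if m)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next, so extra generations can only help or leave the result unchanged.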

Submitted 1 March, 2023; originally announced March 2023.

Comments: 9 pages

arXiv:2302.10876 [pdf, ps, other]

Effects of Co-channel Interference on RIS Empowered Wireless Networks amid Multiple Eavesdropping Attempts

Authors: Md. Roisul Ajom Ruku, Md. Ibrahim, A. S. M. Badrudduza, Imran Shafique Ansari

Abstract: This letter is concerned with the secrecy performance of reconfigurable intelligent surface (RIS)-aided wireless networks in the presence of multiple interferers at the destination. More precisely, we analyze three critical issues in the design of secure RIS-assisted networks: 1) How do interferers affect the performance of secure wireless networks? 2) Which of the two groups of eavesdroppers (i.e., colluding and non-colluding) is more harmful? 3) How can RIS improve network confidentiality? To do so, we derive a closed-form analytical expression for the secrecy outage probability, along with an asymptotic analysis in the high signal-to-noise ratio regime, to better understand the impact of different system parameters on secrecy performance. Finally, we validate our analytical results using computer-based Monte Carlo simulations.

Submitted 21 February, 2023; originally announced February 2023.

Comments: No

arXiv:2302.10257 [pdf, other]

Secrecy Outage Analysis of Energy Harvesting Relay-based Mixed UOWC-RF Network with Multiple Eavesdroppers

Authors: Moloy Kumar Ghosh, Milton Kumar Kundu, Md Ibrahim, A. S. M. Badrudduza, Md. Shamim Anower, Imran Shafique Ansari, Ali A. Shaikhi, Mohammed A. Mohandes

Abstract: This work deals with the physical layer security performance of a dual-hop underwater optical wireless communication (UOWC)-radio frequency (RF) network under the intrusion attempts of multiple eavesdroppers via RF links. The intermediate decode-and-forward relay node between the underwater source and the destination converts the optical signal into electrical form and re-transmits it to the destination node using energy harvested by the relay from an integrated power beacon within the system. The source-to-relay (UOWC) link follows mixture exponential generalized Gamma turbulence with pointing error impairments, whereas all the remaining (RF) links undergo $κ-μ$ shadowed fading. With regard to the types of intruders, two scenarios are considered, i.e., colluding (\textit{Scenario-I}) and non-colluding (\textit{Scenario-II}) eavesdroppers, and analytical expressions for the secrecy outage probability, the probability of strictly positive secrecy capacity, and the effective secrecy throughput are derived in closed form for each scenario. Furthermore, the impacts of UOWC and RF channel parameters, as well as detection techniques, on secrecy capacity are demonstrated. A comparison between the two scenarios reveals that collusion between the eavesdroppers poses the most harmful threat to secrecy throughput, but that a better secrecy level can be attained by adopting diversity at the destination and power beacon nodes, along with heterodyne detection rather than the intensity modulation and direct detection technique.
Finally, all the derived expressions are corroborated via Monte Carlo simulations.
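Why collusion is the worse case can be seen with a small simulation. The sketch below assumes Rayleigh fading on every link (not the UOWC and κ-μ shadowed channels above) and models colluding eavesdroppers as combining their received SNRs additively, a common modeling assumption, while non-colluding eavesdroppers leak only through the strongest one. Since a sum of non-negative SNRs is never smaller than their maximum, the colluding SOP can never fall below the non-colluding SOP on the same samples.

```python
import math
import random

def sop_eavesdroppers(snr_b, snr_e, n_eves, rate, trials=50_000, seed=2):
    """Monte Carlo SOP for colluding (summed SNR) vs non-colluding (max SNR) eavesdroppers."""
    rng = random.Random(seed)
    out_coll = out_non = 0
    for _ in range(trials):
        gb = rng.expovariate(1.0 / snr_b)  # legitimate-link SNR
        ge = [rng.expovariate(1.0 / snr_e) for _ in range(n_eves)]
        cb = math.log2(1 + gb)
        # Colluding: eavesdroppers jointly combine their signals, so SNRs add.
        out_coll += max(cb - math.log2(1 + sum(ge)), 0.0) < rate
        # Non-colluding: the single strongest eavesdropper determines the leakage.
        out_non += max(cb - math.log2(1 + max(ge)), 0.0) < rate
    return out_coll / trials, out_non / trials
```

Both estimates are computed from the same random draws, so the comparison isolates the effect of collusion rather than sampling noise.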

Submitted 20 February, 2023; originally announced February 2023.

Comments: No

arXiv:2302.02191 [pdf, ps, other]

Unsupervised Learning for Pilot-free Transmission in 3GPP MIMO Systems

Authors: Omar M. Sleem, Mohamed Salah Ibrahim, Akshay Malhotra, Mihaela Beluri, Philip Pietraski

Abstract: Reducing reference-signal overhead has recently emerged as an effective way to improve system spectral efficiency. This paper introduces a new downlink data structure that is free from demodulation reference signals (DM-RS) and hence does not require any channel estimation at the receiver. The proposed data transmission structure involves a simple repetition of part of the user data across different sub-bands. By exploiting the repetition structure at the user side, it is shown that reliable recovery is possible via canonical correlation analysis (CCA). This paper also proposes two effective mechanisms for boosting CCA performance in OFDM systems: one for repetition pattern selection and another to deal with severe frequency selectivity. The proposed approach exhibits a favorable complexity-performance tradeoff, making it appealing for practical implementation. Numerical results using a 3GPP link-level testbench demonstrate the superiority of the proposed approach relative to state-of-the-art methods.

Submitted 4 February, 2023; originally announced February 2023.
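The recovery idea rests on canonical correlation analysis: if the same symbols are sent on two sub-bands through unknown channels, the directions along which the two received views are maximally correlated span the transmitted signal subspace. The sketch below is a generic real-valued two-view CCA under assumed flat per-sub-band MIMO channels and BPSK symbols. It is not the paper's receiver, it omits the repetition-pattern selection and frequency-selectivity mechanisms, and the names `cca` and `inv_sqrt` are hypothetical.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(y1, y2, k):
    """Two-view CCA: canonical correlations and the top-k view-1 components."""
    n = y1.shape[1]
    y1 = y1 - y1.mean(axis=1, keepdims=True)  # center each view
    y2 = y2 - y2.mean(axis=1, keepdims=True)
    c11, c22, c12 = y1 @ y1.T / n, y2 @ y2.T / n, y1 @ y2.T / n
    w1, w2 = inv_sqrt(c11), inv_sqrt(c22)     # whitening transforms
    u, rho, _ = np.linalg.svd(w1 @ c12 @ w2)  # rho: correlations, descending
    return rho, (w1 @ u[:, :k]).T @ y1        # (k, N) canonical components

# Hypothetical setup: k BPSK streams repeated on two sub-bands with unknown
# flat m x k channels h1, h2 and small additive noise.
rng = np.random.default_rng(1)
k, m, n = 2, 4, 2000
s = rng.choice([-1.0, 1.0], size=(k, n))
h1, h2 = rng.standard_normal((m, k)), rng.standard_normal((m, k))
y1 = h1 @ s + 0.01 * rng.standard_normal((m, n))
y2 = h2 @ s + 0.01 * rng.standard_normal((m, n))
corr, z = cca(y1, y2, k)

# CCA leaves a k x k mixing ambiguity; a genie least-squares fit (for checking
# only) shows that the canonical components span the transmitted symbol space.
a, *_ = np.linalg.lstsq(z.T, s.T, rcond=None)
rel_err = np.linalg.norm(z.T @ a - s.T) / np.linalg.norm(s)
print(corr, rel_err)
```

Note that no channel estimate for `h1` or `h2` is ever formed; only the repetition between the two views is exploited. The genie fit is purely diagnostic, and a practical receiver would still need its own way to resolve the remaining k × k ambiguity.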

arXiv:2301.08957 [pdf, other]

Slice Transformer and Self-supervised Learning for 6DoF Localization in 3D Point Cloud Maps

Authors: Muhammad Ibrahim, Naveed Akhtar, Saeed Anwar, Michael Wise, Ajmal Mian

Abstract: Precise localization is critical for autonomous vehicles. We present a self-supervised learning method that, for the first time, employs Transformers for outdoor localization using LiDAR data. We propose a pretext task that reorganizes the slices of a 360° LiDAR scan to leverage its axial properties. Our model, called Slice Transformer, employs multi-head attention while systematically processing the slices. To the best of our knowledge, this is the first instance of leveraging multi-head attention for outdoor point clouds. We additionally introduce the Perth-WA dataset, which provides a large-scale LiDAR map of the city of Perth in Western Australia, covering an area of ~4 km². Localization annotations are provided for Perth-WA. The proposed localization method is thoroughly evaluated on the Perth-WA and Apollo-SouthBay datasets. We also establish the efficacy of our self-supervised learning approach for the common downstream task of object classification using the ModelNet40 and ScanNN datasets. The code and Perth-WA data will be publicly released.

Submitted 13 August, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

Comments: Accepted in IEEE International Conference on Robotics and Automation (ICRA), 2023
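The abstract does not specify how the slices are reorganized, so the following is only a sketch of the general idea under stated assumptions: a 360° scan is partitioned into equal azimuthal sectors, and a hypothetical pretext sample is produced by cyclically rolling the slice order, with the roll amount serving as a free self-supervised label. The cyclic-roll task and the function names `azimuth_slices` and `make_pretext_sample` are assumptions, not the Slice Transformer's actual pretext task.

```python
import numpy as np

def azimuth_slices(points, n_slices):
    """Split an (N, 3) LiDAR scan into n_slices equal azimuthal sectors."""
    az = np.mod(np.arctan2(points[:, 1], points[:, 0]), 2 * np.pi)  # in [0, 2*pi)
    idx = np.floor(az / (2 * np.pi / n_slices)).astype(int)
    idx = np.minimum(idx, n_slices - 1)  # guard against az rounding up to 2*pi
    return [points[idx == i] for i in range(n_slices)]

def make_pretext_sample(points, n_slices, rng):
    """Hypothetical pretext sample: cyclically roll the slice order.

    The roll amount is returned as the self-supervised label a model
    would be trained to predict from the reordered slices.
    """
    slices = azimuth_slices(points, n_slices)
    shift = int(rng.integers(n_slices))
    return slices[shift:] + slices[:shift], shift

# Usage with a random stand-in for a real scan (x, y, z in meters).
rng = np.random.default_rng(0)
scan = rng.uniform(-50.0, 50.0, size=(1000, 3))
rolled, label = make_pretext_sample(scan, 8, rng)
print(label, [len(s) for s in rolled])
```

Slicing by azimuth rather than by point index keeps each slice tied to a fixed angular sector, which is one plausible reading of the "axial properties" the abstract mentions.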

arXiv:2212.10567 [pdf, other]

Anticancer Peptides Classification using Kernel Sparse Representation Classifier

Authors: Ehtisham Fazal, Muhammad Sohail Ibrahim, Seongyong Park, Imran Naseem, Abdul Wahab

Abstract: Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been one of the major research topics over the past decades, yet it is still poorly understood. Multifaceted therapeutic frameworks are therefore indispensable. Anticancer peptides (ACPs) are the most promising treatment option, but their large-scale identification and synthesis require reliable prediction methods, which remain an open problem. In this paper, we present an intuitive classification strategy that differs from the traditional black-box approach and is based on the well-known statistical theory of sparse-representation classification (SRC). Specifically, we create over-complete dictionary matrices by embedding the composition of K-spaced amino acid pairs (CKSAAP). Unlike traditional SRC frameworks, this strategy uses an efficient matching pursuit solver instead of the computationally expensive basis pursuit solver. Furthermore, kernel principal component analysis (KPCA) is employed to cope with non-linearity and to reduce the dimension of the feature space, whereas the synthetic minority oversampling technique (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets using well-known statistical metrics and is found to outperform existing methods.
The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. The Google Colab implementation of the proposed method is available on the author's GitHub page (https://github.com/ehtisham-Fazal/ACP-Kernel-SRC).

Submitted 19 December, 2022; originally announced December 2022.
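A rough sketch of the two core ingredients, assuming NumPy: a CKSAAP feature vector (normalized counts of the 400 ordered amino-acid pairs separated by k = 0..k_max residues) and sparse-representation classification, where a test vector is coded over the dictionary of training vectors and assigned to the class whose atoms reconstruct it with the smallest residual. The abstract says only "matching pursuit"; orthogonal matching pursuit is used here as an assumed variant, the KPCA and SMOTE stages are omitted, and the names `cksaap`, `omp` and `src_classify` are hypothetical. This is not the authors' released implementation.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"                 # 20 standard amino acids
PAIRS = [a + b for a in AA for b in AA]     # 400 ordered residue pairs

def cksaap(seq, k_max=3):
    """Normalized counts of residue pairs separated by k residues, k = 0..k_max."""
    feats = []
    for k in range(k_max + 1):
        n_pairs = len(seq) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(n_pairs):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:              # skip non-standard residues
                counts[pair] += 1
        feats.extend(counts[p] / max(n_pairs, 1) for p in PAIRS)
    return np.array(feats)                  # length 400 * (k_max + 1)

def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit: greedy sparse code of y over columns of D."""
    support, residual, coef = [], y.copy(), np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # refit on support
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def src_classify(D, labels, y, n_nonzero=5):
    """SRC: code y over all atoms, return the class with the smallest residual."""
    D = D / np.linalg.norm(D, axis=0)       # unit-norm atoms
    y = y / np.linalg.norm(y)
    x = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]

# Synthetic demo (not ACP data): each class spans its own random 3-D subspace of R^20.
rng = np.random.default_rng(0)
bases = [rng.standard_normal((20, 3)) for _ in range(2)]
D = np.hstack([b @ rng.standard_normal((3, 10)) for b in bases])
labels = np.repeat([0, 1], 10)
print(src_classify(D, labels, bases[1] @ rng.standard_normal(3)))
```

In an ACP setting, each dictionary column would be the CKSAAP vector of one labeled training peptide; the greedy solver replaces the basis pursuit (l1) program that classical SRC solves, which is the cost saving the abstract refers to.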

Showing 1–50 of 131 results for author: Ibrahim, M