-
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Authors:
Hasan Abed Al Kader Hammoud,
Umberto Michieli,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem,
Mete Ozay
Abstract:
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods do not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.
Submitted 20 June, 2024;
originally announced June 2024.
-
Robotic in-hand manipulation with relaxed optimization
Authors:
Ali Hammoud,
Valerio Belcamino,
Quentin Huet,
Alessandro Carfì,
Mahdi Khoramshahi,
Veronique Perdereau,
Fulvio Mastrogiovanni
Abstract:
Dexterous in-hand manipulation is a unique and valuable human skill requiring sophisticated sensorimotor interaction with the environment while respecting stability constraints. Satisfying these constraints with generated motions is essential for a robotic platform to achieve reliable in-hand manipulation skills. Explicitly modelling these constraints can be challenging, but they can be implicitly modelled and learned through experience or human demonstrations. We propose a learning and control approach based on dictionaries of motion primitives generated from human demonstrations. To achieve this, we defined an optimization process that combines motion primitives to generate robot fingertip trajectories for moving an object from an initial to a desired final pose. Based on our experiments, our approach allows a robotic hand to handle objects like humans, adhering to stability constraints without requiring explicit formalization. In other words, the proposed motion primitive dictionaries learn and implicitly embed the constraints crucial to the in-hand manipulation task.
Submitted 7 June, 2024;
originally announced June 2024.
-
Federated Learning and Evolutionary Game Model for Fog Federation Formation
Authors:
Zyad Yasser,
Ahmad Hammoud,
Azzam Mourad,
Hadi Otrok,
Zbigniew Dziong,
Mohsen Guizani
Abstract:
In this paper, we tackle the network delays in the Internet of Things (IoT) for an enhanced QoS through a stable and optimized federated fog computing infrastructure. Network delays contribute to a decline in the Quality-of-Service (QoS) for IoT applications and may even disrupt time-critical functions. Our paper addresses the challenge of establishing fog federations, which are designed to enhance QoS. However, instabilities within these federations can lead to the withdrawal of providers, thereby diminishing federation profitability and expected QoS. Additionally, the techniques used to form federations could potentially pose data leakage risks to end-users whose data is involved in the process. In response, we propose a stable and comprehensive federated fog architecture that considers federated network profiling of the environment to enhance the QoS for IoT applications. This paper introduces a decentralized evolutionary game theoretic algorithm built on top of a Genetic Algorithm mechanism that addresses the fog federation formation issue. Furthermore, we present a decentralized federated learning algorithm that predicts the QoS between fog servers without the need to expose users' location to external entities. Such a predictor module enhances the decision-making process when allocating resources during the federation formation phases without exposing the data privacy of the users/servers. Notably, our approach demonstrates superior stability and improved QoS when compared to other benchmark approaches.
Submitted 2 May, 2024;
originally announced May 2024.
-
On Pretraining Data Diversity for Self-Supervised Learning
Authors:
Hasan Abed Al Kader Hammoud,
Tuhin Das,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem
Abstract:
We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal. Notably, even with an exceptionally large pretraining data diversity achieved through methods like web crawling or diffusion-generated data, among other ways, the distribution shift remains a challenge. Our experiments are comprehensive with seven SSL methods using large-scale datasets such as ImageNet and YFCC100M amounting to over 200 GPU days. Code and trained models will be available at https://github.com/hammoudhasan/DiversitySSL .
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Authors:
Hasan Abed Al Kader Hammoud,
Hani Itani,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem
Abstract:
We present SynthCLIP, a novel framework for training CLIP models with entirely synthetic text-image pairs, significantly departing from previous methods relying on real data. Leveraging recent text-to-image (TTI) generative networks and large language models (LLM), we are able to generate synthetic datasets of images and corresponding captions at any scale, with no human intervention. With training at scale, SynthCLIP achieves performance comparable to CLIP models trained on real datasets. We also introduce SynthCI-30M, a purely synthetic dataset comprising 30 million captioned images. Our code, trained models, and generated data are released at https://github.com/hammoudhasan/SynthCLIP
Submitted 2 February, 2024;
originally announced February 2024.
-
Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning
Authors:
Mohamad Abed El Rahman Hammoud,
Naila Raboudi,
Edriss S. Titi,
Omar Knio,
Ibrahim Hoteit
Abstract:
Data assimilation (DA) plays a pivotal role in diverse applications, ranging from climate predictions and weather forecasts to trajectory planning for autonomous vehicles. A prime example is the widely used ensemble Kalman filter (EnKF), which relies on linear updates to minimize variance among the ensemble of forecast states. Recent advancements have seen the emergence of deep learning approaches in this domain, primarily within a supervised learning framework. However, the adaptability of such models to untrained scenarios remains a challenge. In this study, we introduce a novel DA strategy that utilizes reinforcement learning (RL) to apply state corrections using full or partial observations of the state variables. Our investigation focuses on demonstrating this approach to the chaotic Lorenz '63 system, where the agent's objective is to minimize the root-mean-squared error between the observations and corresponding forecast states. Consequently, the agent develops a correction strategy, enhancing model forecasts based on available system state observations. Our strategy employs a stochastic action policy, enabling a Monte Carlo-based DA framework that relies on randomly sampling the policy to generate an ensemble of assimilated realizations. Results demonstrate that the developed RL algorithm performs favorably when compared to the EnKF. Additionally, we illustrate the agent's capability to assimilate non-Gaussian data, addressing a significant limitation of the EnKF.
Submitted 1 January, 2024;
originally announced January 2024.
-
From Categories to Classifier: Name-Only Continual Learning by Exploring the Web
Authors:
Ameya Prabhu,
Hasan Abed Al Kader Hammoud,
Ser-Nam Lim,
Bernard Ghanem,
Philip H. S. Torr,
Adel Bibi
Abstract:
Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice. We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find them comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification that create support sets using generative models or image retrieval from LAION-5B, achieving up to 25% boost in accuracy. When applied across varied continual learning contexts, our method consistently exhibits a small performance gap in comparison to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.
Submitted 19 November, 2023;
originally announced November 2023.
-
Downscaling Using CDAnet Under Observational and Model Noises: The Rayleigh-Benard Convection Paradigm
Authors:
Mohamad Abed El Rahman Hammoud,
Edriss S. Titi,
Ibrahim Hoteit,
Omar Knio
Abstract:
Efficient downscaling of large ensembles of coarse-scale information is crucial in several applications, such as oceanic and atmospheric modeling. The determining form map is a theoretical lifting function from the low-resolution solution trajectories of a dissipative dynamical system to their corresponding fine-scale counterparts. Recently, a physics-informed deep neural network ("CDAnet") was introduced, providing a surrogate of the determining form map for efficient downscaling. CDAnet was demonstrated to efficiently downscale noise-free coarse-scale data in a deterministic setting. Herein, the performance of well-trained CDAnet models is analyzed in a stochastic setting involving (i) observational noise, (ii) model noise, and (iii) a combination of observational and model noises. The analysis is performed employing the Rayleigh-Benard convection paradigm, under three training conditions, namely, training with perfect, noisy, or downscaled data. Furthermore, the effects of noises, Rayleigh number, and spatial and temporal resolutions of the input coarse-scale information on the downscaled fields are examined. The results suggest that the expected l2-error of CDAnet behaves quadratically in terms of the standard deviations of the observational and model noises. The results also suggest that CDAnet responds to uncertainties similar to the theorized and numerically-validated CDA behavior with an additional error overhead due to CDAnet being a surrogate model of the determining form map.
Submitted 18 October, 2023;
originally announced October 2023.
-
Oil Spill Risk Analysis For The NEOM Shoreline
Authors:
HVR Mittal,
Mohamad Abed El Rahman Hammoud,
Ana K. Carrasco,
Ibrahim Hoteit,
Omar Knio
Abstract:
A risk analysis is conducted considering several release sources located around the NEOM shoreline. The sources are selected close to the coast and in neighboring regions of high marine traffic. The evolution of oil spills released by these sources is simulated using the MOHID model, driven by validated, high-resolution met-ocean fields of the Red Sea. For each source, simulations are conducted over a 4-week period, starting from the first, tenth, and twentieth days of each month, covering five consecutive years. A total of 48 simulations are thus conducted for each source location, adequately reflecting the variability of met-ocean conditions in the region. The risk associated with each source is described in terms of the amount of oil beached, and by the elapsed time required for the spilled oil to reach the NEOM coast, extending from the Gulf of Aqaba in the North to Duba in the South. A finer analysis is performed by segmenting the NEOM shoreline, based on important coastal development and installation sites. For each subregion, source and release event considered, a histogram of the amount of volume beached is generated, also classifying individual events in terms of the corresponding arrival times. In addition, for each subregion considered, an inverse analysis is conducted to identify regions of dependence of the cumulative risk, estimated using the collection of all sources and events considered. The transport of oil around the NEOM shorelines is promoted by chaotic circulations and northwest winds in summer, and a dominant cyclonic eddy in winter. Hence, spills originating from release sources located close to the NEOM shorelines are characterized by large monthly variations in arrival times, ranging from less than a week to more than two weeks. Large variations in the volume fraction of beached oil, ranging from less than 50% to more than 80%, are reported.
Submitted 21 September, 2023;
originally announced September 2023.
-
In-hand manipulation planning using human motion dictionary
Authors:
Ali Hammoud,
Valerio Belcamino,
Alessandro Carfi,
Veronique Perdereau,
Fulvio Mastrogiovanni
Abstract:
Dexterous in-hand manipulation is a peculiar and useful human skill. This ability requires the coordination of many senses and hand motion to adhere to many constraints. These constraints vary and can be influenced by the object characteristics or the specific application. One of the key elements for a robotic platform to implement reliable in-hand manipulation skills is to be able to integrate those constraints in their motion generations. These constraints can be implicitly modelled, learned through experience or human demonstrations. We propose a method based on motion primitives dictionaries to learn and reproduce in-hand manipulation skills. In particular, we focused on fingertip motions during the manipulation, and we defined an optimization process to combine motion primitives to reach specific fingertip configurations. The results of this work show that the proposed approach can generate manipulation motion coherent with the human one and that manipulation constraints are inherited even without an explicit formalization.
Submitted 29 August, 2023;
originally announced August 2023.
-
Mindstorms in Natural Language-Based Societies of Mind
Authors:
Mingchen Zhuge,
Haozhe Liu,
Francesco Faccio,
Dylan R. Ashley,
Róbert Csordás,
Anand Gopalakrishnan,
Abdullah Hamdi,
Hasan Abed Al Kader Hammoud,
Vincent Herrmann,
Kazuki Irie,
Louis Kirsch,
Bing Li,
Guohao Li,
Shuming Liu,
Jinjie Mai,
Piotr Piękos,
Aditya Ramesh,
Imanol Schlag,
Weimin Shi,
Aleksandar Stanić,
Wenyi Wang,
Yuhui Wang,
Mengmeng Xu,
Deng-Ping Fan,
Bernard Ghanem
, et al. (1 additional authors not shown)
Abstract:
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents, some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
Submitted 26 May, 2023;
originally announced May 2023.
-
Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right?
Authors:
Hasan Abed Al Kader Hammoud,
Ameya Prabhu,
Ser-Nam Lim,
Philip H. S. Torr,
Adel Bibi,
Bernard Ghanem
Abstract:
We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy, which measures the accuracy of the model on the immediate next few samples. However, we show that this metric is unreliable, as even vacuous blind classifiers, which do not use input images for prediction, can achieve unrealistically high online accuracy by exploiting spurious label correlations in the data stream. Our study reveals that existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information, suggesting that they unintentionally learn spurious label correlations. To address this issue, we propose a novel metric for measuring adaptation based on the accuracy on the near-future samples, where spurious correlations are removed. We benchmark existing OCL approaches using our proposed metric on large-scale datasets under various computational budgets and find that better generalization can be achieved by retaining and reusing past seen information. We believe that our proposed metric can aid in the development of truly adaptive OCL methods. We provide code to reproduce our results at https://github.com/drimpossible/EvalOCL.
Submitted 16 May, 2023;
originally announced May 2023.
-
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
Authors:
Hani Sami,
Ahmad Hammoud,
Mouhamad Arafeh,
Mohamad Wazzeh,
Sarhad Arisdakessian,
Mario Chahoud,
Osama Wehbi,
Mohamad Ajaj,
Azzam Mourad,
Hadi Otrok,
Omar Abdel Wahab,
Rabeb Mizouni,
Jamal Bentahar,
Chamseddine Talhi,
Zbigniew Dziong,
Ernesto Damiani,
Mohsen Guizani
Abstract:
The Metaverse offers a second world beyond reality, where boundaries are non-existent, and possibilities are endless through engagement and immersive experiences using the virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when accurately developed, including the fields of technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and directions. Existing surveys on the Metaverse focus only on a specific aspect and discipline of the Metaverse and lack a holistic view of the entire process. To this end, a more holistic, multi-disciplinary, in-depth, and academic and industry-oriented review is required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. Also, for each of these components, we examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies to support decentralization, interoperability, user experiences, interactions, and monetization. Our presented study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive and allows users, scholars, and entrepreneurs to get an in-depth understanding of the Metaverse ecosystem to find their opportunities and potentials for contribution.
Submitted 18 April, 2023;
originally announced April 2023.
-
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Authors:
Guohao Li,
Hasan Abed Al Kader Hammoud,
Hani Itani,
Dmitrii Khizbullin,
Bernard Ghanem
Abstract:
The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.
Submitted 2 November, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs
Authors:
Hasan Abed Al Kader Hammoud,
Adel Bibi,
Philip H. S. Torr,
Bernard Ghanem
Abstract:
In this paper we investigate the frequency sensitivity of Deep Neural Networks (DNNs) when presented with clean samples versus poisoned samples. Our analysis shows significant disparities in frequency sensitivity between these two types of samples. Building on these findings, we propose FREAK, a frequency-based poisoned sample detection algorithm that is simple yet effective. Our experimental results demonstrate the efficacy of FREAK not only against frequency backdoor attacks but also against some spatial attacks. Our work is just the first step in leveraging these insights. We believe that our analysis and proposed defense mechanism will provide a foundation for future research and development of backdoor defenses.
Submitted 23 March, 2023;
originally announced March 2023.
-
Computationally Budgeted Continual Learning: What Does Matter?
Authors:
Ameya Prabhu,
Hasan Abed Al Kader Hammoud,
Puneet Dokania,
Philip H. S. Torr,
Ser-Nam Lim,
Bernard Ghanem,
Adel Bibi
Abstract:
Continual Learning (CL) aims to sequentially train models on streams of incoming data that vary in distribution by preserving previous knowledge while adapting to new data. Current CL literature focuses on restricted access to previously seen data, while imposing no constraints on the computational budget for training. This is unreasonable for applications in-the-wild, where systems are primarily constrained by computational and time budgets, not storage. We revisit this problem with a large-scale benchmark and analyze the performance of traditional CL approaches in a compute-constrained setting, where effective memory samples used in training can be implicitly restricted as a consequence of limited computation. We conduct experiments evaluating various CL sampling strategies, distillation losses, and partial fine-tuning on two large-scale datasets, namely ImageNet2K and Continual Google Landmarks V2 in data incremental, class incremental, and time incremental settings. Through extensive experiments amounting to a total of over 1500 GPU-hours, we find that, under compute-constrained setting, traditional CL approaches, with no exception, fail to outperform a simple minimal baseline that samples uniformly from memory. Our conclusions are consistent in a different number of stream time steps, e.g., 20 to 200, and under several computational budgets. This suggests that most existing CL methods are particularly too computationally expensive for realistic budgeted deployment. Code for this project is available at: https://github.com/drimpossible/BudgetCL.
Submitted 14 July, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Real-Time Evaluation in Online Continual Learning: A New Hope
Authors:
Yasir Ghunaim,
Adel Bibi,
Kumail Alhamoud,
Motasem Alfarra,
Hasan Abed Al Kader Hammoud,
Ameya Prabhu,
Philip H. S. Torr,
Bernard Ghanem
Abstract:
Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose: a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. To do this, we evaluate current CL methods with respect to their computational costs. We conduct extensive experiments on CLOC, a large-scale dataset containing 39 million time-stamped images with geolocation labels. We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings. In addition, we explore various CL components commonly used in the literature, including memory sampling strategies and regularization approaches. We find that all considered methods fail to be competitive against our simple baseline. This surprisingly suggests that the majority of existing CL literature is tailored to a specific class of streams that is not practical. We hope that the evaluation we provide will be the first step towards a paradigm shift to consider the computational cost in the development of online continual learning methods.
Submitted 24 March, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Authors:
Hasan Abed Al Kader Hammoud,
Shuming Liu,
Mohammed Alkhrashi,
Fahad AlBalawi,
Bernard Ghanem
Abstract:
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects to that model. We show that poisoned-label image backdoor attacks could be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. And, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough for achieving a high attack success rate.
Submitted 19 January, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Continuous and Discrete Data Assimilation with Noisy Observations for the Rayleigh-Bénard Convection: A Computational Study
Authors:
Mohamad Abed El Rahman Hammoud,
Olivier LeMaitre,
Edriss S. Titi,
Ibrahim Hoteit,
Omar Knio
Abstract:
Obtaining accurate high-resolution representations of model outputs is essential to describe the system dynamics. In general, however, only spatially- and temporally-coarse observations of the system states are available. These observations can also be corrupted by noise. Downscaling is a process/scheme in which one uses coarse scale observations to reconstruct the high-resolution solution of the system states. Continuous Data Assimilation (CDA) is a recently introduced downscaling algorithm that constructs an increasingly accurate representation of the system states by continuously nudging the large scales using the coarse observations. We introduce a Discrete Data Assimilation (DDA) algorithm as a downscaling algorithm based on CDA with discrete-in-time nudging. We then investigate the performance of the CDA and DDA algorithms for downscaling noisy observations of the Rayleigh-Bénard convection system in the chaotic regime. In this computational study, a set of noisy observations was generated by perturbing a reference solution with Gaussian noise before downscaling them. The downscaled fields are then assessed using various error- and ensemble-based skill scores. The CDA solution was shown to converge towards the reference solution faster than that of DDA but at the cost of a higher asymptotic error. The numerical results also suggest a quadratic relationship between the $\ell_2$ error and the noise level for both CDA and DDA. Cubic and quadratic dependences of the DDA and CDA expected errors on the spatial resolution of the observations were obtained, respectively.
Submitted 5 November, 2022;
originally announced November 2022.
-
Generalizability of Adversarial Robustness Under Distribution Shifts
Authors:
Kumail Alhamoud,
Hasan Abed Al Kader Hammoud,
Motasem Alfarra,
Bernard Ghanem
Abstract:
Recent progress in empirical and certified robustness promises to deliver reliable and deployable Deep Neural Networks (DNNs). Despite that success, most existing evaluations of DNN robustness have been done on images sampled from the same distribution on which the model was trained. However, in the real world, DNNs may be deployed in dynamic environments that exhibit significant distribution shifts. In this work, we take a first step towards thoroughly investigating the interplay between empirical and certified adversarial robustness on one hand and domain generalization on another. To do so, we train robust models on multiple domains and evaluate their accuracy and robustness on an unseen domain. We observe that: (1) both empirical and certified robustness generalize to unseen domains, and (2) the level of generalizability does not correlate well with input visual similarity, measured by the FID between source and target domains. We also extend our study to cover a real-world medical application, in which adversarial augmentation significantly boosts the generalization of robustness with minimal effect on clean data accuracy.
Submitted 6 November, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
Authors:
Guocheng Qian,
Yuchen Li,
Houwen Peng,
Jinjie Mai,
Hasan Abed Al Kader Hammoud,
Mohamed Elhoseiny,
Bernard Ghanem
Abstract:
PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large portion of the performance gain is due to improved training strategies, i.e. data augmentation and optimization techniques, and increased model sizes rather than architectural innovations. Thus, the full potential of PointNet++ has yet to be explored. In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions. First, we propose a set of improved training strategies that significantly improve PointNet++ performance. For example, we show that, without any change in architecture, the overall accuracy (OA) of PointNet++ on ScanObjectNN object classification can be raised from 77.9% to 86.1%, even outperforming state-of-the-art PointMLP. Second, we introduce an inverted residual bottleneck design and separable MLPs into PointNet++ to enable efficient and effective model scaling and propose PointNeXt, the next version of PointNets. PointNeXt can be flexibly scaled up and outperforms state-of-the-art methods on both 3D classification and segmentation tasks. For classification, PointNeXt reaches an overall accuracy of 87.7 on ScanObjectNN, surpassing PointMLP by 2.3%, while being 10x faster in inference. For semantic segmentation, PointNeXt establishes a new state-of-the-art performance with 74.9% mean IoU on S3DIS (6-fold cross-validation), being superior to the recent Point Transformer. The code and models are available at https://github.com/guochengqian/pointnext.
Submitted 12 October, 2022; v1 submitted 9 June, 2022;
originally announced June 2022.
-
ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Authors:
Guocheng Qian,
Hasan Abed Al Kader Hammoud,
Guohao Li,
Ali Thabet,
Bernard Ghanem
Abstract:
Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first present a novel Separable Set Abstraction (SA) module that disentangles the vanilla SA module used in PointNet++ into two separate learning stages: (1) learning channel correlation and (2) learning spatial correlation. The Separable SA module is significantly faster than the vanilla version, yet it achieves comparable performance. We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy. We later replace the vanilla SA modules in PointNet++ with the proposed ASSA module, and denote the modified network as ASSANet. Extensive experiments on point cloud classification, semantic segmentation, and part segmentation show that ASSANet outperforms PointNet++ and other methods, achieving much higher accuracy and faster speeds. In particular, ASSANet outperforms PointNet++ by $7.4$ mIoU on S3DIS Area 5, while maintaining $1.6 \times $ faster inference speed on a single NVIDIA 2080Ti GPU. Our scaled ASSANet variant achieves $66.8$ mIoU and outperforms KPConv, while being more than $54 \times$ faster.
Submitted 24 October, 2021; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Check Your Other Door! Creating Backdoor Attacks in the Frequency Domain
Authors:
Hasan Abed Al Kader Hammoud,
Bernard Ghanem
Abstract:
Deep Neural Networks (DNNs) are ubiquitous and span a variety of applications ranging from image classification to real-time object detection. As DNN models become more sophisticated, the computational cost of training these models becomes a burden. For this reason, outsourcing the training process has been the go-to option for many DNN users. Unfortunately, this comes at the cost of vulnerability to backdoor attacks. These attacks aim to establish hidden backdoors in the DNN so that it performs well on clean samples, but outputs a particular target label when a trigger is applied to the input. Existing backdoor attacks either generate triggers in the spatial domain or naively poison frequencies in the Fourier domain. In this work, we propose a pipeline based on Fourier heatmaps to generate a spatially dynamic and invisible backdoor attack in the frequency domain. The proposed attack is extensively evaluated on various datasets and network architectures. Unlike most existing backdoor attacks, the proposed attack can achieve high attack success rates with low poisoning rates and little to no drop in performance while remaining imperceptible to the human eye. Moreover, we show that the models poisoned by our attack are resistant to various state-of-the-art (SOTA) defenses, so we contribute two possible defenses that can evade the attack.
Submitted 9 January, 2023; v1 submitted 12 September, 2021;
originally announced September 2021.
-
Privacy-accuracy trade-offs in noisy digital exposure notifications
Authors:
Abbas Hammoud,
Yun William Yu
Abstract:
Since the global spread of Covid-19 began to overwhelm the attempts of governments to conduct manual contact-tracing, there has been much interest in using the power of mobile phones to automate the contact-tracing process through the development of exposure notification applications. The rough idea is simple: use Bluetooth or other data-exchange technologies to record contacts between users, enable users to report positive diagnoses, and alert users who have been exposed to sick users. Of course, there are many privacy concerns associated with this idea. Much of the work in this area has been concerned with designing mechanisms for tracing contacts and alerting users that do not leak additional information about users beyond the existence of exposure events. However, although designing practical protocols is of crucial importance, it is essential to realize that notifying users about exposure events may itself leak confidential information (e.g. that a particular contact has been diagnosed). Luckily, while digital contact tracing is a relatively new task, the generic problem of privacy and data disclosure has been studied for decades. Indeed, the framework of differential privacy further permits provable query privacy by adding random noise. In this article, we translate two results from statistical privacy and social recommendation algorithms to exposure notification. We thus prove some naive bounds on the degree to which accuracy must be sacrificed if exposure notification frameworks are to be made more private through the injection of noise.
Submitted 8 November, 2020;
originally announced November 2020.