-
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Authors:
Hasan Abed Al Kader Hammoud,
Umberto Michieli,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem,
Mete Ozay
Abstract:
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popu…
▽ More
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods do not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Aaron Purewal,
Csaba Botos,
Fabro Steibel,
Fazel Keshtkar,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Imperial,
Juan Arturo Nolazco,
Lori Landay,
Matthew Jackson,
Phillip H. S. Torr,
Trevor Darrell,
Yong Lee,
Jakob Foerster
Abstract:
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This reg…
▽ More
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.
△ Less
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder de Witt,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Botos Csaba,
Fabro Steibel,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Marvin Imperial,
Juan A. Nolazco-Flores,
Lori Landay,
Matthew Jackson,
Paul Röttger,
Philip H. S. Torr,
Trevor Darrell,
Yong Suk Lee,
Jakob Foerster
Abstract:
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation i…
▽ More
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
△ Less
Submitted 24 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Latent Guard: a Safety Framework for Text-to-image Generation
Authors:
Runtao Liu,
Ashkan Khakzar,
**dong Gu,
Qifeng Chen,
Philip Torr,
Fabio Pizzati
Abstract:
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed…
▽ More
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings. Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data. The effectiveness of our method is verified on three datasets and against four baselines. Code and data will be shared at https://github.com/rt219/LatentGuard.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
On Pretraining Data Diversity for Self-Supervised Learning
Authors:
Hasan Abed Al Kader Hammoud,
Tuhin Das,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem
Abstract:
We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal. Notably, even wit…
▽ More
We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal. Notably, even with an exceptionally large pretraining data diversity achieved through methods like web crawling or diffusion-generated data, among other ways, the distribution shift remains a challenge. Our experiments are comprehensive with seven SSL methods using large-scale datasets such as ImageNet and YFCC100M amounting to over 200 GPU days. Code and trained models will be available at https://github.com/hammoudhasan/DiversitySSL .
△ Less
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Authors:
Hasan Abed Al Kader Hammoud,
Hani Itani,
Fabio Pizzati,
Philip Torr,
Adel Bibi,
Bernard Ghanem
Abstract:
We present SynthCLIP, a novel framework for training CLIP models with entirely synthetic text-image pairs, significantly departing from previous methods relying on real data. Leveraging recent text-to-image (TTI) generative networks and large language models (LLM), we are able to generate synthetic datasets of images and corresponding captions at any scale, with no human intervention. With trainin…
▽ More
We present SynthCLIP, a novel framework for training CLIP models with entirely synthetic text-image pairs, significantly departing from previous methods relying on real data. Leveraging recent text-to-image (TTI) generative networks and large language models (LLM), we are able to generate synthetic datasets of images and corresponding captions at any scale, with no human intervention. With training at scale, SynthCLIP achieves performance comparable to CLIP models trained on real datasets. We also introduce SynthCI-30M, a purely synthetic dataset comprising 30 million captioned images. Our code, trained models, and generated data are released at https://github.com/hammoudhasan/SynthCLIP
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Material Palette: Extraction of Materials from a Single Image
Authors:
Ivan Lopes,
Fabio Pizzati,
Raoul de Charette
Abstract:
In this paper, we propose a method to extract physically-based rendering (PBR) materials from a single real-world image. We do so in two steps: first, we map regions of the image to material concepts using a diffusion model, which allows the sampling of texture images resembling each material in the scene. Second, we benefit from a separate network to decompose the generated textures into Spatiall…
▽ More
In this paper, we propose a method to extract physically-based rendering (PBR) materials from a single real-world image. We do so in two steps: first, we map regions of the image to material concepts using a diffusion model, which allows the sampling of texture images resembling each material in the scene. Second, we benefit from a separate network to decompose the generated textures into Spatially Varying BRDFs (SVBRDFs), providing us with materials ready to be used in rendering applications. Our approach builds on existing synthetic material libraries with SVBRDF ground truth, but also exploits a diffusion-generated RGB texture dataset to allow generalization to new samples using unsupervised domain adaptation (UDA). Our contributions are thoroughly evaluated on synthetic and real-world datasets. We further demonstrate the applicability of our method for editing 3D scenes with materials estimated from real photographs. The code and models will be made open-source. Project page: https://astra-vision.github.io/MaterialPalette/
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
ManiFest: Manifold Deformation for Few-shot Image Translation
Authors:
Fabio Pizzati,
Jean-François Lalonde,
Raoul de Charette
Abstract:
Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. To enforce feature consistency, our framework learns a style manifold between source and proxy anchor domains (assumed to…
▽ More
Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. To enforce feature consistency, our framework learns a style manifold between source and proxy anchor domains (assumed to be composed of large numbers of images). The learned manifold is interpolated and deformed towards the few-shot target domain via patch-based adversarial and feature statistics alignment losses. All of these components are trained simultaneously during a single end-to-end loop. In addition to the general few-shot translation task, our approach can alternatively be conditioned on a single exemplar image to reproduce its specific style. Extensive experiments demonstrate the efficacy of ManiFest on multiple tasks, outperforming the state-of-the-art on all metrics and in both the general- and exemplar-based scenarios. Our code is available at https://github.com/cv-rits/Manifest .
△ Less
Submitted 20 July, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Leveraging Local Domains for Image-to-Image Translation
Authors:
Anthony Dell'Eva,
Fabio Pizzati,
Massimo Bertozzi,
Raoul de Charette
Abstract:
Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore obvious traits for humans like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as…
▽ More
Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore obvious traits for humans like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as 'local domains' and demonstrate its benefit for image-to-image translation. Relying on a simple geometrical guidance, we train a patch-based GAN on few source data and hallucinate a new unseen domain which subsequently eases transfer learning to target. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows we are able to generate realistic translations, with minimal priors, and training only on a few images. Furthermore, when trained on our translations images we show that all tested proxy tasks are significantly improved, without ever seeing target domain at training.
△ Less
Submitted 14 February, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Physics-informed Guided Disentanglement in Generative Networks
Authors:
Fabio Pizzati,
Pietro Cerri,
Raoul de Charette
Abstract:
Image-to-image translation (i2i) networks suffer from entanglement effects in presence of physics-related phenomena in target domain (such as occlusions, fog, etc), lowering altogether the translation quality, controllability and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon collection of simple physics models, gu…
▽ More
Image-to-image translation (i2i) networks suffer from entanglement effects in presence of physics-related phenomena in target domain (such as occlusions, fog, etc), lowering altogether the translation quality, controllability and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon collection of simple physics models, guiding the disentanglement with a physical model that renders some of the target traits, and learning the remaining ones. Because physics allows explicit and interpretable outputs, our physical models (optimally regressed on target) allows generating unseen scenarios in a controllable manner. Secondarily, we show the versatility of our framework to neural-guided disentanglement where a generative network is used in place of a physical model in case the latter is not directly accessible. Altogether, we introduce three strategies of disentanglement being guided from either a fully differentiable physics model, a (partially) non-differentiable physics model, or a neural network. The results show our disentanglement strategies dramatically increase performances qualitatively and quantitatively in several challenging scenarios for image translation.
△ Less
Submitted 27 April, 2023; v1 submitted 29 July, 2021;
originally announced July 2021.
-
CoMoGAN: continuous model-guided image-to-image translation
Authors:
Fabio Pizzati,
Pietro Cerri,
Raoul de Charette
Abstract:
CoMoGAN is a continuous GAN relying on the unsupervised reorganization of the target data on a functional manifold. To that matter, we introduce a new Functional Instance Normalization layer and residual mechanism, which together disentangle image content from position on target manifold. We rely on naive physics-inspired models to guide the training while allowing private model/translations featu…
▽ More
CoMoGAN is a continuous GAN relying on the unsupervised reorganization of the target data on a functional manifold. To that matter, we introduce a new Functional Instance Normalization layer and residual mechanism, which together disentangle image content from position on target manifold. We rely on naive physics-inspired models to guide the training while allowing private model/translations features. CoMoGAN can be used with any GAN backbone and allows new types of image translation, such as cyclic image translation like timelapse generation, or detached linear translation. On all datasets, it outperforms the literature. Our code is available at http://github.com/cv-rits/CoMoGAN .
△ Less
Submitted 29 June, 2022; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Model-based occlusion disentanglement for image-to-image translation
Authors:
Fabio Pizzati,
Pietro Cerri,
Raoul de Charette
Abstract:
Image-to-image translation is affected by entanglement phenomena, which may occur in case of target data encompassing occlusions such as raindrops, dirt, etc. Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model. The experiments demonstrate our method is able to handle varying ty…
▽ More
Image-to-image translation is affected by entanglement phenomena, which may occur in case of target data encompassing occlusions such as raindrops, dirt, etc. Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model. The experiments demonstrate our method is able to handle varying types of occlusions and generate highly realistic translations, qualitatively and quantitatively outperforming the state-of-the-art on multiple datasets.
△ Less
Submitted 20 July, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation
Authors:
Fabio Pizzati,
Raoul de Charette,
Michela Zaccaria,
Pietro Cerri
Abstract:
Image-to-image translation architectures may have limited effectiveness in some circumstances. For example, while generating rainy scenarios, they may fail to model typical traits of rain as water drops, and this ultimately impacts the synthetic images realism. With our method, called domain bridge, web-crawled data are exploited to reduce the domain gap, leading to the inclusion of previously ign…
▽ More
Image-to-image translation architectures may have limited effectiveness in some circumstances. For example, while generating rainy scenarios, they may fail to model typical traits of rain as water drops, and this ultimately impacts the synthetic images realism. With our method, called domain bridge, web-crawled data are exploited to reduce the domain gap, leading to the inclusion of previously ignored elements in the generated images. We make use of a network for clear to rain translation trained with the domain bridge to extend our work to Unsupervised Domain Adaptation (UDA). In that context, we introduce an online multimodal style-sampling strategy, where image translation multimodality is exploited at training time to improve performances. Finally, a novel approach for self-supervised learning is presented, and used to further align the domains. With our contributions, we simultaneously increase the realism of the generated images, while reaching on par performances with respect to the UDA state-of-the-art, with a simpler approach.
△ Less
Submitted 14 March, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
Lane Detection and Classification using Cascaded CNNs
Authors:
Fabio Pizzati,
Marco Allodi,
Alejandro Barrera,
Fernando García
Abstract:
Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As many other computer vision based tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to indentify lane boundaries. However, the position of the lane boundar…
▽ More
Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As many other computer vision based tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to indentify lane boundaries. However, the position of the lane boundaries w.r.t. the vehicle may not suffice for a reliable positioning, as for path planning or localization information regarding lane types may also be needed. In this work, we present an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real-time. To build the system, 14336 lane boundaries instances of the TuSimple dataset for lane detection have been labelled using 8 different classes. Our dataset and the code for inference are available online.
△ Less
Submitted 17 July, 2019; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Enhanced free space detection in multiple lanes based on single CNN with scene identification
Authors:
Fabio Pizzati,
Fernando García
Abstract:
Many systems for autonomous vehicles' navigation rely on lane detection. Traditional algorithms usually estimate only the position of the lanes on the road, but an autonomous control system may also need to know if a lane marking can be crossed or not, and what portion of space inside the lane is free from obstacles, to make safer control decisions. On the other hand, free space detection algorith…
▽ More
Many systems for autonomous vehicles' navigation rely on lane detection. Traditional algorithms usually estimate only the position of the lanes on the road, but an autonomous control system may also need to know if a lane marking can be crossed or not, and what portion of space inside the lane is free from obstacles, to make safer control decisions. On the other hand, free space detection algorithms only detect navigable areas, without information about lanes. State-of-the-art algorithms use CNNs for both tasks, with significant consumption of computing resources. We propose a novel approach that estimates the free space inside each lane, with a single CNN. Additionally, adding only a small requirement concerning GPU RAM, we infer the road type, that will be useful for path planning. To achieve this result, we train a multi-task CNN. Then, we further elaborate the output of the network, to extract polygons that can be effectively used in navigation control. Finally, we provide a computationally efficient implementation, based on ROS, that can be executed in real time. Our code and trained models are available online.
△ Less
Submitted 6 May, 2019; v1 submitted 2 May, 2019;
originally announced May 2019.