-
Hybrid Data Management Architecture for Present Quantum Computing
Authors:
Markus Zajac,
Uta Störl
Abstract:
Quantum computers promise polynomial or exponential speed-up in solving certain problems compared to classical computers. However, in practical use, there are currently a number of fundamental technical challenges. One of them concerns the loading of data into quantum computers, since they cannot access common databases. In this vision paper, we develop a hybrid data management architecture in which databases can serve as data sources for quantum algorithms. To test the architecture, we perform experiments in which we assign data points stored in a database to clusters. For cluster assignment, a quantum algorithm processes this data by determining the distances between data points and cluster centroids.
Submitted 18 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Solving Distributed Flexible Job Shop Scheduling Problems in the Wool Textile Industry with Quantum Annealing
Authors:
Lilia Toma,
Markus Zajac,
Uta Störl
Abstract:
Many modern manufacturing companies have evolved from a single production site to a multi-factory production environment that must handle both geographically dispersed production orders and their multi-site production steps. The availability of a range of machines in different locations capable of performing the same operation and shipping times between factories have transformed planning systems from the classic Job Shop Scheduling Problem (JSSP) to Distributed Flexible Job Shop Scheduling Problem (DFJSP). As a result, the complexity of production planning has increased significantly. In our work, we use Quantum Annealing (QA) to solve the DFJSP. In addition to the assignment of production orders to production sites, the assignment of production steps to production sites also takes place. This requirement is based on a real use case of a wool textile manufacturer. To investigate the applicability of this method to large problem instances, problems ranging from 50 variables up to 250 variables, the largest problem that could be embedded into a D-Wave quantum annealer Quantum Processing Unit (QPU), are formulated and solved. Special attention is dedicated to the determination of the Lagrange parameters of the Quadratic Unconstrained Binary Optimization (QUBO) model and the QPU configuration parameters, as these factors can significantly impact solution quality. The obtained solutions are compared to solutions obtained by Simulated Annealing (SA), both in terms of solution quality and calculation time. The results demonstrate that QA has the potential to solve large problem instances specific to the industry.
Submitted 11 March, 2024;
originally announced March 2024.
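The abstract above stresses that the Lagrange parameters of a QUBO model affect solution quality. The following is a minimal sketch of why, not the authors' formulation: it assumes a toy problem in which each of two operations must be assigned to exactly one of two machines, with a hypothetical `cost` matrix and a single penalty weight `lam` enforcing the one-machine-per-operation constraint. The QUBO is solved by brute force rather than on an annealer.

```python
import itertools
import numpy as np

def build_qubo(cost, lam):
    """Toy QUBO: x[o, m] = 1 if operation o runs on machine m.

    Objective: total processing cost.
    Penalty:   lam * (sum_m x[o, m] - 1)^2 for every operation o,
               expanded using x^2 = x and with the constant term dropped.
    """
    n_ops, n_machines = cost.shape
    n = n_ops * n_machines
    Q = np.zeros((n, n))
    for o in range(n_ops):
        for m in range(n_machines):
            i = o * n_machines + m
            Q[i, i] += cost[o, m] - lam          # linear terms on the diagonal
            for m2 in range(m + 1, n_machines):
                Q[i, o * n_machines + m2] += 2 * lam  # pairwise penalty terms
    return Q

def brute_force(Q):
    """Enumerate all bit strings and return the one with the lowest energy."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Hypothetical processing costs: rows are operations, columns are machines.
cost = np.array([[3.0, 1.0], [2.0, 5.0]])

x_weak, _ = brute_force(build_qubo(cost, lam=0.5))   # penalty too small
x_ok, _ = brute_force(build_qubo(cost, lam=10.0))    # penalty large enough
```

With the small penalty, the cheapest state assigns no operation at all, which violates the constraint; with the larger penalty, each operation lands on its cheaper machine. A real DFJSP model would carry many more constraints and separate penalty weights.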
-
Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem
Authors:
Maciej Wołczyk,
Bartłomiej Cupiał,
Mateusz Ostaszewski,
Michał Bortkiewicz,
Michał Zając,
Razvan Pascanu,
Łukasz Kuciński,
Piotr Miłoś
Abstract:
Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from $5$K to over $10$K points in the Human Monk scenario.
Submitted 12 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Exploiting Novel GPT-4 APIs
Authors:
Kellin Pelrine,
Mohammad Taufeeque,
Michał Zając,
Euan McLean,
Adam Gleave
Abstract:
Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text generation API. However, real-world APIs are often more flexible than just text generation: these APIs expose ``gray-box'' access leading to new threat vectors. To explore this, we red-team three new functionalities exposed in the GPT-4 APIs: fine-tuning, function calling and knowledge retrieval. We find that fine-tuning a model on as few as 15 harmful examples or 100 benign examples can remove core safeguards from GPT-4, enabling a range of harmful outputs. Furthermore, we find that GPT-4 Assistants readily divulge the function call schema and can be made to execute arbitrary function calls. Finally, we find that knowledge retrieval can be hijacked by injecting instructions into retrieval documents. These vulnerabilities highlight that any additions to the functionality exposed by an API can create new vulnerabilities.
Submitted 21 December, 2023;
originally announced December 2023.
-
Prediction Error-based Classification for Class-Incremental Learning
Authors:
Michał Zając,
Tinne Tuytelaars,
Gido M. van de Ven
Abstract:
Class-incremental learning (CIL) is a particularly challenging variant of continual learning, where the goal is to learn to discriminate between all classes presented in an incremental fashion. Existing approaches often suffer from excessive forgetting and imbalance of the scores assigned to classes that have not been seen together during training. In this study, we introduce a novel approach, Prediction Error-based Classification (PEC), which differs from traditional discriminative and generative classification paradigms. PEC computes a class score by measuring the prediction error of a model trained to replicate the outputs of a frozen random neural network on data from that class. The method can be interpreted as approximating a classification rule based on Gaussian Process posterior variance. PEC offers several practical advantages, including sample efficiency, ease of tuning, and effectiveness even when data are presented one class at a time. Our empirical results show that PEC performs strongly in single-pass-through-data CIL, outperforming other rehearsal-free baselines in all cases and rehearsal-based methods with moderate replay buffer size in most cases across multiple benchmarks.
Submitted 9 March, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
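The scoring rule described in the abstract above (one student per class, trained to replicate a frozen random network, with the lowest prediction error deciding the class) can be sketched as follows. This is an illustration under strong simplifying assumptions and not the paper's implementation: the frozen teacher is a single random tanh layer, each student is an affine map fitted by least squares instead of a neural network trained by gradient descent, and the data are two synthetic 2-D Gaussian blobs. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen random teacher: one random tanh layer, never trained.
W_t = rng.normal(size=(2, 16))
b_t = rng.normal(size=16)

def teacher(x):
    return np.tanh(x @ W_t + b_t)

def fit_student(x):
    """Fit an affine student to mimic the teacher on one class's data only."""
    x_aug = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(x_aug, teacher(x), rcond=None)
    return coef

def pec_scores(x, students):
    """Per-class prediction error; a lower error means a better class match."""
    x_aug = np.hstack([x, np.ones((len(x), 1))])
    target = teacher(x)
    errors = [((x_aug @ coef - target) ** 2).sum(axis=1) for coef in students]
    return np.stack(errors, axis=1)

def predict(x, students):
    return pec_scores(x, students).argmin(axis=1)

# Synthetic data: each class is seen on its own, one class at a time.
means = np.array([[-2.0, -2.0], [2.0, 2.0]])
train = [rng.normal(m, 0.3, size=(200, 2)) for m in means]
students = [fit_student(x_c) for x_c in train]

test_x = np.concatenate([rng.normal(m, 0.3, size=(100, 2)) for m in means])
test_y = np.repeat([0, 1], 100)
accuracy = (predict(test_x, students) == test_y).mean()
```

Each student only ever sees its own class, so no scores are compared across classes during training; the comparison happens only at prediction time through the error magnitudes.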
-
Exploring Continual Learning of Diffusion Models
Authors:
Michał Zając,
Kamil Deja,
Anna Kuzina,
Jakub M. Tomczak,
Tomasz Trzciński,
Florian Shkurti,
Piotr Miłoś
Abstract:
Diffusion models have achieved remarkable success in generating high-quality images thanks to their novel training procedures applied to unprecedented amounts of data. However, training a diffusion model from scratch is computationally expensive. This highlights the need to investigate the possibility of training these models iteratively, reusing computation while the data distribution changes. In this study, we take the first step in this direction and evaluate the continual learning (CL) properties of diffusion models. We begin by benchmarking the most common CL methods applied to Denoising Diffusion Probabilistic Models (DDPMs), where we note the strong performance of the experience replay with the reduced rehearsal coefficient. Furthermore, we provide insights into the dynamics of forgetting, which exhibit diverse behavior across diffusion timesteps. We also uncover certain pitfalls of using the bits-per-dimension metric for evaluating CL.
Submitted 27 March, 2023;
originally announced March 2023.
-
Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery
Authors:
Mateusz Olko,
Michał Zając,
Aleksandra Nowak,
Nino Scherrer,
Yashas Annadani,
Stefan Bauer,
Łukasz Kuciński,
Piotr Miłoś
Abstract:
Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.
Submitted 3 April, 2024; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Disentangling Transfer in Continual Reinforcement Learning
Authors:
Maciej Wołczyk,
Michał Zając,
Razvan Pascanu,
Łukasz Kuciński,
Piotr Miłoś
Abstract:
The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks is a significant challenge for the field, limiting the applicability of continual learning solutions to realistic scenarios. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning. We adopt SAC as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC achieves 87% final success rate compared to 80% of PackNet, the best method in the benchmark. Moreover, the transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
Submitted 28 September, 2022;
originally announced September 2022.
-
Continual World: A Robotic Benchmark For Continual Reinforcement Learning
Authors:
Maciej Wołczyk,
Michał Zając,
Razvan Pascanu,
Łukasz Kuciński,
Piotr Miłoś
Abstract:
Continual learning (CL) -- the ability to continuously learn, building on previously acquired knowledge -- is a natural requirement for long-lived autonomous reinforcement learning (RL) agents. While building such agents, one needs to balance opposing desiderata, such as constraints on capacity and compute, the ability to not catastrophically forget, and to exhibit positive transfer on new tasks. Understanding the right trade-off is conceptually and computationally challenging, which we argue has led the community to overly focus on catastrophic forgetting. In response to these issues, we advocate for the need to prioritize forward transfer and propose Continual World, a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World as a testbed. Following an in-depth empirical evaluation of existing CL methods, we pinpoint their limitations and highlight unique algorithmic challenges in the RL setting. Our benchmark aims to provide a meaningful and computationally inexpensive challenge for the community and thus help better understand the performance of existing and future solutions. Information about the benchmark, including the open-source code, is available at https://sites.google.com/view/continualworld.
Submitted 28 October, 2021; v1 submitted 23 May, 2021;
originally announced May 2021.
-
Google Research Football: A Novel Reinforcement Learning Environment
Authors:
Karol Kurach,
Anton Raichuk,
Piotr Stańczyk,
Michał Zając,
Olivier Bachem,
Lasse Espeholt,
Carlos Riquelme,
Damien Vincent,
Marcin Michalski,
Olivier Bousquet,
Sylvain Gelly
Abstract:
Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.
Submitted 14 April, 2020; v1 submitted 25 July, 2019;
originally announced July 2019.
-
Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift
Authors:
Michał Zając,
Konrad Zolna,
Stanisław Jastrzębski
Abstract:
Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and the ImageNet datasets, and with many other forms of domain shifts applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve it by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples.
Submitted 6 April, 2019;
originally announced April 2019.
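The core idea named in the abstract above, separate batch normalization statistics for labeled and unlabeled examples, can be sketched in a few lines. This is a simplified illustration, not the authors' code: it omits the learnable scale and shift and the running statistics used at inference time, and the function names and the synthetic shifted data are assumptions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature with the statistics of the given batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def split_batch_norm(x, is_labeled, eps=1e-5):
    """Normalize labeled and unlabeled rows with separate statistics."""
    out = np.empty_like(x, dtype=float)
    out[is_labeled] = batch_norm(x[is_labeled], eps)
    out[~is_labeled] = batch_norm(x[~is_labeled], eps)
    return out

rng = np.random.default_rng(0)
labeled = rng.normal(0.0, 1.0, size=(64, 8))     # in-distribution activations
unlabeled = rng.normal(3.0, 2.0, size=(64, 8))   # shifted distribution
x = np.concatenate([labeled, unlabeled])
is_labeled = np.arange(len(x)) < len(labeled)

joint = batch_norm(x)                    # shared statistics across both groups
split = split_batch_norm(x, is_labeled)  # separate statistics per group
```

With shared statistics, the shifted unlabeled rows pull the batch mean and variance away from the labeled rows; with split statistics, each group is centered and scaled on its own.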
-
ZETH: On Integrating Zerocash on Ethereum
Authors:
Antoine Rondelet,
Michal Zajac
Abstract:
Transaction privacy is a hard problem on an account-based blockchain such as Ethereum. While Ben-Sasson et al. presented the Zerocash protocol [BCG+14] as a decentralized anonymous payment (DAP) scheme standing on top of Bitcoin, no study of the integration of such a DAP scheme on top of a ledger defined in the account model has been provided. In this paper we aim to fill this gap and propose ZETH, an adaptation of Zerocash that can be deployed on top of Ethereum without making any change to the base layer. Our study shows that ZETH could be used not only to transfer Ether, the base currency of Ethereum, but also to transfer other types of smart contract-based digital assets. We propose an analysis of ZETH's privacy promises and argue that information leakages intrinsic to the use of this protocol are controlled and well-defined, which makes it a viable solution to support private transactions in the context of public and permissioned chains.
Submitted 4 April, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Adversarial Framing for Image and Video Classification
Authors:
Konrad Zolna,
Michal Zajac,
Negar Rostamzadeh,
Pedro O. Pinheiro
Abstract:
Neural networks are prone to adversarial attacks. In general, such attacks deteriorate the quality of the input by either slightly modifying most of its pixels, or by occluding it with a patch. In this paper, we propose a method that keeps the image unchanged and only adds an adversarial framing on the border of the image. We show empirically that our method is able to successfully attack state-of-the-art methods on both image and video classification problems. Notably, the proposed method results in a universal attack which is very fast at test time. Source code can be found at https://github.com/zajaczajac/adv_framing .
Submitted 17 October, 2019; v1 submitted 11 December, 2018;
originally announced December 2018.
-
Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps
Authors:
Maciej Jaśkowski,
Jakub Świątkowski,
Michał Zając,
Maciej Klimek,
Jarek Potiuk,
Piotr Rybicki,
Piotr Polatowski,
Przemysław Walczyk,
Kacper Nowicki,
Marek Cygan
Abstract:
Recent developments in the field of robot grasping have shown great improvements in the grasp success rates when dealing with unknown objects. In this work we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the DexNet 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the model validation accuracy from 92.2% to 95.8% on image-wise and from 85.9% to 88.0% on object-wise training and validation splits.
Submitted 16 February, 2018;
originally announced February 2018.
-
Leakage-resilient Cryptography with key derived from sensitive data
Authors:
Konrad Durnoga,
Tomasz Kazana,
Michał Zając,
Maciej Zdanowicz
Abstract:
In this paper we address the problem of large space consumption for protocols in the Bounded Retrieval Model (BRM), which require users to store large secret keys subject to adversarial leakage. We propose a method to derive keys for such protocols on-the-fly from weakly random private data (such as text documents or photos that users keep on their disks anyway for non-cryptographic purposes) in such a way that no extra storage is needed. We prove that any leakage-resilient protocol (belonging to a certain, arguably quite broad class), when run with a key obtained this way, retains a level of security similar to that of the original protocol. Additionally, we guarantee privacy of the data from which the actual keys are derived. That is, an adversary can hardly gain any knowledge about the private data except what he could otherwise obtain via leakage. Our reduction works in the Random Oracle model.
Submitted 1 October, 2018; v1 submitted 31 January, 2015;
originally announced February 2015.