-
Optimizing watermarks for large language models
Authors:
Bram Wouters
Abstract:
With the rise of large language models (LLMs) and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.
Submitted 28 December, 2023;
originally announced December 2023.
-
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Authors:
Floris Holstege,
Bram Wouters,
Noud van Giersbergen,
Cees Diks
Abstract:
Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common strategy is to mitigate this by removing spurious concepts from the neural network representation of the data. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model, thereby harming model performance. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept-removal methods.
Submitted 18 October, 2023;
originally announced October 2023.
-
LUCE: A Blockchain Solution for monitoring data License accoUntability and CompliancE
Authors:
Andine Havelange,
Michel Dumontier,
Birgit Wouters,
Jona Linde,
David Townend,
Arno Riedl,
Visara Urovi
Abstract:
In this paper we present our preliminary work on monitoring data License accoUntability and CompliancE (LUCE). LUCE is a blockchain platform solution designed to stimulate data sharing and reuse by facilitating compliance with licensing terms. The platform enables data accountability by recording the use of data and their purpose on a blockchain-supported platform. LUCE allows for individual data to be rectified and erased. In doing so, LUCE can ensure data subjects' rights to access, rectification and erasure under the General Data Protection Regulation (GDPR). Our contribution is to provide a distributed solution for the automatic management of data accountability and their license terms.
Submitted 6 August, 2019;
originally announced August 2019.
-
Analyzing Partitioned FAIR Health Data Responsibly
Authors:
Chang Sun,
Lianne Ippel,
Birgit Wouters,
Johan van Soest,
Alexander Malic,
Onaopepo Adekunle,
Bob van den Berg,
Marco Puts,
Ole Mussmann,
Annemarie Koster,
Carla van der Kallen,
David Townend,
Andre Dekker,
Michel Dumontier
Abstract:
It is widely anticipated that the use of health-related big data will enable further understanding and improvements in human health and wellbeing. Our current project, funded through the Dutch National Research Agenda, aims to explore the relationship between the development of diabetes and socio-economic factors such as lifestyle and health care utilization. The analysis involves combining data from the Maastricht Study (DMS), a prospective clinical study, and data collected by Statistics Netherlands (CBS) as part of its routine operations. However, a wide array of social, legal, technical, and scientific issues hinder the analysis. In this paper, we describe these challenges and our progress towards addressing them.
Submitted 2 December, 2018;
originally announced December 2018.