Search | arXiv e-print repository

Optimizing watermarks for large language models

Abstract: With the rise of large language models (LLMs) and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 15 pages; preprint

arXiv:2310.11991 [pdf, other]

Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

Authors: Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks

Abstract: Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common strategy is to mitigate this by removing spurious concepts from the neural network representation of the data. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model, thereby harming model performance. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept-removal methods.

Submitted 18 October, 2023; originally announced October 2023.

Comments: Preprint. Under Review. 33 pages

arXiv:1908.02287 [pdf, other]

LUCE: A Blockchain Solution for monitoring data License accoUntability and CompliancE

Authors: Andine Havelange, Michel Dumontier, Birgit Wouters, Jona Linde, David Townend, Arno Riedl, Visara Urovi

Abstract: In this paper we present our preliminary work on monitoring data License accoUntability and CompliancE (LUCE). LUCE is a blockchain platform solution designed to stimulate data sharing and reuse by facilitating compliance with licensing terms. The platform enables data accountability by recording the use of data and their purpose on a blockchain-supported platform. LUCE allows for individual data to be rectified and erased. In doing so, LUCE can ensure data subjects' rights to access, rectification and erasure under the General Data Protection Regulation (GDPR). Our contribution is to provide a distributed solution for the automatic management of data accountability and their license terms.

Submitted 6 August, 2019; originally announced August 2019.

Comments: 14 pages, 10 figures

arXiv:1812.00991 [pdf]

Analyzing Partitioned FAIR Health Data Responsibly

Authors: Chang Sun, Lianne Ippel, Birgit Wouters, Johan van Soest, Alexander Malic, Onaopepo Adekunle, Bob van den Berg, Marco Puts, Ole Mussmann, Annemarie Koster, Carla van der Kallen, David Townend, Andre Dekker, Michel Dumontier

Abstract: It is widely anticipated that the use of health-related big data will enable further understanding and improvements in human health and wellbeing. Our current project, funded through the Dutch National Research Agenda, aims to explore the relationship between the development of diabetes and socio-economic factors such as lifestyle and health care utilization. The analysis involves combining data from the Maastricht Study (DMS), a prospective clinical study, and data collected by Statistics Netherlands (CBS) as part of its routine operations. However, a wide array of social, legal, technical, and scientific issues hinder the analysis. In this paper, we describe these challenges and our progress towards addressing them.

Submitted 2 December, 2018; originally announced December 2018.

Comments: 6 pages, 1 figure, preliminary result, project report

ACM Class: E.1; E.3; H.2.4; H.2.8

Showing 1–4 of 4 results for author: Wouters, B