-
Decoding-time Realignment of Language Models
Authors:
Tianlin Liu,
Shangmin Guo,
Leonardo Bianco,
Daniele Calandriello,
Quentin Berthet,
Felipe Llinares,
Jessica Hoffmann,
Lucas Dixon,
Michal Valko,
Mathieu Blondel
Abstract:
Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularization is critical: insufficient regularization can lead to reduced model capabilities due to reward hacking, whereas excessive regularization hinders alignment. Traditional methods for finding the optimal regularization level require retraining multiple models with varying regularization strengths. This process, however, is resource-intensive, especially for large models. To address this challenge, we propose decoding-time realignment (DeRa), a simple method to explore and evaluate different regularization strengths in aligned models without retraining. DeRa enables control over the degree of alignment, allowing users to smoothly transition between unaligned and aligned models. It also enhances the efficiency of hyperparameter tuning by enabling the identification of effective regularization strengths using a validation dataset.
Submitted 24 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Action-Evolution Petri Nets: a Framework for Modeling and Solving Dynamic Task Assignment Problems
Authors:
Riccardo Lo Bianco,
Remco Dijkman,
Wim Nuijten,
Willem van Jaarsveld
Abstract:
Dynamic task assignment involves assigning arriving tasks to a limited number of resources in order to minimize the overall cost of the assignments. To achieve optimal task assignment, it is necessary to model the assignment problem first. While there exist separate formalisms, specifically Markov Decision Processes and (Colored) Petri Nets, to model, execute, and solve different aspects of the problem, there is no integrated modeling technique. To address this gap, this paper proposes Action-Evolution Petri Nets (A-E PN) as a framework for modeling and solving dynamic task assignment problems. A-E PN provides a unified modeling technique that can represent all elements of dynamic task assignment problems. Moreover, A-E PN models are executable, which means they can be used to learn close-to-optimal assignment policies through Reinforcement Learning (RL) without additional modeling effort. To evaluate the framework, we define a taxonomy of archetypical assignment problems. We show for three cases that A-E PN can be used to learn close-to-optimal assignment policies. Our results suggest that A-E PN can be used to model and solve a broad range of dynamic task assignment problems.
Submitted 9 June, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Learning policies for resource allocation in business processes
Authors:
J. Middelhuis,
R. Lo Bianco,
E. Scherzer,
Z. A. Bukhsh,
I. J. B. F. Adan,
R. M. Dijkman
Abstract:
Efficient allocation of resources to activities is pivotal in executing business processes but remains challenging. While resource allocation methodologies are well-established in domains like manufacturing, their application within business process management remains limited. Existing methods often do not scale well to large processes with numerous activities or optimize across multiple cases. This paper aims to address this gap by proposing two learning-based methods for resource allocation in business processes. The first method leverages Deep Reinforcement Learning (DRL) to learn near-optimal policies by taking action in the business process. The second method is a score-based value function approximation approach, which learns the weights of a set of curated features to prioritize resource assignments. To evaluate the proposed approaches, we first designed six distinct business processes with archetypal process flows and characteristics. These business processes were then connected to form three realistically sized business processes. We benchmarked our methods against traditional heuristics and existing resource allocation methods. The results show that our methods learn adaptive resource allocation policies that outperform or are competitive with the benchmarks in five out of six individual business processes. The DRL approach outperforms all benchmarks in all three composite business processes and finds a policy that is, on average, 13.1% better than the best-performing benchmark.
Submitted 23 January, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.