Search | arXiv e-print repository

Aligning Model Properties via Conformal Risk Control

Authors: William Overman, Jacqueline Jil Vallon, Mohsen Bayati

Abstract: AI model alignment is crucial due to inadvertent biases in training data and the underspecified pipeline in modern machine learning, where numerous models with excellent test set metrics can be produced, yet they may not meet end-user requirements. Recent advances demonstrate that post-training model alignment via human feedback can address some of these challenges. However, these methods are often confined to settings (such as generative AI) where humans can interpret model outputs and provide feedback. In traditional non-generative settings, where model outputs are numerical values or classes, detecting misalignment through single-sample outputs is highly challenging. In this paper we consider an alternative strategy. We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions that exhibit specific desired behaviors. We focus on post-processing a pre-trained model $f$ to better align with $\mathcal{P}$ using conformal risk control. Specifically, we develop a general procedure for converting queries for a given property $\mathcal{P}$ to a collection of loss functions suitable for use in a conformal risk control algorithm. We prove a probabilistic guarantee that the resulting conformal interval around $f$ contains a function approximately satisfying $\mathcal{P}$. Given the capabilities of modern AI models with extensive parameters and training data, one might assume alignment issues will resolve naturally. However, increasing training data or parameters in a random feature model doesn't eliminate the need for alignment techniques when pre-training data is biased. We demonstrate our alignment methodology on supervised learning datasets for properties like monotonicity and concavity. Our flexible procedure can be applied to various desired properties.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2305.10744

Online Resource Allocation in Episodic Markov Decision Processes

Authors: Duksang Lee, William Overman, Dabeen Lee

Abstract: This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and improve the existing decide-then-observe regime, while the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1} H^{3/2} S\sqrt{AT})$ where $\rho\in(0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes. For the decide-then-observe regime, we show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde O(\rho^{-1} H^{3/2} S\sqrt{AT})$ with high probability. We test the numerical efficiency of our method for a variant of the resource-constrained inventory management problem.

Submitted 18 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2110.10614

Independent Natural Policy Gradient Always Converges in Markov Potential Games

Authors: Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas

Abstract: Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. Recent results have shown that independent policy gradient converges in MPGs but it was not known whether Independent Natural Policy Gradient converges in MPGs as well. We prove that Independent Natural Policy Gradient always converges in the last iterate using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.

Submitted 20 October, 2021; originally announced October 2021.

Comments: 24 pages

arXiv:2106.01969

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

Authors: Stefanos Leonardos, Will Overman, Ioannis Panageas, Georgios Piliouras

Abstract: Potential games are arguably one of the most important and widely studied classes of normal form games. They define the archetypal setting of multi-agent coordination as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted in the setting of Markov Games? What are the similarities and differences between multi-agent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPG) that generalizes prior attempts at capturing complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over as MPGs can consist of settings where state-games can be zero-sum games. In the opposite direction, Markov games where every state-game is a potential game are not necessarily MPGs. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient dominance property arguments developed for single agent MDPs to multi-agent learning settings.

Submitted 28 September, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Fixed typos and a minor error in Proposition 3.2, condition C2

arXiv:1805.09518

Some Ordered Ramsey Numbers of Graphs on Four Vertices

Authors: Will Overman, Jeremy F. Alm, Kayla Coffey, Carolyn Langhoff

Abstract: An ordered graph $H$ on $n$ vertices is a graph whose vertices have been labeled bijectively with $\{1,\ldots,n\}$. The ordered Ramsey number $r_<(H)$ is the minimum $n$ such that every two-coloring of the edges of the complete graph $K_n$ contains a monochromatic copy of $H$ such that the vertices in the copy appear in the same order as in $H$. Although some bounds on the ordered Ramsey numbers of certain infinite families of graphs are known, very little is known about the ordered Ramsey numbers of specific small graphs compared to how much we know about the usual Ramsey numbers for these graphs. In this paper we tackle the problem of proving non-trivial upper bounds on orderings of graphs on four vertices. We also extend one of our results to $(n+1)$-vertex graphs that consist of a complete graph on $n$ vertices with a pendant edge to vertex 1. Finally, we use a SAT solver to compute some numbers exactly.

Submitted 29 October, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

Comments: 14 pages. Updated paper with new SAT solver results

Showing 1–5 of 5 results for author: Overman, W