The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Authors:
Nathaniel Li,
Alexander Pan,
Anjali Gopal,
Summer Yue,
Daniel Berrios,
Alice Gatti,
Justin D. Li,
Ann-Kathrin Dombrowski,
Shashwat Goel,
Long Phan,
Gabriel Mukobi,
Nathan Helm-Burger,
Rassin Lababidi,
Lennart Justen,
Andrew B. Liu,
Michael Chen,
Isabelle Barrass,
Oliver Zhang,
Xiaoyuan Zhu,
Rishub Tamirisa,
Bhrugu Bharathi,
Adam Khoja,
Zhenqi Zhao,
Ariel Herbert-Voss,
Cort B. Breuer
, et al. (32 additional authors not shown)
Abstract:
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in develo** biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are develo** evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…
▽ More
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in develo** biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are develo** evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
△ Less
Submitted 15 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
Formal Modeling and Analysis of Legal Contracts using ContractCheck
Authors:
Alan Khoja,
Martin Kölbl,
Stefan Leue,
Rüdiger Wilhelmi
Abstract:
We describe a method and tool called \textit{ContractCheck} that allows for the consistency analysis of legal contracts, in particular Sales Purchase Agreements (SPAs). The analysis relies on an encoding of the premises for the execution of the clauses of an SPA as well as the proposed consistency constraints using decidable fragments of first-order logic. Textual SPAs are first encoded in a struc…
▽ More
We describe a method and tool called \textit{ContractCheck} that allows for the consistency analysis of legal contracts, in particular Sales Purchase Agreements (SPAs). The analysis relies on an encoding of the premises for the execution of the clauses of an SPA as well as the proposed consistency constraints using decidable fragments of first-order logic. Textual SPAs are first encoded in a structured natural language format, called blocks. \textit{ContractCheck} interprets these blocks and constraints and translates them in first-oder logic assertions. It then invokes a Satisfiability Modulo Theories (SMT) solver in order to establish the executability of a considered contract by either providing a satisfying model, or by providing evidence of contradictory clauses that impede the execution of the contract. We illustrate the application of \textit{ContractCheck} and conclude by proposing directions for future research.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.