-
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Authors:
Minghao Shao,
Sofija Jancheska,
Meet Udeshi,
Brendan Dolan-Gavitt,
Haoran Xi,
Kimberly Milner,
Boyuan Chen,
Max Yin,
Siddharth Garg,
Prashanth Krishnamurthy,
Farshad Khorrami,
Ramesh Karri,
Muhammad Shafique
Abstract:
Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. We make our dataset open source and publicly available at https://github.com/NYU-LLM-CTF/LLM_CTF_Database, along with our automated playground framework at https://github.com/NYU-LLM-CTF/llm_ctf_automation.
Submitted 8 June, 2024;
originally announced June 2024.
-
Quantum interactive proofs and the complexity of separability testing
Authors:
Gus Gutoski,
Patrick Hayden,
Kevin Milner,
Mark M. Wilde
Abstract:
We identify a formal connection between physical problems related to the detection of separable (unentangled) quantum states and complexity classes in theoretical computer science. In particular, we show that to nearly every quantum interactive proof complexity class (including BQP, QMA, QMA(2), and QSZK), there corresponds a natural separability testing problem that is complete for that class. Of particular interest is the fact that the problem of determining whether an isometry can be made to produce a separable state is either QMA-complete or QMA(2)-complete, depending upon whether the distance between quantum states is measured by the one-way LOCC norm or the trace norm. We obtain strong hardness results by proving that for each n-qubit maximally entangled state there exists a fixed one-way LOCC measurement that distinguishes it from any separable state with error probability that decays exponentially in n.
Submitted 30 September, 2014; v1 submitted 27 August, 2013;
originally announced August 2013.
-
Two-message quantum interactive proofs and the quantum separability problem
Authors:
Patrick Hayden,
Kevin Milner,
Mark M. Wilde
Abstract:
Suppose that a polynomial-time mixed-state quantum circuit, described as a sequence of local unitary interactions followed by a partial trace, generates a quantum state shared between two parties. One might then wonder, does this quantum circuit produce a state that is separable or entangled? Here, we give evidence that it is computationally hard to decide the answer to this question, even if one has access to the power of quantum computation. We begin by exhibiting a two-message quantum interactive proof system that can decide the answer to a promise version of the question. We then prove that the promise problem is hard for the class of promise problems with "quantum statistical zero knowledge" (QSZK) proof systems by demonstrating a polynomial-time Karp reduction from the QSZK-complete promise problem "quantum state distinguishability" to our quantum separability problem. By exploiting Knill's efficient encoding of a matrix description of a state into a description of a circuit to generate the state, we can show that our promise problem is NP-hard with respect to Cook reductions. Thus, the quantum separability problem (as phrased above) constitutes the first nontrivial promise problem decidable by a two-message quantum interactive proof system while being hard for both NP and QSZK. We also consider a variant of the problem, in which a given polynomial-time mixed-state quantum circuit accepts a quantum state as input, and the question is to decide if there is an input to this circuit which makes its output separable across some bipartite cut. We prove that this problem is a complete promise problem for the class QIP of problems decidable by quantum interactive proof systems. Finally, we show that a two-message quantum interactive proof system can also decide a multipartite generalization of the quantum separability problem.
Submitted 6 September, 2013; v1 submitted 26 November, 2012;
originally announced November 2012.