Search | arXiv e-print repository

arXiv:2402.01942 [pdf, ps, other]

Pairwise Rearrangement is Fixed-Parameter Tractable in the Single Cut-and-Join Model

Authors: Lora Bailey, Heather Smith Blake, Garner Cochran, Nathan Fox, Michael Levet, Reem Mahmoud, Inne Singgih, Grace Stadnyk, Alexander Wiedemann

Abstract: Genome rearrangement is a common model for molecular evolution. In this paper, we consider the Pairwise Rearrangement problem, which takes as input two genomes and asks for the number of minimum-length sequences of permissible operations transforming the first genome into the second. In the Single Cut-and-Join model (Bergeron, Medvedev, & Stoye, J. Comput. Biol. 2010), Pairwise Rearrangement is $\#\textsf{P}$-complete (Bailey et al., COCOON 2023), which implies that exact sampling is intractable. In order to cope with this intractability, we investigate the parameterized complexity of this problem. We exhibit a fixed-parameter tractable algorithm with respect to the number of components in the adjacency graph that are not cycles of length $2$ or paths of length $1$. As a consequence, we obtain that Pairwise Rearrangement in the Single Cut-and-Join model is fixed-parameter tractable by distance. Our results suggest that the number of nontrivial components in the adjacency graph serves as the key obstacle for efficient sampling.

Submitted 23 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Full version of paper to appear in SWAT 2024; arXiv admin note: text overlap with arXiv:2305.01851

MSC Class: 92-08; 92D10; 92D20; 68Q17

ACM Class: F.2.2

arXiv:2311.01011 [pdf, other]

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Authors: Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

Abstract: While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third-party prompts that subvert the intent of the system designer. To help researchers study this problem, we present a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection, all created by players of an online game called Tensor Trust. To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs. The attacks in our dataset have a lot of easily interpretable structure, and shed light on the weaknesses of LLMs. We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking. Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset. Furthermore, we show that some attack strategies from the dataset generalize to deployed LLM-based applications, even though they have a very different set of constraints to the game. We release all data and source code at https://tensortrust.ai/paper

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.10658 [pdf]

Checking and Automating Confidentiality Theory in Isabelle/UTP

Authors: Lex Bailey, Jim Woodcock, Simon Foster, Roberto Metere

Abstract: The severity of recent vulnerabilities discovered on modern CPUs, e.g., Spectre [1], highlights how information leakage can have devastating effects on the security of computer systems. At the same time, it suggests that confidentiality should be promoted as a normal part of program verification, to discover and mitigate such vulnerabilities early in development. The theory we propose is primarily based on Bank's theory [2], a framework for reasoning about confidentiality properties formalised in the Unifying Theories of Programming (UTP) [3]. We mechanised our encoding in the current implementation of UTP in the Isabelle theorem prover, Isabelle/UTP [4]. We have identified some theoretical issues in Bank's original framework. Finally, we demonstrate how our mechanisation can be used to formally verify some of the examples from Bank's work.

Submitted 7 September, 2023; originally announced October 2023.

arXiv:2309.00236 [pdf, other]

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Authors: Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

Abstract: Are foundation models secure against malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control the behaviour of VLMs at inference time, and introduce the general Behaviour Matching algorithm for training image hijacks. From this, we derive the Prompt Matching method, allowing us to train hijacks matching the behaviour of an arbitrary user-defined text prompt (e.g. 'the Eiffel Tower is now located in Rome') using a generic, off-the-shelf dataset unrelated to our choice of prompt. We use Behaviour Matching to craft hijacks for four types of attack, forcing VLMs to generate outputs of the adversary's choice, leak information from their context window, override their safety training, and believe false statements. We study these attacks against LLaVA, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all attack types achieve a success rate of over 80%. Moreover, our attacks are automated and require only small image perturbations.

Submitted 22 April, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

Comments: Project page at https://image-hijacks.github.io

arXiv:2305.01851 [pdf, ps, other]

Complexity and Enumeration in Models of Genome Rearrangement

Authors: Lora Bailey, Heather Smith Blake, Garner Cochran, Nathan Fox, Michael Levet, Reem Mahmoud, Elizabeth Matson, Inne Singgih, Grace Stadnyk, Xinyi Wang, Alexander Wiedemann

Abstract: In this paper, we examine the computational complexity of enumeration in certain genome rearrangement models. We first show that the Pairwise Rearrangement problem in the Single Cut-and-Join model (Bergeron, Medvedev, & Stoye, J. Comput. Biol. 2010) is $\#\textsf{P}$-complete under polynomial-time Turing reductions. Next, we show that in the Single Cut or Join model (Feijao & Meidanis, IEEE ACM Trans. Comp. Biol. Bioinf. 2011), the problem of enumerating all medians ($\#$Median) is logspace-computable ($\textsf{FL}$), improving upon the previous polynomial-time ($\textsf{FP}$) bound of Miklós & Smith (RECOMB 2015).

Submitted 23 April, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: Full version of paper that appeared in COCOON 2023: https://doi.org/10.1007/978-3-031-49190-0_1

MSC Class: 92-08; 92D10; 92D20; 68Q17

ACM Class: F.2.2

arXiv:2209.12127 [pdf, other]

SpeedLimit: Neural Architecture Search for Quantized Transformer Models

Authors: Yuji Chai, Luke Bailey, Yunho **, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.

Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

arXiv:2208.13451 [pdf, other]

doi 10.1145/3556787.3556859

Common Patterns in Block-Based Robot Programs

Authors: Florian Obermüller, Robert Pernerstorfer, Lisa Bailey, Ute Heuer, Gordon Fraser

Abstract: Programmable robots are engaging and fun to play with, interact with the real world, and are therefore well suited to introduce young learners to programming. Introductory robot programming languages often extend existing block-based languages such as Scratch. While teaching programming with such languages is well established, the interaction with the real world in robot programs leads to specific challenges, for which learners and educators may require assistance and feedback. A practical approach to provide this feedback is by identifying and pointing out patterns in the code that are indicative of good or bad solutions. While such patterns have been defined for regular block-based programs, robot-specific programming aspects have not been considered so far. The aim of this paper is therefore to identify patterns specific to robot programming for the Scratch-based mBlock programming language, which is used for the popular mBot and Codey Rocky robots. We identify: (1) 26 bug patterns, which indicate erroneous code; (2) three code smells, which indicate code that may work but is written in a confusing or difficult to understand way; and (3) 18 code perfumes, which indicate aspects of code that are likely good. We extend the LitterBox analysis framework to automatically identify these patterns in mBlock programs. Evaluated on a dataset of 3,540 mBlock programs, we find a total of 6,129 instances of bug patterns, 592 code smells and 14,495 code perfumes. This demonstrates the potential of our approach to provide feedback and assistance to learners and educators alike for their mBlock robot programs.

Submitted 29 August, 2022; originally announced August 2022.

Comments: To be published in the Proceedings of the 17th Workshop in Primary and Secondary Computing Education (WIPSCE'22)

MSC Class: 97P50

ACM Class: D.2.5; K.3.2

arXiv:2205.03578 [pdf, other]

doi 10.1029/2022SW003149

Automatic Detection of Interplanetary Coronal Mass Ejections in Solar Wind In Situ Data

Authors: Hannah T. Rüdisser, Andreas Windisch, Ute V. Amerstorfer, Christian Möstl, Tanja Amerstorfer, Rachel L. Bailey, Martin A. Reiss

Abstract: Interplanetary coronal mass ejections (ICMEs) are one of the main drivers for space weather disturbances. In the past, different approaches have been used to automatically detect events in existing time series resulting from solar wind in situ observations. However, accurate and fast detection still remains a challenge when facing the large amount of data from different instruments. For the automatic detection of ICMEs we propose a pipeline using a method that has recently proven successful in medical image segmentation. Comparing it to an existing method, we find that while achieving similar results, our model outperforms the baseline regarding training time by a factor of approximately 20, thus making it more applicable for other datasets. The method has been tested on in situ data from the Wind spacecraft between 1997 and 2015 with a True Skill Statistic (TSS) of 0.64. Out of the 640 ICMEs, 466 were detected correctly by our algorithm, producing a total of 254 False Positives. Additionally, it produced reasonable results on datasets with fewer features and smaller training sets from Wind, STEREO-A and STEREO-B with True Skill Statistics of 0.56, 0.57 and 0.53, respectively. Our pipeline manages to find the start of an ICME with a mean absolute error (MAE) of around 2 hours and 56 minutes, and the end time with a MAE of 3 hours and 20 minutes. The relatively fast training allows straightforward tuning of hyperparameters and could therefore easily be used to detect other structures and phenomena in solar wind data, such as corotating interaction regions.

Submitted 29 October, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

Journal ref: Space Weather, 20, e2022SW003149

arXiv:2106.09933 [pdf, other]

doi 10.1109/ITSC48978.2021.9565046

Data Enforced: An Exploratory Impact Analysis of Automated Speed Enforcement in the District of Columbia

Authors: Awad Abdelhalim, Linda Bailey, Emily Dalphy, Kelli Raboy

Abstract: In 2015, the District of Columbia framed a Vision Zero mission and action plan, with a target of achieving zero traffic fatalities by 2024. This study examines the impacts of Automated Speed Enforcement (ASE) and its role in achieving the goals of Vision Zero. Independent datasets containing detailed information about traffic crashes, ASE camera locations, citation records, and driving speeds across the District's streets were collected, combined, and analyzed to identify patterns and trends in crashes, speed limit violations, and speeding behavior before and after the ASE camera installation. The results of this exploratory analysis confirm the safety benefits of ASE systems in Washington, D.C. The study also provides a blueprint for the different means of evaluating the short-term impact of ASE systems using different data sources, which can aid practitioners in better evaluating existing systems and support the decision-making process regarding future installations.

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2010.15996 [pdf, other]

Lessons Learned from the 1st ARIEL Machine Learning Challenge: Correcting Transiting Exoplanet Light Curves for Stellar Spots

Authors: Nikolaos Nikolaou, Ingo P. Waldmann, Angelos Tsiaras, Mario Morvan, Billy Edwards, Kai Hou Yip, Giovanna Tinetti, Subhajit Sarkar, James M. Dawson, Vadim Borisov, Gjergji Kasneci, Matej Petkovic, Tomaz Stepisnik, Tarek Al-Ubaidi, Rachel Louise Bailey, Michael Granitzer, Sahib Julka, Roman Kern, Patrick Ofner, Stefan Wagner, Lukas Heppe, Mirko Bunse, Katharina Morik

Abstract: The last decade has witnessed a rapid growth of the field of exoplanet discovery and characterisation. However, several big challenges remain, many of which could be addressed using machine learning methodology. For instance, the most prolific method for detecting exoplanets and inferring several of their characteristics, transit photometry, is very sensitive to the presence of stellar spots. The current practice in the literature is to identify the effects of spots visually and correct for them manually or discard the affected data. This paper explores a first step towards fully automating the efficient and precise derivation of transit depths from transit light curves in the presence of stellar spots. The methods and results we present were obtained in the context of the 1st Machine Learning Challenge organized for the European Space Agency's upcoming Ariel mission. We first present the problem, the simulated Ariel-like data and outline the Challenge while identifying best practices for organizing similar challenges in the future. Finally, we present the solutions obtained by the top-5 winning teams, provide their code and discuss their implications. Successful solutions either construct highly non-linear (w.r.t. the raw data) models with minimal preprocessing (deep neural networks and ensemble methods) or amount to obtaining meaningful statistics from the light curves and constructing linear models on them, which yields comparably good predictive performance.

Submitted 29 October, 2020; originally announced October 2020.

Comments: 20 pages, 7 figures, 2 tables, Submitted to The Astrophysical Journal (ApJ)

arXiv:2008.13566 [pdf, other]

doi 10.1145/3481312.3481347

An Experience of Introducing Primary School Children to Programming using Ozobots (Practical Report)

Authors: Nina Körber, Lisa Bailey, Luisa Greifenstein, Gordon Fraser, Barbara Sabitzer, Marina Rottenhofer

Abstract: Algorithmic thinking is a central concept in the context of computational thinking, and it is commonly taught by computer programming. A recent trend is to introduce basic programming concepts already very early on at primary school level. There are, however, several challenges in teaching programming at this level: Schools and teachers are often neither equipped nor trained appropriately, and the best way to move from initial "unplugged" activities to creating programs on a computer is still a matter of open debate. In this paper, we describe our experience of a small INTERREG-project aiming at supporting local primary schools in introducing children to programming concepts using Ozobot robots. These robots have two distinct advantages: First, they can be programmed with and without computers, thus helping the transition from unplugged programming to programming with a computer. Second, they are small and easy to transport, even when used together with tablet computers. Although we learned in our outreach events that the use of Ozobots is not without challenges, our overall experience is positive and can hopefully support others in setting up first encounters with programming at primary schools.

Submitted 16 August, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

Comments: To be published in the proceedings of the 16th Workshop in Primary and Secondary Computing Education (WIPSCE)

MSC Class: 97P50

ACM Class: K.3.2; D.2.6

Showing 1–11 of 11 results for author: Bailey, L