Search | arXiv e-print repository

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Authors: Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman

Abstract: Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embeddin… ▽ More Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands. Often, the LLM will mistake the adversarial instructions as user commands to be followed, creating a security vulnerability in the larger system. We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input. The key insight is to utilize transformations of an input to provide a reliable and continuous signal of its provenance. We evaluate spotlighting as a defense against indirect prompt injection attacks, and find that it is a robust defense that has minimal detrimental impact to underlying NLP tasks. Using GPT-family models, we find that spotlighting reduces the attack success rate from greater than {50}\% to below {2}\% in our experiments with minimal impact on task efficacy. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2303.13299 [pdf, other]

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Authors: Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson

Abstract: As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family… ▽ More As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy, an additional term that measures the difference in feature attribution between a pair of explainers. We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. And finally, we study the influence our method has on feature attribution explanations. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2210.02516 [pdf, other]

doi 10.1145/3514094.3534154

Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with U.S. Fair Lending Regulation

Authors: I. Elizabeth Kumar, Keegan E. Hines, John P. Dickerson

Abstract: Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Today, machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of… ▽ More Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Today, machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of "unfairness," thus raising the concern that banks and other financial institutions could -- potentially unwittingly -- engage in illegal discrimination through the use of this technology. In the US, there are laws in place to make sure discrimination does not happen in lending and agencies charged with enforcing them. However, conversations around fair credit models in computer science and in policy are often misaligned: fair machine learning research often lacks legal and practical considerations specific to existing fair lending policy, and regulators have yet to issue new guidance on how, if at all, credit risk models should be utilizing practices and techniques from the research community. This paper aims to better align these sides of the conversation. We describe the current state of credit discrimination regulation in the United States, contextualize results from fair ML research to identify the specific fairness concerns raised by the use of machine learning in lending, and discuss regulatory opportunities to address these concerns. △ Less

Submitted 5 October, 2022; originally announced October 2022.

Journal ref: AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society

arXiv:2203.07490 [pdf, other]

Repairing Regressors for Fair Binary Classification at Any Decision Threshold

Authors: Kweku Kwegyir-Aggrey, A. Feder Cooper, Jessica Dai, John Dickerson, Keegan Hines, Suresh Venkatasubramanian

Abstract: We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal m… ▽ More We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work. △ Less

Submitted 10 December, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2106.07756 [pdf, other]

Counterfactual Explanations for Machine Learning: Challenges Revisited

Authors: Sahil Verma, John Dickerson, Keegan Hines

Abstract: Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models. They provide ``what if'' feedback of the form ``if an input datapoint were $x'$ instead of $x$, then an ML model's output would be $y'$ instead of $y$.'' Counterfactual explainability for ML models has yet to see widespread adoption in industry. In this short paper,… ▽ More Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models. They provide ``what if'' feedback of the form ``if an input datapoint were $x'$ instead of $x$, then an ML model's output would be $y'$ instead of $y$.'' Counterfactual explainability for ML models has yet to see widespread adoption in industry. In this short paper, we posit reasons for this slow uptake. Leveraging recent work outlining desirable properties of CFEs and our experience running the ML wing of a model monitoring startup, we identify outstanding obstacles hindering CFE deployment in industry. △ Less

Submitted 14 June, 2021; originally announced June 2021.

Comments: Presented at CHI HCXAI 2021 workshop

arXiv:2106.03962 [pdf, other]

Amortized Generation of Sequential Algorithmic Recourses for Black-box Models

Authors: Sahil Verma, Keegan Hines, John P. Dickerson

Abstract: Explainable machine learning (ML) has gained traction in recent years due to the increasing adoption of ML-based systems in many sectors. Algorithmic Recourses (ARs) provide "what if" feedback of the form "if an input datapoint were x' instead of x, then an ML-based system's output would be y' instead of y." ARs are attractive due to their actionable feedback, amenability to existing legal framewo… ▽ More Explainable machine learning (ML) has gained traction in recent years due to the increasing adoption of ML-based systems in many sectors. Algorithmic Recourses (ARs) provide "what if" feedback of the form "if an input datapoint were x' instead of x, then an ML-based system's output would be y' instead of y." ARs are attractive due to their actionable feedback, amenability to existing legal frameworks, and fidelity to the underlying ML model. Yet, current AR approaches are single shot -- that is, they assume x can change to x' in a single time period. We propose a novel stochastic-control-based approach that generates sequential ARs, that is, ARs that allow x to move stochastically and sequentially across intermediate states to a final state x'. Our approach is model agnostic and black box. Furthermore, the calculation of ARs is amortized such that once trained, it applies to multiple datapoints without the need for re-optimization. In addition to these primary characteristics, our approach admits optional desiderata such as adherence to the data manifold, respect for causal relations, and sparsity -- identified by past research as desirable properties of ARs. We evaluate our approach using three real-world datasets and show successful generation of sequential ARs that respect other recourse desiderata. △ Less

Submitted 16 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted at AAAI 2022

arXiv:2010.10596 [pdf, other]

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Authors: Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan E. Hines, John P. Dickerson, Chirag Shah

Abstract: Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine learning based systems. A burgeoning body of research seeks to define the goals… ▽ More Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine learning based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize research on counterfactual explanations, a specific class of explanation that provides a link between what could have happened had input to a model been changed in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to the established legal doctrine in many countries, making them appealing to fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric with desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against that rubric. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability. △ Less

Submitted 15 November, 2022; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 23 pages (8 pages of references)

arXiv:2007.00843 [pdf, other]

doi 10.1109/MLSP49062.2020.9231894

Low-light Environment Neural Surveillance

Authors: Michael Potter, Henry Gridley, Noah Lichtenstein, Kevin Hines, John Nguyen, Jacob Walsh

Abstract: We design and implement an end-to-end system for real-time crime detection in low-light environments. Unlike Closed-Circuit Television, which performs reactively, the Low-Light Environment Neural Surveillance provides real time crime alerts. The system uses a low-light video feed processed in real-time by an optical-flow network, spatial and temporal networks, and a Support Vector Machine to ident… ▽ More We design and implement an end-to-end system for real-time crime detection in low-light environments. Unlike Closed-Circuit Television, which performs reactively, the Low-Light Environment Neural Surveillance provides real time crime alerts. The system uses a low-light video feed processed in real-time by an optical-flow network, spatial and temporal networks, and a Support Vector Machine to identify shootings, assaults, and thefts. We create a low-light action-recognition dataset, LENS-4, which will be publicly available. An IoT infrastructure set up via Amazon Web Services interprets messages from the local board hosting the camera for action recognition and parses the results in the cloud to relay messages. The system achieves 71.5% accuracy at 20 FPS. The user interface is a mobile app which allows local authorities to receive notifications and to view a video of the crime scene. Citizens have a public app which enables law enforcement to push crime alerts based on user proximity. △ Less

Submitted 1 July, 2020; originally announced July 2020.

Comments: Pre-print, accepted to IEEE International Workshop on Machine Learning for Signal Processing 2020 Conference Proceedings. Code and dataset are available at https://github.com/mcgridles/

ACM Class: I.4.9; I.5.4

arXiv:2006.10252 [pdf, other]

Quantifying Challenges in the Application of Graph Representation Learning

Authors: Antonia Gogoglou, C. Bayan Bruss, Brian Nguyen, Reza Sarshogh, Keegan E. Hines

Abstract: Graph Representation Learning (GRL) has experienced significant progress as a means to extract structural information in a meaningful way for subsequent learning tasks. Current approaches including shallow embeddings and Graph Neural Networks have mostly been tested with node classification and link prediction tasks. In this work, we provide an application oriented perspective to a set of popular… ▽ More Graph Representation Learning (GRL) has experienced significant progress as a means to extract structural information in a meaningful way for subsequent learning tasks. Current approaches including shallow embeddings and Graph Neural Networks have mostly been tested with node classification and link prediction tasks. In this work, we provide an application oriented perspective to a set of popular embedding approaches and evaluate their representational power with respect to real-world graph properties. We implement an extensive empirical data-driven framework to challenge existing norms regarding the expressive power of embedding approaches in graphs with varying patterns along with a theoretical analysis of the limitations we discovered in this process. Our results suggest that "one-to-fit-all" GRL approaches are hard to define in real-world scenarios and as new methods are being introduced they should be explicit about their ability to capture graph properties and their applicability in datasets with non-trivial structural differences. △ Less

Submitted 17 June, 2020; originally announced June 2020.

arXiv:1910.03081 [pdf, ps, other]

On the Interpretability and Evaluation of Graph Representation Learning

Authors: Antonia Gogoglou, C. Bayan Bruss, Keegan E. Hines

Abstract: With the rising interest in graph representation learning, a variety of approaches have been proposed to effectively capture a graph's properties. While these approaches have improved performance in graph machine learning tasks compared to traditional graph techniques, they are still perceived as techniques with limited insight into the information encoded in these representations. In this work, w… ▽ More With the rising interest in graph representation learning, a variety of approaches have been proposed to effectively capture a graph's properties. While these approaches have improved performance in graph machine learning tasks compared to traditional graph techniques, they are still perceived as techniques with limited insight into the information encoded in these representations. In this work, we explore methods to interpret node embeddings and propose the creation of a robust evaluation framework for comparing graph representation learning algorithms and hyperparameters. We test our methods on graphs with different properties and investigate the relationship between embedding training parameters and the ability of the produced embedding to recover the structure of the original graph in a downstream task. △ Less

Submitted 7 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019 Graph Representation Learning workshop

arXiv:1908.05557 [pdf, other]

doi 10.1109/ICTAI.2019.00209

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Authors: Anh Truong, Austin Walters, Jeremy Goodsitt, Keegan Hines, C. Bayan Bruss, Reza Farivar

Abstract: There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers, as a consequence, are in high demand across the industry, yet improving the efficiency of ML engineers remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pr… ▽ More There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers, as a consequence, are in high demand across the industry, yet improving the efficiency of ML engineers remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We conduct various evaluations of the tools on many datasets, in different data segments, to examine their performance, and compare their advantages and disadvantages on different test cases. △ Less

Submitted 3 September, 2019; v1 submitted 15 August, 2019; originally announced August 2019.

arXiv:1907.07225 [pdf, other]

DeepTrax: Embedding Graphs of Financial Transactions

Authors: C. Bayan Bruss, Anish Khazane, Jonathan Rider, Richard Serpe, Antonia Gogoglou, Keegan E. Hines

Abstract: Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money. For financial institutions, such a graph is likely large (with millions or billions of edges) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. Graph representation learning seeks to embed the nodes of a… ▽ More Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money. For financial institutions, such a graph is likely large (with millions or billions of edges) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. Graph representation learning seeks to embed the nodes of a graph into a Euclidean vector space such that graph topological properties are preserved after the transformation. In this paper, we present a novel application of representation learning to bipartite graphs of credit card transactions in order to learn embeddings of account and merchant entities. Our framework is inspired by popular approaches in graph embeddings and is trained on two internal transaction datasets. This approach yields highly effective embeddings, as quantified by link prediction AUC and F1 score. Further, the resulting entity vectors retain intuitive semantic similarity that is explored through visualizations and other qualitative analyses. Finally, we show how these embeddings can be used as features in downstream machine learning business applications such as fraud detection. △ Less

Submitted 16 July, 2019; originally announced July 2019.

arXiv:1907.01705 [pdf, other]

Graph Embeddings at Scale

Authors: C. Bayan Bruss, Anish Khazane, Jonathan Rider, Richard Serpe, Saurabh Nagrecha, Keegan E. Hines

Abstract: Graph embedding is a popular algorithmic approach for creating vector representations for individual vertices in networks. Training these algorithms at scale is important for creating embeddings that can be used for classification, ranking, recommendation and other common applications in industry. While industrial systems exist for training graph embeddings on large datasets, many of these distrib… ▽ More Graph embedding is a popular algorithmic approach for creating vector representations for individual vertices in networks. Training these algorithms at scale is important for creating embeddings that can be used for classification, ranking, recommendation and other common applications in industry. While industrial systems exist for training graph embeddings on large datasets, many of these distributed architectures are forced to partition copious amounts of data and model logic across many worker nodes. In this paper, we propose a distributed infrastructure that completely avoids graph partitioning, dynamically creates size constrained computational graphs across worker nodes, and uses highly efficient indexing operations for updating embeddings that allow the system to function at scale. We show that our system can scale an existing embeddings algorithm - skip-gram - to train on the open-source Friendster network (68 million vertices) and on an internal heterogeneous graph (50 million vertices). We measure the performance of our system on two key quantitative metrics: link-prediction accuracy and rate of convergence. We conclude this work by analyzing how a greater number of worker nodes actually improves our system's performance on the aforementioned metrics and discuss our next steps for rigorously evaluating the embedding vectors produced by our system. △ Less

Submitted 2 July, 2019; originally announced July 2019.

Comments: Workshop on Mining and Learning with Graphs 2019

arXiv:1906.09266 [pdf, other]

A Multitask Network for Localization and Recognition of Text in Images

Authors: Mohammad Reza Sarshogh, Keegan E. Hines

Abstract: We present an end-to-end trainable multi-task network that addresses the problem of lexicon-free text extraction from complex documents. This network simultaneously solves the problems of text localization and text recognition and text segments are identified with no post-processing, crop**, or word grou**. A convolutional backbone and Feature Pyramid Network are combined to provide a shared r… ▽ More We present an end-to-end trainable multi-task network that addresses the problem of lexicon-free text extraction from complex documents. This network simultaneously solves the problems of text localization and text recognition and text segments are identified with no post-processing, crop**, or word grou**. A convolutional backbone and Feature Pyramid Network are combined to provide a shared representation that benefits each of three model heads: text localization, classification, and text recognition. To improve recognition accuracy, we describe a dynamic pooling mechanism that retains high-resolution information across all RoIs. For text recognition, we propose a convolutional mechanism with attention which out-performs more common recurrent architectures. Our model is evaluated against benchmark datasets and comparable methods and achieves high performance in challenging regimes of non-traditional OCR. △ Less

Submitted 21 June, 2019; originally announced June 2019.

Comments: ICDAR 2019

arXiv:1808.10742 [pdf, other]

Anomaly Detection in Cyber Network Data Using a Cyber Language Approach

Authors: Bartley D. Richardson, Benjamin J. Radford, Shawn E. Davis, Keegan Hines, David Pekarek

Abstract: As the amount of cyber data continues to grow, cyber network defenders are faced with increasing amounts of data they must analyze to ensure the security of their networks. In addition, new types of attacks are constantly being created and executed globally. Current rules-based approaches are effective at characterizing and flagging known attacks, but they typically fail when presented with a new… ▽ More As the amount of cyber data continues to grow, cyber network defenders are faced with increasing amounts of data they must analyze to ensure the security of their networks. In addition, new types of attacks are constantly being created and executed globally. Current rules-based approaches are effective at characterizing and flagging known attacks, but they typically fail when presented with a new attack or new types of data. By comparison, unsupervised machine learning offers distinct advantages by not requiring labeled data to learn from large amounts of network traffic. In this paper, we present a natural language-based technique (suffix trees) as applied to cyber anomaly detection. We illustrate one methodology to generate a language using cyber data features, and our experimental results illustrate positive preliminary results in applying this technique to flow-type data. As an underlying assumption to this work, we make the claim that malicious cyber actors leave observables in the data as they execute their attacks. This work seeks to identify those artifacts and exploit them to identify a wide range of cyber attacks without the need for labeled ground-truth data. △ Less

Submitted 15 August, 2018; originally announced August 2018.

Showing 1–15 of 15 results for author: Hines, K