Showing 1–16 of 16 results for author: Morris, J X

  1. arXiv:2407.08970  [pdf, other]

    cs.CR cs.AI cs.LG

    Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions

    Authors: Tingwei Zhang, Collin Zhang, John X. Morris, Eugene Bagdasaryan, Vitaly Shmatikov

    Abstract: We introduce a new type of indirect injection vulnerabilities in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks a…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2406.05087  [pdf, other]

    cs.IR

    Corpus Poisoning via Approximate Greedy Gradient Descent

    Authors: Jinyan Su, John X. Morris, Preslav Nakov, Claire Cardie

    Abstract: Dense retrievers are widely used in information retrieval and have also been successfully extended to other knowledge intensive areas such as language models, e.g., Retrieval-Augmented Generation (RAG) systems. Unfortunately, they have recently been shown to be vulnerable to corpus poisoning attacks in which a malicious user injects a small fraction of adversarial passages into the retrieval corpu…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2405.16714  [pdf, other]

    cs.CL cs.AI cs.LG q-bio.NC

    Crafting Interpretable Embeddings by Asking LLMs Questions

    Authors: Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao

    Abstract: Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb),…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2405.15012  [pdf, other]

    cs.CL cs.LG

    Extracting Prompts by Inverting LLM Outputs

    Authors: Collin Zhang, John X. Morris, Vitaly Shmatikov

    Abstract: We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. We develop a new black-box method, output2prompt, that learns to extract prompts without access to the model's logits and without adversarial or jailbreaking queries. In contrast to previous work, output2prompt only needs outputs of normal user queries…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2404.00859  [pdf, other]

    cs.LG cs.CL

    Do language models plan ahead for future tokens?

    Authors: Wilson Wu, John X. Morris, Lionel Levine

    Abstract: Do transformers "think ahead" during inference at a given position? It is known transformers prepare information in the hidden states of the forward pass at $t$ that is then used in future forward passes $t+τ$. We posit two explanations for this phenomenon: pre-caching, in which off-diagonal gradient terms present in training result in the model computing features at $t$ irrelevant to the present…

    Submitted 31 March, 2024; originally announced April 2024.

  6. arXiv:2402.01613  [pdf, other]

    cs.CL cs.AI

    Nomic Embed: Training a Reproducible Long Context Text Embedder

    Authors: Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar

    Abstract: This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short and long-context tasks. We release the training code and model weights under an Apache 2 license. In contrast with other open-source m…

    Submitted 2 February, 2024; originally announced February 2024.

  7. arXiv:2311.13647  [pdf, other]

    cs.CL cs.LG

    Language Model Inversion

    Authors: John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush

    Abstract: Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompt…

    Submitted 22 November, 2023; originally announced November 2023.

  8. arXiv:2310.14034  [pdf, other]

    cs.CL cs.LG

    Tree Prompting: Efficient Task Adaptation without Fine-Tuning

    Authors: John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng

    Abstract: Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds a decision tree of prompts, linking multiple LM calls together to solve a task. At inference time, each call to the LM is determined by efficiently routing the o…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Both first authors contributed equally; accepted to EMNLP 2023

  9. arXiv:2310.06816  [pdf, other]

    cs.CL cs.LG

    Text Embeddings Reveal (Almost) As Much As Text

    Authors: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush

    Abstract: How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when reembedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  10. arXiv:2210.11528  [pdf, other]

    cs.CL

    Unsupervised Text Deidentification

    Authors: John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush

    Abstract: Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that masks words that leak personally-identifying information. The approach utilizes a specially trained reidentification model to identify individuals from redacted p…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  11. arXiv:2210.01848  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.NC stat.ML

    Explaining Patterns in Data with Language Models via Interpretable Autoprompting

    Authors: Chandan Singh, John X. Morris, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao

    Abstract: Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explainin…

    Submitted 26 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: The two first authors contributed equally

  12. arXiv:2010.01770  [pdf, other]

    cs.CL

    Second-Order NLP Adversarial Examples

    Authors: John X. Morris

    Abstract: Adversarial example generation methods in NLP rely on models like language models or sentence encoders to determine if potential adversarial examples are valid. In these methods, a valid adversarial example fools the model being attacked, and is determined to be semantically or syntactically valid by a second model. Research to date has counted all such examples as errors by the attacked model. We…

    Submitted 5 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: 8 pages

  13. arXiv:2010.01724  [pdf, other]

    cs.SE

    TextAttack: Lessons learned in designing Python frameworks for NLP

    Authors: John X. Morris, Jin Yong Yoo, Yanjun Qi

    Abstract: TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components reused across attacks. This framework allows both researchers and developers to test and study the weaknesses of their NLP models. To build such an open-source N…

    Submitted 4 October, 2020; originally announced October 2020.

    Comments: 4 pages

  14. arXiv:2009.06368  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples

    Authors: Jin Yong Yoo, John X. Morris, Eli Lifland, Yanjun Qi

    Abstract: We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search algorithms are proposed in past work, the attack search space is often modified alongside the search algorithm. W…

    Submitted 12 October, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 14 pages, 5 figures, 4 tables; Accepted by EMNLP BlackBox NLP Workshop 2020 @ https://blackboxnlp.github.io/cfp.html

  15. arXiv:2005.05909  [pdf, other]

    cs.CL cs.AI cs.LG

    TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

    Authors: John X. Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi

    Abstract: While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from…

    Submitted 4 October, 2020; v1 submitted 29 April, 2020; originally announced May 2020.

    Comments: 6 pages. More details are shared at https://github.com/QData/TextAttack

  16. Reevaluating Adversarial Examples in Natural Language

    Authors: John X. Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, Yanjun Qi

    Abstract: State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the model and follows some linguistic constraints. We then analyze the outputs of two state-of-the-art synonym substitution attacks. We find that their pert…

    Submitted 21 December, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: 15 pages; 9 Tables; 5 Figures

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2020