Showing 1–2 of 2 results for author: Bashkansky, N

Searching in archive cs.
  1. arXiv:2402.10962  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

    Authors: Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating…

    Submitted 1 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Code: https://github.com/likenneth/persona_drift

  2. arXiv:2312.03096  [pdf, other]

    cs.LG cs.AI cs.NE

    What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes

    Authors: Victor Lecomte, Kushal Thaman, Rylan Schaeffer, Naomi Bashkansky, Trevor Chow, Sanmi Koyejo

    Abstract: Polysemantic neurons -- neurons that activate for a set of unrelated features -- have been seen as a significant obstacle towards interpretability of task-optimized deep networks, with implications for AI safety. The classic origin story of polysemanticity is that the data contains more "features" than neurons, such that learning to perform a task forces the network to co-allocate multiple unrela…

    Submitted 13 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.