Showing 1–9 of 9 results for author: Wies, N

Searching in archive cs.
  1. arXiv:2401.16332  [pdf, other]

    cs.CL cs.AI

    Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering

    Authors: Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, Amnon Shashua

    Abstract: Language model alignment has become an important component of AI safety, allowing safe interactions between humans and language models, by enhancing desired behaviors and inhibiting undesired ones. It is often done by tuning the model or inserting preset aligning prompts. Recently, representation engineering, a method which alters the model's behavior via changing its representations post-training…

    Submitted 26 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  2. arXiv:2307.01715  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework

    Authors: Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein

    Abstract: Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments f…

    Submitted 7 March, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: ICLR 2024

  3. arXiv:2304.11082  [pdf, other]

    cs.CL cs.AI

    Fundamental Limitations of Alignment in Large Language Models

    Authors: Yotam Wolf, Noam Wies, Oshri Avnery, Yoav Levine, Amnon Shashua

    Abstract: An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which…

    Submitted 3 June, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

  4. arXiv:2303.07895  [pdf, ps, other]

    cs.CL

    The Learnability of In-Context Learning

    Authors: Noam Wies, Yoav Levine, Amnon Shashua

    Abstract: In-context learning is a surprising and important phenomenon that emerged when modern language models were scaled to billions of learned parameters. Without modifying a large language model's weights, it can be tuned to perform various downstream natural language tasks simply by including concatenated training examples of these tasks in its input. Though disruptive for many practical applications…

    Submitted 14 March, 2023; originally announced March 2023.

  5. arXiv:2204.02892  [pdf, other]

    cs.CL cs.LG

    Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

    Authors: Noam Wies, Yoav Levine, Amnon Shashua

    Abstract: The field of Natural Language Processing has experienced a dramatic leap in capabilities with the recent introduction of huge Language Models. Despite this success, natural language problems that involve several compounded steps are still practically unlearnable, even by the largest LMs. This complies with experimental failures for end-to-end learning of composite problems that were demonstrated i…

    Submitted 15 February, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: ICLR 2023

  6. arXiv:2110.04541  [pdf, other]

    cs.CL cs.LG

    The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

    Authors: Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

    Abstract: Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example, than it can…

    Submitted 21 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  7. arXiv:2105.03928  [pdf, other]

    cs.LG cs.CL

    Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

    Authors: Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua

    Abstract: After their successful debut in natural language processing, Transformer architectures are now becoming the de-facto standard in many domains. An obstacle for their deployment over new modalities is the architectural configuration: the optimal depth-to-width ratio has been shown to dramatically vary across data types (e.g., $10$x larger over images than over language). We theoretically predict the…

    Submitted 9 June, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: ICML 2021

  8. arXiv:2006.12467  [pdf, other]

    cs.LG cs.CL stat.ML

    The Depth-to-Width Interplay in Self-Attention

    Authors: Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua

    Abstract: Self-attention architectures, which are rapidly pushing the frontier in natural language processing, demonstrate a surprising depth-inefficient behavior: previous works indicate that increasing the internal representation (network width) is just as useful as increasing the number of self-attention layers (network depth). We theoretically predict a width-dependent transition between depth-efficienc…

    Submitted 17 January, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  9. arXiv:1902.04057  [pdf, other]

    cond-mat.dis-nn cond-mat.str-el cs.LG

    Deep autoregressive models for the efficient variational simulation of many-body quantum systems

    Authors: Or Sharir, Yoav Levine, Noam Wies, Giuseppe Carleo, Amnon Shashua

    Abstract: Artificial Neural Networks were recently shown to be an efficient representation of highly-entangled many-body quantum states. In practical applications, neural-network states inherit numerical schemes used in Variational Monte Carlo, most notably the use of Markov-Chain Monte-Carlo (MCMC) sampling to estimate quantum expectations. The local stochastic sampling in MCMC caps the potential advantage…

    Submitted 19 January, 2020; v1 submitted 11 February, 2019; originally announced February 2019.

    Journal ref: Phys. Rev. Lett. 124, 020503 (2020)