Showing 1–5 of 5 results for author: Ildız, M E

  1. arXiv:2403.08081

    cs.LG cs.AI cs.CL math.OC

    Mechanics of Next Token Prediction with Self-Attention

    Authors: Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

    Abstract: Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success is the self-attention mechanism. In this work, we ask: $\textit{What does a single self-attention}$…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to AISTATS 2024

  2. arXiv:2402.13512

    cs.LG cs.AI cs.CL

    From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

    Authors: M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

    Abstract: Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data sampled from the model. We first establish a precise mapping between the self-attention mechanism and Markov models: Inputting a prompt to the model…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 30 pages

  3. arXiv:2301.07067

    cs.LG cs.CL stat.ML

    Transformers as Algorithms: Generalization and Stability in In-context Learning

    Authors: Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

    Abstract: In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through t…

    Submitted 6 February, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: Revised version significantly improves the stability guarantees and provides new experiments

  4. arXiv:2111.02309

    cs.IT cs.NI

    Pull or Wait: How to Optimize Query Age of Information

    Authors: M. Emrullah Ildiz, Orhan T. Yavascan, Elif Uysal, O. Tugberk Kartal

    Abstract: We study a pull-based status update communication model where a source node submits update packets to a channel with random transmission delay, at times requested by a remote destination node. The objective is to minimize the average query-age-of-information (QAoI), defined as the average age-of-information (AoI) measured at query instants that occur at the destination side according to a stochast…

    Submitted 4 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

  5. arXiv:1805.10704

    cs.CV

    Synergistic Reconstruction and Synthesis via Generative Adversarial Networks for Accelerated Multi-Contrast MRI

    Authors: Salman Ul Hassan Dar, Mahmut Yurt, Mohammad Shahdloo, Muhammed Emrullah Ildız, Tolga Çukur

    Abstract: Multi-contrast MRI acquisitions of an anatomy enrich the magnitude of information available for diagnosis. Yet, excessive scan times associated with additional contrasts may be a limiting factor. Two mainstream approaches for enhanced scan efficiency are reconstruction of undersampled acquisitions and synthesis of missing acquisitions. In reconstruction, performance decreases towards higher accele…

    Submitted 27 May, 2018; originally announced May 2018.