Showing 1–4 of 4 results for author: Orlanski, G

  1. arXiv:2302.01973  [pdf, other]

    cs.LG cs.CL cs.PL

    Measuring The Impact Of Programming Language Distribution

    Authors: Gabriel Orlanski, Kefan Xiao, Xavier Garcia, Jeffrey Hui, Joshua Howland, Jonathan Malmaud, Jacob Austin, Rishabh Singh, Michele Catasta

    Abstract: Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual…

    Submitted 24 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Accepted to ICML 2023, Code and data release: https://github.com/google-research/babelcode

  2. arXiv:2211.07842  [pdf, other]

    cs.LG cs.AI cs.CL cs.SE

    Evaluating How Fine-tuning on Bimodal Data Effects Code Generation

    Authors: Gabriel Orlanski, Seonhye Yang, Michael Healy

    Abstract: Despite the increase in popularity of language models for code generation, it is still unknown how training on bimodal coding forums affects a model's code generation performance and reliability. We, therefore, collect a dataset of over 2.2M StackOverflow questions with answers for finetuning. These fine-tuned models have average $pass@k$ improvements of 54.64% and 85.35% on the HumanEval (Chen et…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 4 pages, 4 figures

  3. arXiv:2203.15754  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting

    Authors: Gabriel Orlanski

    Abstract: Large language models have shown that impressive zero-shot performance can be achieved through natural language prompts (Radford et al., 2019; Brown et al., 2020; Sanh et al., 2021). Creating an effective prompt, however, requires significant trial and error. That prompts the question: how do the qualities of a prompt affect its performance? To this end, we collect and standardize prompt…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 4 pages, 4 figures

  4. arXiv:2106.04447  [pdf, other]

    cs.CL

    Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation

    Authors: Gabriel Orlanski, Alex Gittens

    Abstract: Answering a programming question using only its title is difficult as salient contextual information is omitted. Based on this observation, we present a corpus of over 40,000 StackOverflow question texts to be used in conjunction with their corresponding intents from the CoNaLa dataset (Yin et al., 2018). Using both the intent and question body, we use BART to establish a baseline BLEU score of 34…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: To be published in ACL-IJCNLP NLP4Prog workshop. (The First Workshop on Natural Language Processing for Programming)