Showing 1–3 of 3 results for author: Mudide, A

  1. arXiv:2406.08467  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    DafnyBench: A Benchmark for Formal Software Verification

    Authors: Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark

    Abstract: We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the Dafny formal verification engine to successfully verify over 750 programs with about 53,000 lines of code. The best model and prompting scheme achieved 68% succe…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code & dataset available at: https://github.com/sun-wendy/DafnyBench

  2. arXiv:2402.05110  [pdf, other]

    cs.LG

    Opening the AI black box: program synthesis via mechanistic interpretability

    Authors: Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

    Abstract: We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by G…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 24 pages

  3. arXiv:2006.16797  [pdf, ps, other]

    math.HO math.CO

    Confirming the Labels of Coins in One Weighing

    Authors: Isha Agarwal, Paul Braverman, Patrick Chen, William Du, Kaylee Ji, Akhil Kammila, Tanya Khovanova, Shane Lee, Alicia Li, Anish Mudide, Jeffrey Shi, Maya Smith, Isabel Tu

    Abstract: There are $n$ bags with coins that look the same. Each bag has an infinite number of coins and all coins in the same bag weigh the same amount. Coins in different bags weigh 1, 2, 3, and so on to $n$ grams exactly. There is a unique label from the set 1 through $n$ attached to each bag that is supposed to correspond to the weight of the coins in that bag. The task is to confirm all the labels by u…

    Submitted 30 June, 2020; originally announced June 2020.

    Comments: 23 pages

    MSC Class: 05A17; 00A08