
RepoQA: Evaluating Long Context Code Understanding

Authors: Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, Lingming Zhang

Abstract: Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a large chunk of raw texts. While effective, current evaluations overlook the insight of how LLMs work with long-context code, i.e., repositories. To this end, we initiate the RepoQA benchmark to evaluate LLMs on long-context code understanding. Traditional needle testers ask LLMs to directly retrieve the answer from the context without necessary deep understanding. In RepoQA, we built our initial task, namely Searching Needle Function (SNF), which exercises LLMs to search functions given their natural-language description, i.e., LLMs cannot find the desired function if they cannot understand the description and code. RepoQA is multilingual and comprehensive: it includes 500 code search tasks gathered from 50 popular repositories across 5 modern programming languages. By evaluating 26 general and code-specific LLMs on RepoQA, we show (i) there is still a small gap between the best open and proprietary models; (ii) different models are good at different languages; and (iii) models may understand code better without comments.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2308.04748

Fuzz4All: Universal Fuzzing with Large Language Models

Authors: Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, Lingming Zhang

Abstract: Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are well-suited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 98 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 64 bugs already confirmed by developers as previously unknown.

Submitted 15 January, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted at ICSE 2024

arXiv:2107.09981

doi 10.1088/1674-1137/ac29a3

Potential energy surfaces and fission fragment mass yields of even-even superheavy nuclei

Authors: P. V. Kostryukov, A. Dobrowolski, B. Nerlo-Pomorska, M. Warda, Z. G. Xiao, Y. J. Chen, L. L. Liu, J. L. Tian, K. Pomorski

Abstract: Potential energy surfaces and fission barriers of superheavy nuclei are analyzed in the macroscopic-microscopic model. The Lublin-Strasbourg Drop (LSD) is used to obtain the macroscopic part of the energy, whereas the shell and pairing energy corrections are evaluated using the Yukawa-folded potential. A standard flooding technique has been used to determine the barrier heights. It was shown that the Fourier shape parametrization containing only three deformation parameters reproduces well the nuclear shapes of nuclei on their way to fission. In addition, the non-axial degree of freedom is taken into account to describe better the form of nuclei around the ground state and in the saddles region. Apart from the symmetric fission valley, a new very asymmetric fission mode is predicted in most superheavy nuclei. The fission fragment mass distributions of considered nuclei are obtained by solving the 3D Langevin equations.

Submitted 21 July, 2021; originally announced July 2021.

Comments: 20 pages, 17 figures

arXiv:1504.00455

doi 10.1103/PhysRevC.91.044308

Mass dependence of symmetry energy coefficients in Skyrme force

Authors: N. Wang, M. Liu, H. Jiang, J. L. Tian, Y. M. Zhao

Abstract: Based on the semi-classical extended Thomas-Fermi approach, we study the mass dependence of the symmetry energy coefficients of finite nuclei for 36 different Skyrme forces. The reference densities of both light and heavy nuclei are obtained. Eight models based on nuclear liquid drop concept and the Skyrme force SkM* suggest the symmetry energy coefficient $a_{\rm sym}=22.90 \pm 0.15$ MeV at $A=260$, and the corresponding reference density is $\rho_A\simeq 0.1$ fm$^{-3}$ at this mass region. The standard Skyrme energy density functionals give negative values for the coefficient of the $I^4$ term in the binding energy formula, whereas the latest Weizsäcker-Skyrme formula and the experimental data suggest positive values for the coefficient.

Submitted 2 April, 2015; originally announced April 2015.

Comments: 6 figures, accepted for publication in Phys. Rev. C

Journal ref: Phys. Rev. C 91, 044308 (2015)

arXiv:1107.4835

Topology of Entanglement in Multipartite States with Translational Invariance

Authors: H. T. Cui, J. L. Tian, C. M. Wang, Y. C. Chen

Abstract: The topology of entanglement in multipartite states with translational invariance is discussed in this article. Two global features are found by which one can distinguish distinct states. These are the cyclic unit and the quantised geometric phase. Furthermore the topology is indicated by the fractional spin. Finally a scheme is presented for preparation of these types of states in spin chain systems, in which the degeneracy of the energy levels characterises the robustness of the states with translational invariance.

Submitted 17 June, 2013; v1 submitted 24 July, 2011; originally announced July 2011.

Comments: major revision. accepted by EPJD

arXiv:1008.3486

doi 10.1103/PhysRevA.82.062116

Maximal Overlap with a Fully Separable State and Translational Invariance in Multipartite Entangled States

Authors: H. T. Cui, Di Yuan, J. L. Tian

Abstract: The maximal overlap with the fully separable state for the multipartite entangled pure state with translational invariance is studied explicitly by some exact and numerical evaluations, focusing on the one-dimensional qubit system and some representative types of translational invariance. The results show that the translational invariance of the multipartite state could have an intrinsic effect on the determinations of the maximal overlap and the nearest fully separable state for multipartite entangled states. Furthermore a hierarchy of the basic entangled states with translational invariance is found, from which one could readily find the maximal overlap and a related fully separable state for the multipartite state composed of different translational invariance structures.

Submitted 4 January, 2011; v1 submitted 20 August, 2010; originally announced August 2010.

Comments: published version. comments welcome

Journal ref: Phys. Rev. A 82, 062116 (2010)

Showing 1–6 of 6 results for author: Tian, J L