-
RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion
Authors:
Huy N. Phan,
Hoang N. Phan,
Tien N. Nguyen,
Nghi D. Q. Bui
Abstract:
Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present \tool, a multifaceted framework designed…
▽ More
Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present \tool, a multifaceted framework designed to address the complex challenges associated with repository-level code completion. Central to \tool is the {\em Repo-level Semantic Graph} (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Furthermore, RepoHyper leverages \textit{Expand and Refine} retrieval method, including a graph expansion and a link prediction algorithm applied to the RSG, enabling the effective retrieval and prioritization of relevant code snippets. Our evaluations show that \tool markedly outperforms existing techniques in repository-level code completion, showcasing enhanced accuracy across various datasets when compared to several strong baselines. Our implementation of RepoHyper can be found at~\url{https://github.com/FSoft-AI4Code/RepoHyper}.
△ Less
Submitted 16 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Rig Inversion by Training a Differentiable Rig Function
Authors:
Mathieu Marquis Bolduc,
Hau Nghiep Phan
Abstract:
Rig inversion is the problem of creating a method that can find the rig parameter vector that best approximates a given input mesh. In this paper we propose to solve this problem by first obtaining a differentiable rig function by training a multi layer perceptron to approximate the rig function. This differentiable rig function can then be used to train a deep learning model of rig inversion.
Rig inversion is the problem of creating a method that can find the rig parameter vector that best approximates a given input mesh. In this paper we propose to solve this problem by first obtaining a differentiable rig function by training a multi layer perceptron to approximate the rig function. This differentiable rig function can then be used to train a deep learning model of rig inversion.
△ Less
Submitted 11 January, 2023;
originally announced January 2023.
-
DPER: Dynamic Programming for Exist-Random Stochastic SAT
Authors:
Vu H. N. Phan,
Moshe Y. Vardi
Abstract:
In Bayesian inference, the maximum a posteriori (MAP) problem combines the most probable explanation (MPE) and marginalization (MAR) problems. The counterpart in propositional logic is the exist-random stochastic satisfiability (ER-SSAT) problem, which combines the satisfiability (SAT) and weighted model counting (WMC) problems. Both MAP and ER-SSAT have the form…
▽ More
In Bayesian inference, the maximum a posteriori (MAP) problem combines the most probable explanation (MPE) and marginalization (MAR) problems. The counterpart in propositional logic is the exist-random stochastic satisfiability (ER-SSAT) problem, which combines the satisfiability (SAT) and weighted model counting (WMC) problems. Both MAP and ER-SSAT have the form $\operatorname{argmax}_X \sum_Y f(X, Y)$, where $f$ is a real-valued function over disjoint sets $X$ and $Y$ of variables. These two optimization problems request a value assignment for the $X$ variables that maximizes the weighted sum of $f(X, Y)$ over all value assignments for the $Y$ variables. ER-SSAT has been shown to be a promising approach to formally verify fairness in supervised learning. Recently, dynamic programming on graded project-join trees has been proposed to solve weighted projected model counting (WPMC), a related problem that has the form $\sum_X \max_Y f(X, Y)$. We extend this WPMC framework to exactly solve ER-SSAT and implement a dynamic-programming solver named DPER. Our empirical evaluation indicates that DPER contributes to the portfolio of state-of-the-art ER-SSAT solvers (DC-SSAT and erSSAT) through competitive performance on low-width problem instances.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
DPO: Dynamic-Programming Optimization on Hybrid Constraints
Authors:
Vu H. N. Phan,
Moshe Y. Vardi
Abstract:
In Bayesian inference, the most probable explanation (MPE) problem requests a variable instantiation with the highest probability given some evidence. Since a Bayesian network can be encoded as a literal-weighted CNF formula $\varphi$, we study Boolean MPE, a more general problem that requests a model $τ$ of $\varphi$ with the highest weight, where the weight of $τ$ is the product of weights of li…
▽ More
In Bayesian inference, the most probable explanation (MPE) problem requests a variable instantiation with the highest probability given some evidence. Since a Bayesian network can be encoded as a literal-weighted CNF formula $\varphi$, we study Boolean MPE, a more general problem that requests a model $τ$ of $\varphi$ with the highest weight, where the weight of $τ$ is the product of weights of literals satisfied by $τ$. It is known that Boolean MPE can be solved via reduction to (weighted partial) MaxSAT. Recent work proposed DPMC, a dynamic-programming model counter that leverages graph-decomposition techniques to construct project-join trees. A project-join tree is an execution plan that specifies how to conjoin clauses and project out variables. We build on DPMC and introduce DPO, a dynamic-programming optimizer that exactly solves Boolean MPE. By using algebraic decision diagrams (ADDs) to represent pseudo-Boolean (PB) functions, DPO is able to handle disjunctive clauses as well as XOR clauses. (Cardinality constraints and PB constraints may also be compactly represented by ADDs, so one can further extend DPO's support for hybrid inputs.) To test the competitiveness of DPO, we generate random XOR-CNF formulas. On these hybrid benchmarks, DPO significantly outperforms MaxHS, UWrMaxSat, and GaussMaxHS, which are state-of-the-art exact solvers for MaxSAT.
△ Less
Submitted 17 May, 2022;
originally announced May 2022.
-
CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis
Authors:
Synh Viet-Uyen Ha,
Cuong Tien Nguyen,
Hung Ngoc Phan,
Nhat Minh Chung,
Phuong Hoai Ha
Abstract:
Background modeling and subtraction is a promising research area with a variety of applications for video surveillance. Recent years have witnessed a proliferation of effective learning-based deep neural networks in this area. However, the techniques have only provided limited descriptions of scenes' properties while requiring heavy computations, as their single-valued map** functions are learne…
▽ More
Background modeling and subtraction is a promising research area with a variety of applications for video surveillance. Recent years have witnessed a proliferation of effective learning-based deep neural networks in this area. However, the techniques have only provided limited descriptions of scenes' properties while requiring heavy computations, as their single-valued map** functions are learned to approximate the temporal conditional averages of observed target backgrounds and foregrounds. On the other hand, statistical learning in imagery domains has been a prevalent approach with high adaptation to dynamic context transformation, notably using Gaussian Mixture Models (GMM) with its generalization capabilities. By leveraging both, we propose a novel method called CDN-MEDAL-net for background modeling and subtraction with two convolutional neural networks. The first architecture, CDN-GM, is grounded on an unsupervised GMM statistical learning strategy to describe observed scenes' salient features. The second one, MEDAL-net, implements a light-weighted pipeline of online video background subtraction. Our two-stage architecture is small, but it is very effective with rapid convergence to representations of intricate motion patterns. Our experiments show that the proposed approach is not only capable of effectively extracting regions of moving objects in unseen cases, but it is also very efficient.
△ Less
Submitted 21 September, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations
Authors:
Ha Q. Nguyen,
Khanh Lam,
Linh T. Le,
Hieu H. Pham,
Dat Q. Tran,
Dung B. Nguyen,
Dung D. Le,
Chi M. Pham,
Hang T. T. Tong,
Diep H. Dinh,
Cuong D. Do,
Luu T. Doan,
Cuong N. Nguyen,
Binh T. Nguyen,
Que V. Nguyen,
Au D. Hoang,
Hien N. Phan,
Anh T. Nguyen,
Phuong H. Ho,
Dat T. Ngo,
Nghia T. Nguyen,
Nhan T. Nguyen,
Minh Dao,
Van Vu
Abstract:
Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam…
▽ More
Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 and a test set of 3,000. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available (https://www.physionet.org/content/vindr-cxr/1.0.0/) in DICOM format along with the labels of both the training set and the test set.
△ Less
Submitted 20 March, 2022; v1 submitted 29 December, 2020;
originally announced December 2020.
-
DPMC: Weighted Model Counting by Dynamic Programming on Project-Join Trees
Authors:
Jeffrey M. Dudek,
Vu H. N. Phan,
Moshe Y. Vardi
Abstract:
We propose a unifying dynamic-programming framework to compute exact literal-weighted model counts of formulas in conjunctive normal form. At the center of our framework are project-join trees, which specify efficient project-join orders to apply additive projections (variable eliminations) and joins (clause multiplications). In this framework, model counting is performed in two phases. First, the…
▽ More
We propose a unifying dynamic-programming framework to compute exact literal-weighted model counts of formulas in conjunctive normal form. At the center of our framework are project-join trees, which specify efficient project-join orders to apply additive projections (variable eliminations) and joins (clause multiplications). In this framework, model counting is performed in two phases. First, the planning phase constructs a project-join tree from a formula. Second, the execution phase computes the model count of the formula, employing dynamic programming as guided by the project-join tree. We empirically evaluate various methods for the planning phase and compare constraint-satisfaction heuristics with tree-decomposition tools. We also investigate the performance of different data structures for the execution phase and compare algebraic decision diagrams with tensors. We show that our dynamic-programming model-counting framework DPMC is competitive with the state-of-the-art exact weighted model counters cachet, c2d, d4, and miniC2D.
△ Less
Submitted 19 August, 2020;
originally announced August 2020.
-
ADDMC: Weighted Model Counting with Algebraic Decision Diagrams
Authors:
Jeffrey M. Dudek,
Vu H. N. Phan,
Moshe Y. Vardi
Abstract:
We present an algorithm to compute exact literal-weighted model counts of Boolean formulas in Conjunctive Normal Form. Our algorithm employs dynamic programming and uses Algebraic Decision Diagrams as the primary data structure. We implement this technique in ADDMC, a new model counter. We empirically evaluate various heuristics that can be used with ADDMC. We then compare ADDMC to state-of-the-ar…
▽ More
We present an algorithm to compute exact literal-weighted model counts of Boolean formulas in Conjunctive Normal Form. Our algorithm employs dynamic programming and uses Algebraic Decision Diagrams as the primary data structure. We implement this technique in ADDMC, a new model counter. We empirically evaluate various heuristics that can be used with ADDMC. We then compare ADDMC to state-of-the-art exact weighted model counters (Cachet, c2d, d4, and miniC2D) on 1914 standard model counting benchmarks and show that ADDMC significantly improves the virtual best solver.
△ Less
Submitted 2 June, 2020; v1 submitted 11 July, 2019;
originally announced July 2019.
-
Fluid-structure interaction problems: An application to anchored and unanchored steel storage tanks subjected to seismic loadings
Authors:
Hoang Nam Phan,
Fabrizio Paolacci
Abstract:
The estimation of the degree of risk of steel storage tanks from an industrial or nuclear power plant during an earthquake event is a very difficult task since the dynamic behaviour of the tank-liquid system is highly complex. The seismic response of steel storage tanks is quite different from building structures not only due to hydrodynamic effects acting on the shell and bottom plate but also be…
▽ More
The estimation of the degree of risk of steel storage tanks from an industrial or nuclear power plant during an earthquake event is a very difficult task since the dynamic behaviour of the tank-liquid system is highly complex. The seismic response of steel storage tanks is quite different from building structures not only due to hydrodynamic effects acting on the shell and bottom plate but also because of many sources of nonlinear behaviour mechanisms. For the earthquake-resistant design of tanks, it is important to use a rational and reliable nonlinear dynamic analysis procedure, which is capable of capturing the seismic behaviour of the tanks under artificial or real earthquakes. The present paper deals with the nonlinear finite element modelling of steel storage tanks subjected to seismic loadings. The interaction effects of fluid and structure are modelled using a surface-based acoustic-structural interaction in the ABAQUS software. The successive contact and separation between the bottom plate and its rigid foundation caused by sliding and uplift are taken into account by a contact modelling approach. Reduced scales of both slender and broad cylindrical steel storage tanks from a shaking table campaign within the framework of the INDUSE-2-SAFETY project are selected for the study. The seismic responses of the tanks including the hydrodynamic pressure distribution acting on the shell plate, the elevation of the liquid free surface, and the uplift response of the bottom plate when the tank is unanchored are presented and compared with the experimental results. The responses obtained from the refined models are also compared with those obtained using a simplified approach, which is based on the lumped mass model of the liquid motion.
△ Less
Submitted 2 May, 2018;
originally announced May 2018.