-
Resource Leak Checker (RLC#) for C# Code using CodeQL
Authors:
Pritam Gharat,
Narges Shadab,
Shrey Tiwari,
Shuvendu Lahiri,
Akash Lal
Abstract:
Resource leaks occur when a program fails to release a finite resource after it is no longer needed. These leaks are a significant cause of real-world crashes and performance issues. Given their critical impact on software performance and security, detecting and preventing resource leaks is a crucial problem.
Recent research has proposed a specify-and-check approach to prevent resource leaks. In…
▽ More
Resource leaks occur when a program fails to release a finite resource after it is no longer needed. These leaks are a significant cause of real-world crashes and performance issues. Given their critical impact on software performance and security, detecting and preventing resource leaks is a crucial problem.
Recent research has proposed a specify-and-check approach to prevent resource leaks. In this approach, programmers write resource management specifications that guide how resources are stored, passed around, and released within an application. We have developed a tool called RLC#, for detecting resource leaks in C# code. Inspired by the Resource Leak Checker (RLC) from the Checker Framework, RLC# employs CodeQL for intraprocedural data flow analysis. The tool operates in a modular fashion and relies on resource management specifications integrated at method boundaries for interprocedural analysis.
In practice, RLC# has successfully identified 24 resource leaks in open-source projects and internal proprietary Azure microservices. Its implementation is declarative, and it scales well. While it incurs a reasonable false positive rate, the burden on developers is minimal, involving the addition of specifications to the source code.
△ Less
Submitted 5 December, 2023; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Inference of Resource Management Specifications
Authors:
Narges Shadab,
Pritam Gharat,
Shrey Tiwari,
Michael D. Ernst,
Martin Kellogg,
Shuvendu Lahiri,
Akash Lal,
Manu Sridharan
Abstract:
A resource leak occurs when a program fails to free some finite resource after it is no longer needed. Such leaks are a significant cause of real-world crashes and performance problems. Recent work proposed an approach to prevent resource leaks based on checking resource management specifications. A resource management specification expresses how the program allocates resources, passes them around…
▽ More
A resource leak occurs when a program fails to free some finite resource after it is no longer needed. Such leaks are a significant cause of real-world crashes and performance problems. Recent work proposed an approach to prevent resource leaks based on checking resource management specifications. A resource management specification expresses how the program allocates resources, passes them around, and releases them; it also tracks the ownership relationship between objects and resources, and aliasing relationships between objects. While this specify-and-verify approach has several advantages compared to prior techniques, the need to manually write annotations presents a significant barrier to its practical adoption.
This paper presents a novel technique to automatically infer a resource management specification for a program, broadening the applicability of specify-and-check verification for resource leaks. Inference in this domain is challenging because resource management specifications differ significantly in nature from the types that most inference techniques target. Further, for practical effectiveness, we desire a technique that can infer the resource management specification intended by the developer, even in cases when the code does not fully adhere to that specification. We address these challenges through a set of inference rules carefully designed to capture real-world coding patterns, yielding an effective fixed-point-based inference algorithm.
We have implemented our inference algorithm in two different systems, targeting programs written in Java and C#. In an experimental evaluation, our technique inferred 85.5% of the annotations that programmers had written manually for the benchmarks. Further, the verifier issued nearly the same rate of false alarms with the manually-written and automatically-inferred annotations.
△ Less
Submitted 21 September, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Authors:
Pritam M. Gharat,
Uday P. Khedker,
Alan Mycroft
Abstract:
Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as mem…
▽ More
Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations.
Our quest for scalability of points-to analysis leads to the following insight: The real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs.
△ Less
Submitted 28 January, 2018;
originally announced January 2018.
-
Flow- and Context-Sensitive Points-to Analysis using Generalized Points-to Graphs
Authors:
Pritam M. Gharat,
Uday P. Khedker,
Alan Mycroft
Abstract:
Computing precise (fully flow-sensitive and context-sensitive) and exhaustive points-to information is computationally expensive. Many practical tools approximate the points-to information trading precision for efficiency. This has adverse impact on computationally intensive analyses such as model checking. Past explorations in top-down approaches of fully flow- and context-sensitive points-to ana…
▽ More
Computing precise (fully flow-sensitive and context-sensitive) and exhaustive points-to information is computationally expensive. Many practical tools approximate the points-to information trading precision for efficiency. This has adverse impact on computationally intensive analyses such as model checking. Past explorations in top-down approaches of fully flow- and context-sensitive points-to analysis (FCPA) have not scaled. We explore the alternative of bottom-up interprocedural approach which constructs summary flow functions for procedures to represent the effect of their calls. This approach has been effectively used for many analyses. However, it is computationally expensive for FCPA which requires modelling unknown locations accessed indirectly through pointers. Such accesses are commonly handled by using placeholders to explicate unknown locations or by using multiple call-specific summary flow functions.
We generalize the concept of points-to relations by using the counts of indirection levels leaving the unknown locations implicit. This allows us to create summary flow functions in the form of generalized points-to graphs (GPGs) without the need of placeholders. By design, GPGs represent both memory (in terms of classical points-to facts) and memory transformers (in terms of generalized points-to facts). We perform FCPA by progressively reducing generalized points-to facts to classical points-to facts. GPGs distinguish between may and must pointer updates thereby facilitating strong updates within calling contexts.
The size of GPG is linearly bounded by the number of variables and is independent of the number of statements in the procedure. Empirical measurements on SPEC benchmarks show that GPGs are indeed compact in spite of large procedure sizes. This allows us to scale FCPA to 158 kLoC using GPGs (compared to 35 kLoC reported by liveness-based FCPA).
△ Less
Submitted 5 August, 2016; v1 submitted 31 March, 2016;
originally announced March 2016.