1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis

arXiv:2112.12928 (cs)

[Submitted on 24 Dec 2021 (v1), last revised 5 May 2022 (this version, v2)]

Authors: Ang Jia, Ming Fan, Wuxia **, Xi Xu, Zhaohui Zhou, Qiyi Tang, Sen Nie, Shi Wu, Ting Liu

Abstract: Binary similarity analysis is critical to many code-reuse-related issues, and the "1-to-1" mechanism is widely applied, in which one function in a binary file is matched against one function in a source file or binary file. However, we discover that function mapping is a more complex "1-to-n" or even "n-to-n" problem due to the existence of function inlining.
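
To make the "1-to-1" versus "1-to-n" distinction concrete, here is a minimal Python sketch under hypothetical assumptions: the function names (parse, read_header, check_len) and the BinaryFunction record are illustrative only and are not taken from the paper. A binary function compiled with inlining carries code from its root source function plus every inlined callee; a 1-to-1 label keeps only the root, so a query for an inlined callee finds nothing.

```python
from dataclasses import dataclass, field


@dataclass
class BinaryFunction:
    """A function in a compiled binary (illustrative record, not from the paper)."""
    name: str
    root_source: str                                # source function the binary function is named after
    inlined: set[str] = field(default_factory=set)  # source callees whose code was inlined into it

    def sources_1_to_1(self) -> set[str]:
        # "1-to-1": the binary function is labeled with a single source function.
        return {self.root_source}

    def sources_1_to_n(self) -> set[str]:
        # "1-to-n": the label also includes every inlined callee.
        return {self.root_source} | self.inlined


def locate(query: str, binary: list[BinaryFunction], one_to_n: bool) -> list[str]:
    """Return the binary functions whose label contains the queried source function."""
    hits = []
    for bf in binary:
        label = bf.sources_1_to_n() if one_to_n else bf.sources_1_to_1()
        if query in label:
            hits.append(bf.name)
    return hits


# Hypothetical binary: check_len and read_header were inlined into parse.
binary = [
    BinaryFunction("parse", "parse", {"read_header", "check_len"}),
    BinaryFunction("main", "main"),
]
print(locate("check_len", binary, one_to_n=False))  # []
print(locate("check_len", binary, one_to_n=True))   # ['parse']
```

With the 1-to-1 label, the query for check_len returns nothing even though its code lives inside parse; with the 1-to-n label it is located.
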
In this paper, we investigate the effect of function inlining on binary similarity analysis. We first construct four inlining-oriented datasets for four similarity analysis tasks: code search, OSS reuse detection, vulnerability detection, and patch presence test. We then study the extent of function inlining, the performance of existing works under function inlining, and the effectiveness of existing inlining-simulation strategies. Results show that the proportion of function inlining can reach nearly 70%, while most existing works neglect it and use the "1-to-1" mechanism. The resulting mismatches cause a 30% loss in performance during code search and a 40% loss during vulnerability detection. Moreover, two existing inlining-simulation strategies can recover only 60% of the inlined functions. We also discover that inlining is usually cumulative as the optimization level increases. We suggest conditional inlining and incremental inlining for designing low-cost and high-coverage inlining-simulation strategies.
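
The abstract does not spell out how inlining-simulation strategies work internally, so the following Python sketch is only an illustration under stated assumptions: a source-level call graph is available as a dictionary, "incremental" inlining is modeled as expanding callees one call-graph level at a time, and "conditional" inlining is modeled as a predicate on each callee (here a hypothetical size threshold). It also computes a recovery rate, the share of truly inlined callees the simulation covers, and checks whether inlined sets grow monotonically across optimization levels. All function names, sizes, and ground-truth sets are hypothetical.

```python
from typing import Callable

# Hypothetical representation: caller -> list of direct callees at source level.
CallGraph = dict[str, list[str]]


def simulate_inlining(root: str, cg: CallGraph, max_depth: int,
                      should_inline: Callable[[str], bool]) -> set[str]:
    """Return the source functions assumed to be merged into the binary code of `root`.

    Incremental: the call graph is expanded one level per iteration, up to max_depth.
    Conditional: a callee is merged (and its own callees considered) only if
    should_inline(callee) returns True.
    """
    merged = {root}
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for fn in frontier:
            for callee in cg.get(fn, []):
                if callee not in merged and should_inline(callee):
                    merged.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return merged


def recovery_rate(simulated: set[str], truly_inlined: set[str]) -> float:
    """Share of ground-truth inlined callees that the simulation covers."""
    if not truly_inlined:
        return 1.0
    return len(simulated & truly_inlined) / len(truly_inlined)


def is_cumulative(inlined_by_level: list[set[str]]) -> bool:
    """True if each higher optimization level inlines a superset of the level below it."""
    return all(lower <= higher
               for lower, higher in zip(inlined_by_level, inlined_by_level[1:]))


# Toy data (all hypothetical).
cg = {
    "parse": ["read_header", "check_len", "log_error"],
    "read_header": ["read_u32"],
}
size = {"read_header": 12, "check_len": 5, "log_error": 80, "read_u32": 4}
truth = {"read_header", "check_len", "read_u32"}  # callees actually inlined into parse


def small(fn: str) -> bool:
    # Hypothetical condition: only inline callees of at most 20 instructions.
    return size[fn] <= 20


for depth in (1, 2):
    sim = simulate_inlining("parse", cg, depth, small)
    print(depth, sorted(sim), f"{recovery_rate(sim, truth):.2f}")
# 1 ['check_len', 'parse', 'read_header'] 0.67
# 2 ['check_len', 'parse', 'read_header', 'read_u32'] 1.00

print(is_cumulative([{"check_len"}, {"check_len", "read_header"}, truth]))  # True
```

In this toy example, raising the depth from 1 to 2 lifts the recovery rate from 0.67 to 1.00, while the size condition keeps log_error out of the merged set; this is a small-scale picture of the low-cost, high-coverage balance that conditional and incremental inlining are suggested for.
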

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2112.12928 [cs.SE]
	(or arXiv:2112.12928v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2112.12928

Submission history

From: Ang Jia
[v1] Fri, 24 Dec 2021 03:37:19 UTC (1,179 KB)
[v2] Thu, 5 May 2022 13:33:17 UTC (1,126 KB)