-
Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification
Authors:
Nanxing Meng,
Qizao Wang,
Bin Li,
Xiangyang Xue
Abstract:
With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply averag…
▽ More
With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially partition the noise-filtered tracklets into "sub-tracklets". Then, we cluster and further merge sub-tracklets using the self-supervised signal from tracklet partition, which is enhanced through a progressive strategy to generate reliable pseudo labels, facilitating intra-class cross-tracklet aggregation. Moreover, we propose the Class Smoothing Classification (CSC) loss to efficiently promote model learning. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate that our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
An empirical study of ChatGPT-3.5 on question answering and code maintenance
Authors:
Md Mahir Asef Kabir,
Sk Adnan Hassan,
Xiaoyin Wang,
Ying Wang,
Hai Yu,
Na Meng
Abstract:
Ever since the launch of ChatGPT in 2022, a rising concern is whether ChatGPT will replace programmers and kill jobs. Motivated by this widespread concern, we conducted an empirical study to systematically compare ChatGPT against programmers in question-answering and software-maintaining. We reused a dataset introduced by prior work, which includes 130 StackOverflow (SO) discussion threads referre…
▽ More
Ever since the launch of ChatGPT in 2022, a rising concern is whether ChatGPT will replace programmers and kill jobs. Motivated by this widespread concern, we conducted an empirical study to systematically compare ChatGPT against programmers in question-answering and software-maintaining. We reused a dataset introduced by prior work, which includes 130 StackOverflow (SO) discussion threads referred to by the Java developers of 357 GitHub projects. We mainly investigated three research questions (RQs). First, how does ChatGPT compare with programmers when answering technical questions? Second, how do developers perceive the differences between ChatGPT's answers and SO answers? Third, how does ChatGPT compare with humans when revising code for maintenance requests?
For RQ1, we provided the 130 SO questions to ChatGPT, and manually compared ChatGPT answers with the accepted/most popular SO answers in terms of relevance, readability, informativeness, comprehensiveness, and reusability. For RQ2, we conducted a user study with 30 developers, asking each developer to assess and compare 10 pairs of answers, without knowing the information source (i.e., ChatGPT or SO). For RQ3, we distilled 48 software maintenance tasks from 48 GitHub projects citing the studied SO threads. We queried ChatGPT to revise a given Java file, and to incorporate the code implementation for any prescribed maintenance requirement. Our study reveals interesting phenomena: For the majority of SO questions (97/130), ChatGPT provided better answers; in 203 of 300 ratings, developers preferred ChatGPT answers to SO answers; ChatGPT revised code correctly for 22 of the 48 tasks. Our research will expand people's knowledge of ChatGPT capabilities, and shed light on future adoption of ChatGPT by the software industry.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
How well does LLM generate security tests?
Authors:
Ying Zhang,
Wenjia Song,
Zhengjie Ji,
Danfeng,
Yao,
Na Meng
Abstract:
Developers often build software on top of third-party libraries (Libs) to improve programmer productivity and software quality. The libraries may contain vulnerabilities exploitable by hackers to attack the applications (Apps) built on top of them. People refer to such attacks as supply chain attacks, the documented number of which has increased 742% in 2022. People created tools to mitigate such…
▽ More
Developers often build software on top of third-party libraries (Libs) to improve programmer productivity and software quality. The libraries may contain vulnerabilities exploitable by hackers to attack the applications (Apps) built on top of them. People refer to such attacks as supply chain attacks, the documented number of which has increased 742% in 2022. People created tools to mitigate such attacks, by scanning the library dependencies of Apps, identifying the usage of vulnerable library versions, and suggesting secure alternatives to vulnerable dependencies. However, recent studies show that many developers do not trust the reports by these tools; they ask for code or evidence to demonstrate how library vulnerabilities lead to security exploits, in order to assess vulnerability severity and modification necessity. Unfortunately, manually crafting demos of application-specific attacks is challenging and time-consuming, and there is insufficient tool support to automate that procedure.
In this study, we used ChatGPT-4.0 to generate security tests, and to demonstrate how vulnerable library dependencies facilitate the supply chain attacks to given Apps. We explored various prompt styles/templates, and found that ChatGPT-4.0 generated tests for all 55 Apps, demonstrating 24 attacks successfully. It outperformed two state-of-the-art security test generators -- TRANSFER and SIEGE -- by generating a lot more tests and achieving more exploits. ChatGPT-4.0 worked better when prompts described more on the vulnerabilities, possible exploits, and code context. Our research will shed light on new research in security test generation. The generated tests will help developers create secure by design and secure by default software.
△ Less
Submitted 2 October, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects?
Authors:
Juntong Chen,
Kulendra Kumar Kaushal,
Rutwik Kulkarni,
Na Meng
Abstract:
StackOverflow (SO) is a widely used question-and-answer (Q\&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. Prior work relates the information mined from both platforms to link user accounts or compare developers' activities across platforms. However, not much work is done to ch…
▽ More
StackOverflow (SO) is a widely used question-and-answer (Q\&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. Prior work relates the information mined from both platforms to link user accounts or compare developers' activities across platforms. However, not much work is done to characterize the SO answers reused by GitHub projects. For this paper, we did an empirical study by mining the SO answers reused by Java projects available on GitHub. We created a hybrid approach of clone detection, keyword-based search, and manual inspection, to identify the answer(s) actually leveraged by developers. Based on the identified answers, we further studied topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers' reuse practices.
We observed that most reused answers offer programs to implement specific coding tasks. Among all analyzed SO discussion threads, the reused answers often have relatively higher scores, older ages, longer code, and longer text than unused answers. In only 9% of scenarios (40/430), developers fully copied answer code for reuse. In the remaining scenarios, they reused partial code or created brand new code from scratch. Our study characterized 130 SO discussion threads referred to by Java developers in 357 GitHub projects. Our empirical findings can guide SO answerers to provide better answers, and shed lights on future research related to SO and GitHub.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Crypto-ransomware Detection through Quantitative API-based Behavioral Profiling
Authors:
Wenjia Song,
Sanjula Karanam,
Ya Xiao,
**gyuan Qi,
Nathan Dautenhahn,
Na Meng,
Elena Ferrari,
Danfeng,
Yao
Abstract:
Crypto-ransomware has caused an unprecedented scope of impact in recent years with an evolving level of sophistication. We are in urgent need to pinpoint the security gap and improve the effectiveness of defenses by identifying new detection approaches. In this paper, we quantitatively characterized the runtime behaviors of 54 ransomware samples from 35 distinct families, with a focus on the core…
▽ More
Crypto-ransomware has caused an unprecedented scope of impact in recent years with an evolving level of sophistication. We are in urgent need to pinpoint the security gap and improve the effectiveness of defenses by identifying new detection approaches. In this paper, we quantitatively characterized the runtime behaviors of 54 ransomware samples from 35 distinct families, with a focus on the core encryption and file access behaviors. Based on our characterization results on dynamic API behaviors of ransomware, we present a new API profiling-based detection mechanism. Our method consists of two operations, namely consistency analysis and API-contrast-based refinement. We evaluate our approach against a set of real-world ransomware as well as benign samples. The results show that our method effectively detects all ransomware executions in the consistency analysis and further reduces the false positive case in API-contrast refinement. We also conduct in-depth case studies to investigate the most informative APIs for detection with context.
△ Less
Submitted 9 October, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Two-Stage Gras**: A New Bin Picking Framework for Small Objects
Authors:
Hanwen Cao,
Jianshu Zhou,
Junda Huang,
Yichuan Li,
Ng Cheng Meng,
Rui Cao,
Qi Dou,
Yunhui Liu
Abstract:
This paper proposes a novel bin picking framework, two-stage gras**, aiming at precise gras** of cluttered small objects. Object density estimation and rough gras** are conducted in the first stage. Fine segmentation, detection, gras**, and pushing are performed in the second stage. A small object bin picking system has been realized to exhibit the concept of two-stage gras**. Experiment…
▽ More
This paper proposes a novel bin picking framework, two-stage gras**, aiming at precise gras** of cluttered small objects. Object density estimation and rough gras** are conducted in the first stage. Fine segmentation, detection, gras**, and pushing are performed in the second stage. A small object bin picking system has been realized to exhibit the concept of two-stage gras**. Experiments have shown the effectiveness of the proposed framework. Unlike traditional bin picking methods focusing on vision-based gras** planning using classic frameworks, the challenges of picking cluttered small objects can be solved by the proposed new framework with simple vision detection and planning.
△ Less
Submitted 6 May, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
-
Unsupervised Light Field Depth Estimation via Multi-view Feature Matching with Occlusion Prediction
Authors:
Shansi Zhang,
Nan Meng,
Edmund Y. Lam
Abstract:
Depth estimation from light field (LF) images is a fundamental step for numerous applications. Recently, learning-based methods have achieved higher accuracy and efficiency than the traditional methods. However, it is costly to obtain sufficient depth labels for supervised training. In this paper, we propose an unsupervised framework to estimate depth from LF images. First, we design a disparity e…
▽ More
Depth estimation from light field (LF) images is a fundamental step for numerous applications. Recently, learning-based methods have achieved higher accuracy and efficiency than the traditional methods. However, it is costly to obtain sufficient depth labels for supervised training. In this paper, we propose an unsupervised framework to estimate depth from LF images. First, we design a disparity estimation network (DispNet) with a coarse-to-fine structure to predict disparity maps from different view combinations. It explicitly performs multi-view feature matching to learn the correspondences effectively. As occlusions may cause the violation of photo-consistency, we introduce an occlusion prediction network (OccNet) to predict the occlusion maps, which are used as the element-wise weights of photometric loss to solve the occlusion issue and assist the disparity learning. With the disparity maps estimated by multiple input combinations, we then propose a disparity fusion strategy based on the estimated errors with effective occlusion handling to obtain the final disparity map with higher accuracy. Experimental results demonstrate that our method achieves superior performance on both the dense and sparse LF images, and also shows better robustness and generalization on the real-world LF images compared to the other methods.
△ Less
Submitted 18 August, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Identifying Unique Causal Network from Nonstationary Time Series
Authors:
Mingyu Kang,
Duxin Chen,
Ning Meng,
Gang Yan,
Wenwu Yu
Abstract:
Identifying causality is a challenging task in many data-intensive scenarios. Many algorithms have been proposed for this critical task. However, most of them consider the learning algorithms for directed acyclic graph (DAG) of Bayesian network (BN). These BN-based models only have limited causal explainability because of the issue of Markov equivalence class. Moreover, they are dependent on the a…
▽ More
Identifying causality is a challenging task in many data-intensive scenarios. Many algorithms have been proposed for this critical task. However, most of them consider the learning algorithms for directed acyclic graph (DAG) of Bayesian network (BN). These BN-based models only have limited causal explainability because of the issue of Markov equivalence class. Moreover, they are dependent on the assumption of stationarity, whereas many sampling time series from complex system are nonstationary. The nonstationary time series bring dataset shift problem, which leads to the unsatisfactory performances of these algorithms. To fill these gaps, a novel causation model named Unique Causal Network (UCN) is proposed in this paper. Different from the previous BN-based models, UCN considers the influence of time delay, and proves the uniqueness of obtained network structure, which addresses the issue of Markov equivalence class. Furthermore, based on the decomposability property of UCN, a higher-order causal entropy (HCE) algorithm is designed to identify the structure of UCN in a distributed way. HCE algorithm measures the strength of causality by using nearest-neighbors entropy estimator, which works well on nonstationary time series. Finally, lots of experiments validate that HCE algorithm achieves state-of-the-art accuracy when time series are nonstationary, compared to the other baseline algorithms.
△ Less
Submitted 29 August, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images
Authors:
Shansi Zhang,
Nan Meng,
Edmund Y. Lam
Abstract:
Light field (LF) images containing information for multiple views have numerous applications, which can be severely affected by low-light imaging. Recent learning-based methods for low-light enhancement have some disadvantages, such as a lack of noise suppression, complex training process and poor performance in extremely low-light conditions. To tackle these deficiencies while fully utilizing the…
▽ More
Light field (LF) images containing information for multiple views have numerous applications, which can be severely affected by low-light imaging. Recent learning-based methods for low-light enhancement have some disadvantages, such as a lack of noise suppression, complex training process and poor performance in extremely low-light conditions. To tackle these deficiencies while fully utilizing the multi-view information, we propose an efficient Low-light Restoration Transformer (LRT) for LF images, with multiple heads to perform intermediate tasks within a single network, including denoising, luminance adjustment, refinement and detail enhancement, achieving progressive restoration from small scale to full scale. Moreover, we design an angular transformer block with an efficient view-token scheme to model the global angular dependencies, and a multi-scale spatial transformer block to encode the multi-scale local and global information within each view. To address the issue of insufficient training data, we formulate a synthesis pipeline by simulating the major noise sources with the estimated noise parameters of LF camera. Experimental results demonstrate that our method achieves the state-of-the-art performance on low-light LF restoration with high efficiency.
△ Less
Submitted 15 March, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Machine learning approach in the development of building occupant personas
Authors:
Sheik Murad Hassan Anik,
Xinghua Gao,
Na Meng
Abstract:
The user persona is a communication tool for designers to generate a mental model that describes the archetype of users. Develo** building occupant personas is proven to be an effective method for human-centered smart building design, which considers occupant comfort, behavior, and energy consumption. Optimization of building energy consumption also requires a deep understanding of occupants' pr…
▽ More
The user persona is a communication tool for designers to generate a mental model that describes the archetype of users. Develo** building occupant personas is proven to be an effective method for human-centered smart building design, which considers occupant comfort, behavior, and energy consumption. Optimization of building energy consumption also requires a deep understanding of occupants' preferences and behaviors. The current approaches to develo** building occupant personas face a major obstruction of manual data processing and analysis. In this study, we propose and evaluate a machine learning-based semi-automated approach to generate building occupant personas. We investigate the 2015 Residential Energy Consumption Dataset with five machine learning techniques - Linear Discriminant Analysis, K Nearest Neighbors, Decision Tree (Random Forest), Support Vector Machine, and AdaBoost classifier - for the prediction of 16 occupant characteristics, such as age, education, and, thermal comfort. The models achieve an average accuracy of 61% and accuracy over 90% for attributes including the number of occupants in the household, their age group, and preferred usage of heating or cooling equipment. The results of the study show the feasibility of using machine learning techniques for the development of building occupant persona to minimize human effort.
△ Less
Submitted 19 July, 2022;
originally announced July 2022.
-
Heavy-Ball-Based Hard Thresholding Algorithms for Sparse Signal Recovery
Authors:
Zhong-Feng Sun,
**-Chuan Zhou,
Yun-Bin Zhao,
Nan Meng
Abstract:
The hard thresholding technique plays a vital role in the development of algorithms for sparse signal recovery. By merging this technique and heavy-ball acceleration method which is a multi-step extension of the traditional gradient descent method, we propose the so-called heavy-ball-based hard thresholding (HBHT) and heavy-ball-based hard thresholding pursuit (HBHTP) algorithms for signal recover…
▽ More
The hard thresholding technique plays a vital role in the development of algorithms for sparse signal recovery. By merging this technique and heavy-ball acceleration method which is a multi-step extension of the traditional gradient descent method, we propose the so-called heavy-ball-based hard thresholding (HBHT) and heavy-ball-based hard thresholding pursuit (HBHTP) algorithms for signal recovery. It turns out that the HBHT and HBHTP can successfully recover a $k$-sparse signal if the restricted isometry constant of the measurement matrix satisfies $δ_{3k}<0.618 $ and $δ_{3k}<0.577,$ respectively. The guaranteed success of HBHT and HBHTP is also shown under the conditions $δ_{2k}<0.356$ and $δ_{2k}<0.377,$ respectively. Moreover, the finite convergence and stability of the two algorithms are also established in this paper. Simulations on random problem instances are performed to compare the performance of the proposed algorithms and several existing ones. Empirical results indicate that the HBHTP performs very comparably to a few existing algorithms and it takes less average time to achieve the signal recovery than these existing methods.
△ Less
Submitted 20 April, 2022;
originally announced April 2022.
-
Example-Based Vulnerability Detection and Repair in Java Code
Authors:
Ying Zhang,
Ya Xiao,
Md Mahir Asef Kabir,
Danfeng,
Yao,
Na Meng
Abstract:
The Java libraries JCA and JSSE offer cryptographic APIs to facilitate secure coding. When developers misuse some of the APIs, their code becomes vulnerable to cyber-attacks. To eliminate such vulnerabilities, people built tools to detect security-API misuses via pattern matching. However, most tools do not (1) fix misuses or (2) allow users to extend tools' pattern sets. To overcome both limitati…
▽ More
The Java libraries JCA and JSSE offer cryptographic APIs to facilitate secure coding. When developers misuse some of the APIs, their code becomes vulnerable to cyber-attacks. To eliminate such vulnerabilities, people built tools to detect security-API misuses via pattern matching. However, most tools do not (1) fix misuses or (2) allow users to extend tools' pattern sets. To overcome both limitations, we created Seader-an example-based approach to detect and repair security-API misuses. Given an exemplar <insecure, secure>code pair, Seader compares the snippets to infer any API-misuse template and corresponding fixing edit. Based on the inferred info, given a program, Seader performs inter-procedural static analysis to search for security-API misuses and to propose customized fixes. For evaluation, we applied Seader to 28 <insecure, secure> codepairs; Seader successfully inferred 21 unique API-misuse templates and related fixes. With these <vulnerability, fix> patterns, we applied SEADER to a program benchmark that has 86 known vulnerabilities. Seader detected vulnerabilities with 95% precision, 72% recall, and82% F-score. We also applied Seader to 100 open-source projects and manually checked 77 suggested repairs; 76 of the repairs were correct. Seader can help developers correctly use security APIs.
△ Less
Submitted 28 April, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
FAPR: Fast and Accurate Program Repair for Introductory Programming Courses
Authors:
Yunlong Lu,
Na Meng,
Wenxin Li
Abstract:
In introductory programming courses, it is challenging for instructors to provide debugging feedback on students' incorrect programs. Some recent tools automatically offer program repair feedback by identifying any differences between incorrect and correct programs, but suffer from issues related to scalability, accuracy, and cross-language portability. This paper presents FAPR -- our novel approa…
▽ More
In introductory programming courses, it is challenging for instructors to provide debugging feedback on students' incorrect programs. Some recent tools automatically offer program repair feedback by identifying any differences between incorrect and correct programs, but suffer from issues related to scalability, accuracy, and cross-language portability. This paper presents FAPR -- our novel approach that suggests repairs based on program differences in a fast and accurate manner. FAPR is different from current tools in three aspects. First, it encodes syntactic information into token sequences to enable high-speed comparison between incorrect and correct programs. Second, to accurately extract program differences, FAPR adopts a novel matching algorithm that maximizes token-level matches and minimizes statement-level differences. Third, FAPR relies on testing instead of static/dynamic analysis to validate and refine candidate repairs, so it eliminates the language dependency or high runtime overhead incurred by complex program analysis. We implemented FAPR to suggest repairs for both C and C++ programs; our experience shows the great cross-language portability of FAPR. More importantly, we empirically compared FAPR with a state-of-the-art tool Clara. FAPR suggested repairs for over 95.5% of incorrect solutions. We sampled 250 repairs among FAPR's suggestions, and found 89.6% of the samples to be minimal and correct. FAPR outperformed Clara by suggesting repairs for more cases, creating smaller repairs, producing higher-quality fixes, and causing lower runtime overheads. Our results imply that FAPR can potentially help instructors or TAs to effectively locate bugs in incorrect code, and to provide debugging hints/guidelines based on those generated repairs.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Migrating Client Code without Change Examples
Authors:
Hao Zhong,
Na Meng
Abstract:
API developers evolve software libraries to fix bugs, add new features, or refactor code. To benefit from such library evolution, the programmers of client projects have to repetitively upgrade their library usages and adapt their codebases to any library API breaking changes (e.g., API renaming). Such adaptive changes can be tedious and error-prone. Existing tools provide limited support to help…
▽ More
API developers evolve software libraries to fix bugs, add new features, or refactor code. To benefit from such library evolution, the programmers of client projects have to repetitively upgrade their library usages and adapt their codebases to any library API breaking changes (e.g., API renaming). Such adaptive changes can be tedious and error-prone. Existing tools provide limited support to help programmers migrate client projects from old library versions to new ones. For instance, some tools extract API map**s be-tween library versions and only suggest simple adaptive changes (i.e., statement updates); other tools suggest or automate more complicated edits (e.g., statement insertions) based on user-provided exemplar code migrations. However, when new library versions are available, it is usually cumbersome and time-consuming for users to provide sufficient human-crafted samples in order to guide automatic migration. In this paper, we propose a novel approach, AutoUpdate, to further improve the state of the art. Instead of learning from change examples, we designed AutoUpdate to automate migration in a compiler-directed way. Namely, given a compilation error triggered by upgrading libraries, AutoUpdate exploits 13 migration opera-tors to generate candidate edits, and tentatively applies each edit until the error is resolved or all edits are explored. We conducted two experiments. The first experiment involves migrating 371 tutorial examples between versions of 5 popular libraries. AutoUpdate reduced migration-related compilation errors for 92.7% of tasks. It eliminated such errors for 32.4% of tasks, and 33.9% of the tasks have identical edits to manual migrations. In the second experiment, we applied AutoUpdate to migrate two real client projects of lucene. AutoUpdate successfully migrated both projects, and the migrated code passed all tests.
△ Less
Submitted 5 May, 2021;
originally announced May 2021.
-
A Lightweight Approach of Human-Like Playtesting
Authors:
Yan Zhao,
Weihao Zhang,
Enyi Tang,
Haipeng Cai,
Xi Guo,
Na Meng
Abstract:
A playtest is the process in which human testers are recruited to play video games and to reveal software bugs. Manual testing is expensive and time-consuming, especially when there are many mobile games to test and every software version requires for extensive testing before being released. Existing testing frameworks (e.g., Android Monkey) are limited because they adopt no domain knowledge to pl…
▽ More
A playtest is the process in which human testers are recruited to play video games and to reveal software bugs. Manual testing is expensive and time-consuming, especially when there are many mobile games to test and every software version requires for extensive testing before being released. Existing testing frameworks (e.g., Android Monkey) are limited because they adopt no domain knowledge to play games. Learning-based tools (e.g., Wuji) involve a huge amount of training data and computation before testing any game. This paper presents LIT -- our lightweight approach to generalize playtesting tactics from manual testing, and to adopt the generalized tactics to automate game testing. LIT consists of two phases. In Phase I, while a human plays an Android game app G for a short period of time (e.g., eight minutes), \tool records the user's actions (e.g., swipe) and the scene before each action. Based on the collected data, LIT generalizes a set of \emph{context-aware, abstract playtesting tactics} which describe under what circumstances, what actions can be taken to play the game. In Phase II, LIT tests G based on the generalized tactics. Namely, given a randomly generated game scene, LIT searches match for the abstract context of any inferred tactic; if there is a match, LIT customizes the tactic and generates a feasible event to play the game. Our evaluation with nine games shows LIT to outperform two state-of-the-art tools. This implies that by automating playtest, LIT will significantly reduce manual testing and boost the quality of game apps.
△ Less
Submitted 25 February, 2021;
originally announced February 2021.
-
Hero: On the Chaos When PATH Meets Modules
Authors:
Ying Wang,
Liang Qiao,
Chang Xu,
Yepang Liu,
Shing-Chi Cheung,
Na Meng,
Hai Yu,
Zhiliang Zhu
Abstract:
Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is the powerful support of library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang evolves, it recommends the use of a new library-referencing mode to overco…
▽ More
Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is the powerful support of library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang evolves, it recommends the use of a new library-referencing mode to overcome the limitations of the original one. While these two library modes are incompatible, both are supported by the Golang ecosystem. The heterogeneous use of library-referencing modes across Golang projects has caused numerous dependency management (DM) issues, incurring reference inconsistencies and even build failures. Motivated by the problem, we conducted an empirical study to characterize the DM issues, understand their root causes, and examine their fixing solutions. Based on our findings, we developed \textsc{Hero}, an automated technique to detect DM issues and suggest proper fixing solutions. We applied \textsc{Hero} to 19,000 popular Golang projects. The results showed that \textsc{Hero} achieved a high detection rate of 98.5\% on a DM issue benchmark and found 2,422 new DM issues in 2,356 popular Golang projects. We reported 280 issues, among which 181 (64.6\%) issues have been confirmed, and 160 of them (88.4\%) have been fixed or are under fixing. Almost all the fixes have adopted our fixing suggestions.
△ Less
Submitted 24 February, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
-
Automatic Detection and Resolution of Software Merge Conflicts: Are We There Yet?
Authors:
Bowen Shen,
Cihan Xiao,
Na Meng,
Fei He
Abstract:
Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits lead to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging…
▽ More
Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits lead to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging and time-consuming for developers to eliminate merge conflicts. Prior studies examined %the popularity of merge conflicts and how conflicts were related to code smells or software development process; tools were built to find and solve conflicts.
However, some fundamental research questions are still not comprehensively explored, including (1) how conflicts were introduced, (2) how developers manually resolved conflicts, and (3) what conflicts cannot be handled by current tools.
For this paper, we took a hybrid approach that combines automatic detection with manual inspection to reveal 204 merge conflicts and their resolutions in 15 open-source repositories. %in the version history of 15 open-source projects. Our data analysis reveals three phenomena. First, compiling and dynamic conflicts are harder to detect, although current tools mainly focus on textual conflicts. Second, in the same merging context, developers usually resolved similar textual conflicts with similar strategies. Third, developers manually fixed most of the inspected compiling and dynamic conflicts by similarly editing the merged version as what they did for one of the branches. Our research reveals the challenges and opportunities for automatic detection and resolution of merge conflicts; it also sheds light on related areas like systematic program editing and change recommendation.
△ Less
Submitted 2 March, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Investigating and Recommending Co-Changed Entities for JavaScript Programs
Authors:
Zijian Jiang,
Hao Zhong,
Na Meng
Abstract:
JavaScript (JS) is one of the most popular programming languages due to its flexibility and versatility, but maintaining JS code is tedious and error-prone. In our research, we conducted an empirical study to characterize the relationship between co-changed software entities (e.g., functions and variables), and built a machine learning (ML)-based approach to recommend additional entity to edit giv…
▽ More
JavaScript (JS) is one of the most popular programming languages due to its flexibility and versatility, but maintaining JS code is tedious and error-prone. In our research, we conducted an empirical study to characterize the relationship between co-changed software entities (e.g., functions and variables), and built a machine learning (ML)-based approach to recommend additional entity to edit given developers' code changes. Specifically, we first crawled 14,747 commits in 10 open-source projects; for each commit, we created one or more change dependency graphs (CDGs) to model the referencer-referencee relationship between co-changed entities. Next, we extracted the common subgraphs between CDGs to locate recurring co-change patterns between entities. Finally, based on those patterns, we extracted code features from co-changed entities and trained an ML model that recommends entities-to-change given a program commit. According to our empirical investigation, (1) three recurring patterns commonly exist in all projects; (2) 80%--90% of co-changed function pairs either invoke the same function(s), access the same variable(s), or contain similar statement(s); (3) our ML-based approach CoRec recommended entity changes with high accuracy (73%--78%). CoRec complements prior work because it suggests changes based on program syntax, textual similarity, as well as software history; it achieved higher accuracy than two existing tools in our evaluation.
△ Less
Submitted 17 February, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Data-Driven Vulnerability Detection and Repair in Java Code
Authors:
Ying Zhang,
Mahir Kabir,
Ya Xiao,
Danfeng,
Yao,
Na Meng
Abstract:
Java platform provides various APIs to facilitate secure coding. However, correctly using security APIs is usually challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs; such misuses can introduce vulnerabilities into software, void security protections, and present security exploits to hackers. To eliminate such API-related vulnerab…
▽ More
Java platform provides various APIs to facilitate secure coding. However, correctly using security APIs is usually challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs; such misuses can introduce vulnerabilities into software, void security protections, and present security exploits to hackers. To eliminate such API-related vulnerabilities, this paper presents SEADER -- our new approach that detects and repairs security API misuses. Given an exemplar, insecure code snippet, and its secure counterpart, SEADER compares the snippets and conducts data dependence analysis to infer the security API misuse templates and corresponding fixing operations. Based on the inferred information, given a program, SEADER performs inter-procedural static analysis to search for any security API misuse and to propose customized fixing suggestions for those vulnerabilities.
To evaluate SEADER, we applied it to 25 <insecure, secure> code pairs, and SEADER successfully inferred 18 unique API misuse templates and related fixes. With these vulnerability repair patterns, we further applied SEADER to 10 open-source projects that contain in total 32 known vulnerabilities. Our experiment shows that SEADER detected vulnerabilities with 100% precision, 84% recall, and 91% accuracy. Additionally, we applied SEADER to 100 Apache open-source projects and detected 988 vulnerabilities; SEADER always customized repair suggestions correctly. Based on SEADER's outputs, we filed 60 pull requests. Up till now, developers of 18 projects have offered positive feedbacks on SEADER's suggestions. Our results indicate that SEADER can effectively help developers detect and fix security API misuses. Whereas prior work either detects API misuses or suggests simple fixes, SEADER is the first tool to do both for nontrivial vulnerability repairs.
△ Less
Submitted 13 February, 2021;
originally announced February 2021.
-
Light Field View Synthesis via Aperture Disparity and War** Confidence Map
Authors:
Nan Meng,
Kai Li,
Jianzhuang Liu,
Edmund Y. Lam
Abstract:
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in de…
▽ More
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing the aperture disparity map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near the object boundaries, we further develop a war** confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows better performance over several state-of-the-art techniques.
△ Less
Submitted 3 April, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
Coding Practices and Recommendations of Spring Security for Enterprise Applications
Authors:
Mazharul Islam,
Sazzadur Rahaman,
Na Meng,
Behnaz Hassanshahi,
Padmanabhan Krishnan,
Danfeng,
Yao
Abstract:
Spring security is tremendously popular among practitioners for its ease of use to secure enterprise applications. In this paper, we study the application framework misconfiguration vulnerabilities in the light of Spring security, which is relatively understudied in the existing literature. Towards that goal, we identify 6 types of security anti-patterns and 4 insecure vulnerable defaults by condu…
▽ More
Spring security is tremendously popular among practitioners for its ease of use to secure enterprise applications. In this paper, we study the application framework misconfiguration vulnerabilities in the light of Spring security, which is relatively understudied in the existing literature. Towards that goal, we identify 6 types of security anti-patterns and 4 insecure vulnerable defaults by conducting a measurement-based approach on 28 Spring applications. Our analysis shows that security risks associated with the identified security anti-patterns and insecure defaults can leave the enterprise application vulnerable to a wide range of high-risk attacks. To prevent these high-risk attacks, we also provide recommendations for practitioners. Consequently, our study has contributed one update to the official Spring security documentation while other security issues identified in this study are being considered for future major releases by Spring security community.
△ Less
Submitted 28 July, 2020;
originally announced July 2020.
-
High-Order Residual Network for Light Field Super-Resolution
Authors:
Nan Meng,
Xiaofei Wu,
Jianzhuang Liu,
Edmund Y. Lam
Abstract:
Plenoptic cameras usually sacrifice the spatial resolution of their SAIs to acquire geometry information from different viewpoints. Several methods have been proposed to mitigate such spatio-angular trade-off, but seldom make use of the structural properties of the light field (LF) data efficiently. In this paper, we propose a novel high-order residual network to learn the geometric features hiera…
▽ More
Plenoptic cameras usually sacrifice the spatial resolution of their SAIs to acquire geometry information from different viewpoints. Several methods have been proposed to mitigate such spatio-angular trade-off, but seldom make use of the structural properties of the light field (LF) data efficiently. In this paper, we propose a novel high-order residual network to learn the geometric features hierarchically from the LF for reconstruction. An important component in the proposed network is the high-order residual block (HRB), which learns the local geometric features by considering the information from all input views. After fully obtaining the local features learned from each HRB, our model extracts the representative geometric features for spatio-angular upsampling through the global residual learning. Additionally, a refinement network is followed to further enhance the spatial details by minimizing a perceptual loss. Compared with previous work, our model is tailored to the rich structure inherent in the LF, and therefore can reduce the artifacts near non-Lambertian and occlusion regions. Experimental results show that our approach enables high-quality reconstruction even in challenging regions and outperforms state-of-the-art single image or LF reconstruction methods with both quantitative measurements and visual evaluation.
△ Less
Submitted 29 March, 2020;
originally announced March 2020.
-
High-dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction
Authors:
Nan Meng,
Hayden K. -H. So,
Xing Sun,
Edmund Y. Lam
Abstract:
We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) a…
▽ More
We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) as tensor restoration and develop a learning framework based on a two-stage restoration with 4-dimensional (4D) convolution. This allows our model to learn the features capturing the geometry information encoded in multiple adjacent views. Such geometric features vary near the occlusion regions and indicate the foreground object border. To train a feasible network, we propose a novel normalization operation based on a group of views in the feature maps, design a stage-wise loss function, and develop the multi-range training strategy to further improve the performance. Evaluations are conducted on a number of light field datasets including real-world scenes, synthetic data, and microscope light fields. The proposed method achieves superior performance and less execution time comparing with other state-of-the-art schemes.
△ Less
Submitted 17 September, 2020; v1 submitted 3 October, 2019;
originally announced October 2019.
-
How Reliable is the Crowdsourced Knowledge of Security Implementation?
Authors:
Mengsu Chen,
Felix Fischer,
Na Meng,
Xiaoyin Wang,
Jens Grossklags
Abstract:
Stack Overflow (SO) is the most popular online Q&A site for developers to share their expertise in solving programming issues. Given multiple answers to certain questions, developers may take the accepted answer, the answer from a person with high reputation, or the one frequently suggested. However, researchers recently observed exploitable security vulnerabilities in popular SO answers. This obs…
▽ More
Stack Overflow (SO) is the most popular online Q&A site for developers to share their expertise in solving programming issues. Given multiple answers to certain questions, developers may take the accepted answer, the answer from a person with high reputation, or the one frequently suggested. However, researchers recently observed exploitable security vulnerabilities in popular SO answers. This observation inspires us to explore the following questions: How much can we trust the security implementation suggestions on SO? If suggested answers are vulnerable, can developers rely on the community's dynamics to infer the vulnerability and identify a secure counterpart?
To answer these highly important questions, we conducted a study on SO posts by contrasting secure and insecure advices with the community-given content evaluation. We investigated whether SO incentive mechanism is effective in improving security properties of distributed code examples. Moreover, we also traced duplicated answers to assess whether the community behavior facilitates propagation of secure and insecure code suggestions. We compiled 953 different groups of similar security-related code examples and labeled their security, identifying 785 secure answer posts and 644 insecure ones. Compared with secure suggestions, insecure ones had higher view counts (36,508 vs. 18,713), received a higher score (14 vs. 5), and had significantly more duplicates (3.8 vs. 3.0) on average. 34% of the posts provided by highly reputable so-called trusted users were insecure.
Our findings show that there are lots of insecure snippets on SO, while the community-given feedback does not allow differentiating secure from insecure choices. Moreover, the reputation mechanism fails in indicating trustworthy users with respect to security questions, ultimately leaving other users wandering around alone in a software security minefield.
△ Less
Submitted 4 January, 2019;
originally announced January 2019.
-
Automatic Clone Recommendation for Refactoring Based on the Present and the Past
Authors:
Ruru Yue,
Zhe Gao,
Na Meng,
Yingfei Xiong,
Xiaoyin Wang,
J. David Morgenthaler
Abstract:
When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools were built to recommend clone-removal refactorings based on the past and the present information, such as the cohesion degree of individual clones or the co-evolution relations of clone peers. The existence of these too…
▽ More
When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools were built to recommend clone-removal refactorings based on the past and the present information, such as the cohesion degree of individual clones or the co-evolution relations of clone peers. The existence of these tools inspired us to build an approach that considers as many factors as possible to more accurately recommend clones. This paper introduces CREC, a learning-based approach that recommends clones by extracting features from the current status and past history of software projects. Given a set of software repositories, CREC first automatically extracts the clone groups historically refactored (R-clones) and those not refactored (NR-clones) to construct the training set. CREC extracts 34 features to characterize the content and evolution behaviors of individual clones, as well as the spatial, syntactical, and co-change relations of clone peers. With these features, CREC trains a classifier that recommends clones for refactoring.
We designed the largest feature set thus far for clone recommendation, and performed an evaluation on six large projects. The results show that our approach suggested refactorings with 83% and 76% F-scores in the within-project and cross-project settings. CREC significantly outperforms a state-of-the-art similar approach on our data set, with the latter one achieving 70% and 50% F-scores. We also compared the effectiveness of different factors and different learning algorithms.
△ Less
Submitted 30 July, 2018;
originally announced July 2018.
-
Secure Coding Practices in Java: Challenges and Vulnerabilities
Authors:
Na Meng,
Stefan Nagy,
Daphne Yao,
Wenjie Zhuang,
Gustavo Arango Argoty
Abstract:
Java platform and third-party libraries provide various security features to facilitate secure coding. However, misusing these features can cost tremendous time and effort of developers or cause security vulnerabilities in software. Prior research was focused on the misuse of cryptography and SSL APIs, but did not explore the key fundamental research question: what are the biggest challenges and v…
▽ More
Java platform and third-party libraries provide various security features to facilitate secure coding. However, misusing these features can cost tremendous time and effort of developers or cause security vulnerabilities in software. Prior research was focused on the misuse of cryptography and SSL APIs, but did not explore the key fundamental research question: what are the biggest challenges and vulnerabilities in secure coding practices? In this paper, we conducted a comprehensive empirical study on StackOverflow posts to understand developers' concerns on Java secure coding, their programming obstacles, and potential vulnerabilities in their code. We observed that developers have shifted their effort to the usage of authentication and authorization features provided by Spring security--a third-party framework designed to secure enterprise applications. Multiple programming challenges are related to APIs or libraries, including the complicated cross-language data handling of cryptography APIs, and the complex Java-based or XML-based approaches to configure Spring security. More interestingly, we identified security vulnerabilities in the suggested code of accepted answers. The vulnerabilities included using insecure hash functions such as MD5, breaking SSL/TLS security through bypassing certificate validation, and insecurely disabling the default protection against Cross Site Request Forgery (CSRF) attacks. Our findings reveal the insufficiency of secure coding assistance and education, and the gap between security theory and coding practices.
△ Less
Submitted 28 September, 2017;
originally announced September 2017.