Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

Authors: Raffi Khatchadourian, Tatiana Castro Vélez, Mehdi Bagherzadeh, Nan Jia, Anita Raja

Abstract: Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution -- avoiding performance bottlenecks and semantically inequivalent results. We present our ongoing work on an automated refactoring approach that assists developers in specifying whether and how their otherwise eagerly-executed imperative DL code could be reliably and efficiently executed as graphs at run-time in a semantics-preserving fashion. The approach, based on a novel tensor analysis specifically for imperative DL code, consists of refactoring preconditions for automatically determining when it is safe and potentially advantageous to migrate imperative DL code to graph execution and modifying decorator parameters or eagerly executing code already running as graphs. The approach is being implemented as a PyDev Eclipse IDE plug-in and uses the WALA Ariadne analysis framework. We discuss our ongoing work towards optimizing imperative DL code to its full potential.

Submitted 10 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: To appear in the NIER track of the IEEE/ACM International Conference on Automated Software Engineering, ASE '23, Kirchberg, Luxembourg, September 2023

arXiv:2202.08761 [pdf, other]

QuerTCI: A Tool Integrating GitHub Issue Querying with Comment Classification

Authors: Ye Paing, Tatiana Castro Vélez, Raffi Khatchadourian

Abstract: Empirical Software Engineering (ESE) researchers study (open-source) project issues and the comments and threads within to discover -- among others -- challenges developers face when incorporating new technologies, platforms, and programming language constructs. However, such threads accumulate, becoming unwieldy and hindering any insight researchers may gain. While existing approaches alleviate this burden by classifying issue thread comments, there is a gap between searching popular open-source software repositories (e.g., those on GitHub) for issues containing particular keywords and feeding the results into a classification model. This paper demonstrates a research infrastructure tool called QuerTCI that bridges this gap by integrating the GitHub issue comment search API with the classification models found in existing approaches. Using queries, ESE researchers can retrieve GitHub issues containing particular keywords, e.g., those related to a specific programming language construct, and, subsequently, classify the discussions occurring in those issues. We hope ESE researchers can use our tool to uncover challenges related to particular technologies using specific keywords through popular open-source repositories more seamlessly than previously possible. A tool demonstration video may be found at: https://youtu.be/fADKSxn0QUk.

Submitted 19 July, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2201.09953 [pdf, other]

doi 10.1145/3524842.3528455

Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study

Authors: Tatiana Castro Vélez, Raffi Khatchadourian, Mehdi Bagherzadeh, Anita Raja

Abstract: Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. While hybrid approaches aim for the "best of both worlds," the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges -- and resultant bugs -- involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation -- the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.

Submitted 5 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: International Conference on Mining Software Repositories, MSR 2022. ACM/IEEE, ACM, May 2022

ACM Class: D.2.m

Journal ref: ACM/IEEE International Conference on Mining Software Repositories, May 2022

arXiv:2112.02758 [pdf, other]

doi 10.1109/ICSE-Companion55297.2022.9793736

A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree of Interest

Authors: Yiming Tang, Allan Spektor, Raffi Khatchadourian, Mehdi Bagherzadeh

Abstract: Logging is a significant programming practice. Due to the highly transactional nature of modern software applications, massive amounts of logs are generated every day, which may overwhelm developers. Logging information overload can be dangerous to software applications. Using log levels, developers can print the useful information while hiding the verbose logs during software runtime. As software evolves, the log levels of logging statements associated with the surrounding software feature implementation may also need to be altered. Maintaining log levels necessitates a significant amount of manual effort. In this paper, we demonstrate an automated approach that can rejuvenate feature log levels by matching the interest level of developers in the surrounding features. The approach is implemented as an open-source Eclipse plug-in, using two external plug-ins (JGit and Mylyn). It was tested on 18 open-source Java projects consisting of ~3 million lines of code and ~4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by ~20%, and increases the focus of logs in bug fix contexts ~83% of the time. For further details, interested readers can watch our demonstration video (https://www.youtube.com/watch?v=qIULoAXoDv4).

Submitted 16 February, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 4 pages, ICSE '22 (tool demo track)

Journal ref: International Conference on Software Engineering, ICSE 2022. ACM/IEEE, ACM, May 2022

arXiv:2109.03138 [pdf, other]

Interests, Difficulties, Sentiments, and Tool Usages of Concurrency Developers: A Large-Scale Study on Stack Overflow

Authors: Mehdi Bagherzadeh, Syed Ahmed, Srilakshmi Sripathi, Raffi Khatchadourian

Abstract: Context: Software developers are increasingly facing the challenges of writing code that is not only concurrent but also correct. Objective: To help these developers, it is necessary to understand concurrency topics they are interested in, their difficulty in finding answers for questions in these topics, their sentiment for these topics, and how they use concurrency tools and techniques to guarantee correctness. Method: We conduct a large-scale study on the entirety of Stack Overflow to understand interests, difficulties, sentiment, and tool usages of concurrency developers. We discuss the implications of our findings for the practice, research, and education of concurrent software development, and investigate the relation of our findings with the findings of the previous work. Results: A few findings of our study are: (1) questions that concurrency developers ask can be grouped into a hierarchy with 27 concurrency topics under 8 major categories, (2) thread safety is among the most popular concurrency topics and client-server concurrency is among the least popular, (3) irreproducible behavior is among the most difficult topics and memory consistency is among the least difficult, (4) data scraping is among the most positive concurrency topics and irreproducible behavior is among the most negative, (5) root cause identification has the most questions for usage of data race tools and alternative use has the fewest. Conclusion: The results of our study can help not only concurrency developers but also concurrency educators and researchers to better decide where to focus their efforts, by trading off one concurrency topic against another.

Submitted 9 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

ACM Class: D.1.3

arXiv:2104.07736 [pdf, other]

doi 10.1016/j.scico.2021.102724

Automated Evolution of Feature Logging Statement Levels Using Git Histories and Degree of Interest

Authors: Yiming Tang, Allan Spektor, Raffi Khatchadourian, Mehdi Bagherzadeh

Abstract: Logging -- used for system events and security breaches to describe more informational yet essential aspects of software features -- is pervasive. Given the high transactionality of today's software, logging effectiveness can be reduced by information overload. Log levels help alleviate this problem by correlating a priority to logs that can be later filtered. As software evolves, however, levels of logs documenting surrounding feature implementations may also require modification as features once deemed important may have decreased in urgency and vice-versa. We present an automated approach that assists developers in evolving levels of such (feature) logs. The approach, based on mining Git histories and manipulating a degree of interest (DOI) model, transforms source code to revitalize feature log levels based on the "interestingness" of the surrounding code. Built upon JGit and Mylyn, the approach is implemented as an Eclipse IDE plug-in and evaluated on 18 Java projects with $\sim$3 million lines of code and $\sim$4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by $\sim$20%, and increases the focus of logs in bug fix contexts $\sim$83% of the time. Moreover, pull (patch) requests were integrated into large and popular open-source projects. The results indicate that the approach is promising in assisting developers in evolving feature log levels.

Submitted 12 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 37 pages, 1 figure

Journal ref: Science of Computer Programming, 2022-02, Vol. 214, p.102724

arXiv:1803.10198 [pdf]

doi 10.22152/programming-journal.org/2018/2/6

Proactive Empirical Assessment of New Language Feature Adoption via Automated Refactoring: The Case of Java 8 Default Methods

Authors: Raffi Khatchadourian, Hidehiko Masuhara

Abstract: Programming languages and platforms improve over time, sometimes resulting in new language features that offer many benefits. However, despite these benefits, developers may not always be willing to adopt them in their projects for various reasons. In this paper, we describe an empirical study where we assess the adoption of a particular new language feature. Studying how developers use (or do not use) new language features is important in programming language research and engineering because it gives designers insight into the usability of the language to create meaningful programs in that language. This knowledge, in turn, can drive future innovations in the area. Here, we explore Java 8 default methods, which allow interfaces to contain (instance) method implementations. Default methods can ease interface evolution, make certain ubiquitous design patterns redundant, and improve both modularity and maintainability. A focus of this work is to discover, through a scientific approach and a novel technique, situations where developers found these constructs useful and where they did not, and the reasons for each. Although several studies center around assessing new language features, to the best of our knowledge, this kind of construct has not been previously considered. Despite their benefits, we found that developers did not adopt default methods in all situations. Our study consisted of submitting pull requests introducing the language feature to 19 real-world, open source Java projects without altering original program semantics. This novel assessment technique is proactive in that the adoption was driven by an automatic refactoring approach rather than waiting for developers to discover and integrate the feature themselves. In this way, we set forth best practices and patterns of using the language feature effectively earlier rather than later and are able to possibly guide (near) future language evolution. We foresee this technique to be useful in assessing other new language features, design patterns, and other programming idioms.

Submitted 27 March, 2018; originally announced March 2018.

Journal ref: The Art, Science, and Engineering of Programming, 2018, Vol. 2, Issue 3, Article 6

Showing 1–7 of 7 results for author: Khatchadourian, R