-
Deep Static Modeling of invokedynamic
Authors:
George Fourtounis,
Yannis Smaragdakis
Abstract:
Java 7 introduced programmable dynamic linking in the form of the invokedynamic framework. Static analysis of code containing programmable dynamic linking has often been cited as a significant source of unsoundness in the analysis of Java programs. For example, Java lambdas, introduced in Java 8, are a very popular feature, which is, however, resistant to static analysis, since it mixes invokedyna…
▽ More
Java 7 introduced programmable dynamic linking in the form of the invokedynamic framework. Static analysis of code containing programmable dynamic linking has often been cited as a significant source of unsoundness in the analysis of Java programs. For example, Java lambdas, introduced in Java 8, are a very popular feature, which is, however, resistant to static analysis, since it mixes invokedynamic with dynamic code generation. These techniques invalidate static analysis assumptions: programmable linking breaks reasoning about method resolution while dynamically generated code is, by definition, not available statically. In this paper, we show that a static analysis can predictively model uses of invokedynamic while also cooperating with extra rules to handle the runtime code generation of lambdas. Our approach plugs into an existing static analysis and helps eliminate all unsoundness in the handling of lambdas (including associated features such as method references) and generic invokedynamic uses. We evaluate our technique on a benchmark suite of our own and on third-party benchmarks, uncovering all code previously unreachable due to unsoundness, highly efficiently.
△ Less
Submitted 8 January, 2020;
originally announced January 2020.
-
Heaps Don't Lie: Countering Unsoundness with Heap Snapshots
Authors:
Neville Grech,
George Fourtounis,
Adrian Francalanza,
Yannis Smaragdakis
Abstract:
Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas…
▽ More
Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).
△ Less
Submitted 6 May, 2019;
originally announced May 2019.
-
Next-Paradigm Programming Languages: What Will They Look Like and What Changes Will They Bring?
Authors:
Yannis Smaragdakis
Abstract:
The dream of programming language design is to bring about orders-of-magnitude productivity improvements in software development tasks. Designers can endlessly debate on how this dream can be realized and on how close we are to its realization. Instead, I would like to focus on a question with an answer that can be, surprisingly, clearer: what will be the common principles behind next-paradigm, hi…
▽ More
The dream of programming language design is to bring about orders-of-magnitude productivity improvements in software development tasks. Designers can endlessly debate on how this dream can be realized and on how close we are to its realization. Instead, I would like to focus on a question with an answer that can be, surprisingly, clearer: what will be the common principles behind next-paradigm, high-productivity programming languages, and how will they change everyday program development? Based on my decade-plus experience of heavy-duty development in declarative languages, I speculate that certain tenets of high-productivity languages are inevitable. These include, for instance, enormous variations in performance (including automatic transformations that change the asymptotic complexity of algorithms); a radical change in a programmer's workflow, elevating testing from a near-menial task to an act of deep understanding; a change in the need for formal proofs; and more.
△ Less
Submitted 1 May, 2019;
originally announced May 2019.
-
Symbolic Reasoning for Automatic Signal Placement (Extended Version)
Authors:
Kostas Ferles,
Jacob Van Geffen,
Isil Dillig,
Yannis Smaragdakis
Abstract:
Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal…
▽ More
Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code.
△ Less
Submitted 6 April, 2018;
originally announced April 2018.
-
Effectiveness of Anonymization in Double-Blind Review
Authors:
Claire Le Goues,
Yuriy Brun,
Sven Apel,
Emery Berger,
Sarfraz Khurshid,
Yannis Smaragdakis
Abstract:
Double-blind review relies on the authors' ability and willingness to effectively anonymize their submissions. We explore anonymization effectiveness at ASE 2016, OOPSLA 2016, and PLDI 2016 by asking reviewers if they can guess author identities. We find that 74%-90% of reviews contain no correct guess and that reviewers who self-identify as experts on a paper's topic are more likely to attempt to…
▽ More
Double-blind review relies on the authors' ability and willingness to effectively anonymize their submissions. We explore anonymization effectiveness at ASE 2016, OOPSLA 2016, and PLDI 2016 by asking reviewers if they can guess author identities. We find that 74%-90% of reviews contain no correct guess and that reviewers who self-identify as experts on a paper's topic are more likely to attempt to guess, but no more likely to guess correctly. We present our findings, summarize the PC chairs' comments about administering double-blind review, discuss the advantages and disadvantages of revealing author identities part of the way through the process, and conclude by advocating for the continued use of double-blind review.
△ Less
Submitted 5 September, 2017;
originally announced September 2017.
-
Stream Fusion, to Completeness
Authors:
Oleg Kiselyov,
Aggelos Biboudis,
Nick Palladinos,
Yannis Smaragdakis
Abstract:
Stream processing is mainstream (again): Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We…
▽ More
Stream processing is mainstream (again): Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zip**, nesting (or flat-map**), sub-ranging, filtering, map**-of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of a "for" vs. "while" loops. Our approach delivers hand-written-like code, but automatically. It explicitly avoids the reliance on black-box optimizers and sufficiently-smart compilers, offering highest, guaranteed and portable performance. Our approach relies on high-level concepts that are then readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries richer and simultaneously many tens of times faster than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams.
△ Less
Submitted 20 December, 2016;
originally announced December 2016.
-
jUCM: Universal Class Morphing (position paper)
Authors:
Aggelos Biboudis,
George Fourtounis,
Yannis Smaragdakis
Abstract:
We extend prior work on class-morphing to provide a more expressive pattern-based compile-time reflection language. Our MorphJ language offers a disciplined form of metaprogramming that produces types by statically iterating over and pattern-matching on fields and methods of other types. We expand such capabilities with "universal morphing", which also allows pattern-matching over types (e.g., all…
▽ More
We extend prior work on class-morphing to provide a more expressive pattern-based compile-time reflection language. Our MorphJ language offers a disciplined form of metaprogramming that produces types by statically iterating over and pattern-matching on fields and methods of other types. We expand such capabilities with "universal morphing", which also allows pattern-matching over types (e.g., all classes nested in another, all supertypes of a class) while maintaining modular type safety for our meta-programs. We present informal examples of the functionality and discuss a design for adding universal morphing to Java.
△ Less
Submitted 17 June, 2015;
originally announced June 2015.
-
Clash of the Lambdas
Authors:
Aggelos Biboudis,
Nick Palladinos,
Yannis Smaragdakis
Abstract:
The introduction of lambdas in Java 8 completes the slate of statically-typed, mainstream languages with both object-oriented and functional features. The main motivation for lambdas in Java has been to facilitate stream-based declarative APIs, and, therefore, easier parallelism. In this paper, we evaluate the performance impact of lambda abstraction employed in stream processing, for a variety of…
▽ More
The introduction of lambdas in Java 8 completes the slate of statically-typed, mainstream languages with both object-oriented and functional features. The main motivation for lambdas in Java has been to facilitate stream-based declarative APIs, and, therefore, easier parallelism. In this paper, we evaluate the performance impact of lambda abstraction employed in stream processing, for a variety of high-level languages that run on a virtual machine (C#, F#, Java and Scala) and runtime platforms (JVM on Linux and Windows, .NET CLR for Windows, Mono for Linux). Furthermore, we evaluate the performance gain that two optimizing libraries (ScalaBlitz and LinqOptimizer) can offer for C#, F# and Scala. Our study is based on small-scale throughput-benchmarking, with significant care to isolate different factors, consult experts on the systems involved, and identify causes and opportunities. We find that Java exhibits high implementation maturity, which is a dominant factor in benchmarks. At the same time, optimizing frameworks can be highly effective for common query patterns.
△ Less
Submitted 14 July, 2014; v1 submitted 25 June, 2014;
originally announced June 2014.
-
Resolving and Exploiting the $k$-CFA Paradox
Authors:
Matthew Might,
Yannis Smaragdakis,
David Van Horn
Abstract:
Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's $k$-CFA work, which has advanced the concept of…
▽ More
Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's $k$-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities.
Recent results indicate that the relationship between the functional and object-oriented incarnations of $k$-CFA is not as well understood as thought. Van Horn and Mairson proved $k$-CFA for $k \geq 1$ to be EXPTIME-complete; hence, no polynomial-time algorithm can exist. Yet, there are several polynomial-time formulations of context-sensitive points-to analyses in object-oriented languages. Thus, it seems that functional $k$-CFA may actually be a profoundly different analysis from object-oriented $k$-CFA. We resolve this paradox by showing that the exact same specification of $k$-CFA is polynomial-time for object-oriented languages yet exponential- time for functional ones: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity and complexity. This illumination leads to an immediate payoff: by projecting the object-oriented treatment of objects onto closures, we derive a polynomial-time hierarchy of context-sensitive CFAs for functional programs.
△ Less
Submitted 17 November, 2013;
originally announced November 2013.