-
Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis
Authors:
Eric L. Seidel,
Huma Sibghat,
Kamalika Chaudhuri,
Westley Weimer,
Ranjit Jhala
Abstract:
Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed" versions -- to automatically learn a model of wh…
▽ More
Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed" versions -- to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.
△ Less
Submitted 17 September, 2017; v1 submitted 24 August, 2017;
originally announced August 2017.
-
Dynamic Witnesses for Static Type Errors (or, Ill-Typed Programs Usually Go Wrong)
Authors:
Eric L Seidel,
Ranjit Jhala,
Westley Weimer
Abstract:
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to synthesize witness values that make the program go wrong. We prove that our proced…
▽ More
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to synthesize witness values that make the program go wrong. We prove that our procedure synthesizes general witnesses in that if a witness is found, then for all inhabited input types, there exist values that can make the function go wrong. Second, we show how to extend this procedure to produce a reduction graph that can be used to interactively visualize and debug witness executions. Third, we evaluate the coverage of our approach on two data sets comprising over 4,500 ill-typed student programs. Our technique is able to generate witnesses for around 85% of the programs, our reduction graph yields small counterexamples for over 80% of the witnesses, and a simple heuristic allows us to use witnesses to locate the source of type errors with around 70% accuracy. Finally, we evaluate whether our witnesses help students understand and fix type errors, and find that students presented with our witnesses show a greater understanding of type errors than those presented with a standard error message.
△ Less
Submitted 18 March, 2018; v1 submitted 23 June, 2016;
originally announced June 2016.
-
Type Targeted Testing
Authors:
Eric L. Seidel,
Niki Vazou,
Ranjit Jhala
Abstract:
We present a new technique called type targeted testing, which translates precise refinement types into comprehensive test-suites. The key insight behind our approach is that through the lens of SMT solvers, refinement types can also be viewed as a high-level, declarative, test generation technique, wherein types are converted to SMT queries whose models can be decoded into concrete program inputs…
▽ More
We present a new technique called type targeted testing, which translates precise refinement types into comprehensive test-suites. The key insight behind our approach is that through the lens of SMT solvers, refinement types can also be viewed as a high-level, declarative, test generation technique, wherein types are converted to SMT queries whose models can be decoded into concrete program inputs. Our approach enables the systematic and exhaustive testing of implementations from high-level declarative specifications, and furthermore, provides a gradual path from testing to full verification. We have implemented our approach as a Haskell testing tool called TARGET, and present an evaluation that shows how TARGET can be used to test a wide variety of properties and how it compares against state-of-the-art testing approaches.
△ Less
Submitted 15 January, 2015; v1 submitted 20 October, 2014;
originally announced October 2014.
-
From Safety To Termination And Back: SMT-Based Verification For Lazy Languages
Authors:
Niki Vazou,
Eric L. Seidel,
Ranjit Jhala
Abstract:
SMT-based verifiers have long been an effective means of ensuring safety properties of programs. While these techniques are well understood, we show that they implicitly require eager semantics; directly applying them to a lazy language is unsound due to the presence of divergent sub-computations. We recover soundness by composing the safety analysis with a termination analysis. Of course, termina…
▽ More
SMT-based verifiers have long been an effective means of ensuring safety properties of programs. While these techniques are well understood, we show that they implicitly require eager semantics; directly applying them to a lazy language is unsound due to the presence of divergent sub-computations. We recover soundness by composing the safety analysis with a termination analysis. Of course, termination is itself a challenging problem, but we show how the safety analysis can be used to ensure termination, thereby bootstrap** soundness for the entire system. Thus, while safety invariants have long been required to prove termination, we show how termination proofs can be to soundly establish safety. We have implemented our approach in liquidHaskell, a Refinement Type-based verifier for Haskell. We demonstrate its effectiveness via an experimental evaluation using liquidHaskell to verify safety, functional correctness and termination properties of real-world Haskell libraries, totaling over 10,000 lines of code.
△ Less
Submitted 23 January, 2014;
originally announced January 2014.
-
Metadata Management in Scientific Computing
Authors:
Eric L. Seidel
Abstract:
Complex scientific codes and the datasets they generate are in need of a sophisticated categorization environment that allows the community to store, search, and enhance metadata in an open, dynamic system. Currently, data is often presented in a read-only format, distilled and curated by a select group of researchers. We envision a more open and dynamic system, where authors can publish their dat…
▽ More
Complex scientific codes and the datasets they generate are in need of a sophisticated categorization environment that allows the community to store, search, and enhance metadata in an open, dynamic system. Currently, data is often presented in a read-only format, distilled and curated by a select group of researchers. We envision a more open and dynamic system, where authors can publish their data in a writeable format, allowing users to annotate the datasets with their own comments and data. This would enable the scientific community to collaborate on a higher level than before, where researchers could for example annotate a published dataset with their citations.
Such a system would require a complete set of permissions to ensure that any individual's data cannot be altered by others unless they specifically allow it. For this reason datasets and codes are generally presented read-only, to protect the author's data; however, this also prevents the type of social revolutions that the private sector has seen with Facebook and Twitter.
In this paper, we present an alternative method of publishing codes and datasets, based on Fluidinfo, which is an openly writeable and social metadata engine. We will use the specific example of the Einstein Toolkit, a shared scientific code built using the Cactus Framework, to illustrate how the code's metadata may be published in writeable form via Fluidinfo.
△ Less
Submitted 19 March, 2012;
originally announced March 2012.
-
Simplifying Complex Software Assembly: The Component Retrieval Language and Implementation
Authors:
Eric L. Seidel,
Gabrielle Allen,
Steven Brandt,
Frank Löffler,
Erik Schnetter
Abstract:
Assembling simulation software along with the associated tools and utilities is a challenging endeavor, particularly when the components are distributed across multiple source code versioning systems. It is problematic for researchers compiling and running the software across many different supercomputers, as well as for novices in a field who are often presented with a bewildering list of softwar…
▽ More
Assembling simulation software along with the associated tools and utilities is a challenging endeavor, particularly when the components are distributed across multiple source code versioning systems. It is problematic for researchers compiling and running the software across many different supercomputers, as well as for novices in a field who are often presented with a bewildering list of software to collect and install. In this paper, we describe a language (CRL) for specifying software components with the details needed to obtain them from source code repositories. The language supports public and private access. We describe a tool called GetComponents which implements CRL and can be used to assemble software. We demonstrate the tool for application scenarios with the Cactus Framework on the NSF TeraGrid resources. The tool itself is distributed with an open source license and freely available from our web page.
△ Less
Submitted 7 September, 2010;
originally announced September 2010.
-
Component Specification in the Cactus Framework: The Cactus Configuration Language
Authors:
Gabrielle Allen,
Tom Goodale,
Frank Löffler,
David Rideout,
Erik Schnetter,
Eric L. Seidel
Abstract:
Component frameworks are complex systems that rely on many layers of abstraction to function properly. One essential requirement is a consistent means of describing each individual component and how it relates to both other components and the whole framework. As component frameworks are designed to be flexible by nature, the description method should be simultaneously powerful, lead to efficient c…
▽ More
Component frameworks are complex systems that rely on many layers of abstraction to function properly. One essential requirement is a consistent means of describing each individual component and how it relates to both other components and the whole framework. As component frameworks are designed to be flexible by nature, the description method should be simultaneously powerful, lead to efficient code, and be easy to use, so that new users can quickly adapt their own code to work with the framework. In this paper, we discuss the Cactus Configuration Language (CCL) which is used to describe components ("thorns'') in the Cactus Framework. The CCL provides a description language for the variables, parameters, functions, scheduling and compilation of a component and includes concepts such as interface and implementation which allow thorns providing the same capabilities to be easily interchanged. We include several application examples which illustrate how community toolkits use the CCL and Cactus and identify needed additions to the language.
△ Less
Submitted 7 September, 2010;
originally announced September 2010.