Search | arXiv e-print repository

TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

Authors: Aamod Khatiwada, Harsha Kokel, Ibrahim Abdelaziz, Subhajit Chaudhury, Julian Dolby, Oktie Hassanzadeh, Zhenhan Huang, Tejaswini Pedapati, Horst Samulowitz, Kavitha Srinivas

Abstract: Enterprises have a growing need to identify relevant tables in data lakes; e.g. tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel pre-training sketch-based approach to enhance the effectiveness o… ▽ More Enterprises have a growing need to identify relevant tables in data lakes; e.g. tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel pre-training sketch-based approach to enhance the effectiveness of data discovery techniques in neural tabular models. Second, to further finetune the pretrained model for several downstream tasks, we develop LakeBench, a collection of 8 benchmarks to help with different data discovery tasks such as finding tasks that are unionable, joinable, or subsets of each other. We then show on these finetuning tasks that TabSketchFM achieves state-of-the art performance compared to existing neural models. Third, we use these finetuned models to search for tables that are unionable, joinable, or can be subsets of each other. Our results demonstrate improvements in F1 scores for search compared to state-of-the-art techniques (even up to 70% improvement in a joinable search benchmark). Finally, we show significant transfer across datasets and tasks establishing that our model can generalize across different tasks over different data lakes △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.04217

arXiv:2406.10320 [pdf, other]

Out of style: Misadventures with LLMs and code style transfer

Authors: Karl Munson, Chih-Kai Ting, Serenity Wade, Anish Savla, Julian Dolby, Kiran Kate, Kavitha Srinivas

Abstract: Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate if code language models can perform… ▽ More Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate if code language models can perform code style transfer. Code style transfer, unlike text transfer, has rigorous requirements: the system needs to identify lines of code to change, change them correctly, and leave the rest of the program untouched. We designed CSB (Code Style Benchmark), a benchmark suite of code style transfer tasks across five categories including converting for-loops to list comprehensions, eliminating duplication in code, adding decorators to methods, etc. We then used these tests to see if large pre-trained code language models or fine-tuned models perform style transfer correctly, based on rigorous metrics to test that the transfer did occur, and the code still passes functional tests. Surprisingly, language models failed to perform all of the tasks, suggesting that they perform poorly on tasks that require code understanding. We will make available the large-scale corpora to help the community build better code models. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2307.04217 [pdf, other]

LakeBench: Benchmarks for Data Discovery over Data Lakes

Authors: Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz

Abstract: Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private data… ▽ More Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2301.05108 [pdf, other]

Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

Authors: Wenting Zhao, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas, Mossad Helali, Essam Mansour

Abstract: Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we… ▽ More Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively. △ Less

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2205.01311 [pdf, other]

doi 10.1145/3520312.3534868

Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)

Authors: Julian Dolby, Jason Tsay, Martin Hirzel

Abstract: Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosin… ▽ More Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This paper describes an approach for automatically debugging an ML pipeline, explaining the failures, and producing a remediation. We implemented our approach, which builds on a combination of AutoML and SMT, in a tool called Maro. Maro works seamlessly with the familiar data science ecosystem including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt. We empirically evaluate our tool and find that for most cases, a single remediation automatically fixes errors, produces no additional faults, and does not significantly impact optimal accuracy nor time to convergence. △ Less

Submitted 5 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

Comments: Extended version of MAPS 2022 paper

Journal ref: Symposium on Machine Programming (MAPS), pages 60-69, June 2022

arXiv:2201.12242 [pdf, other]

Large Scale Generation of Labeled Type Data for Python

Authors: Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas

Abstract: Recently, dynamically typed languages, such as Python, have gained unprecedented popularity. Although these languages alleviate the need for mandatory type annotations, types still play a critical role in program understanding and preventing runtime errors. An attractive option is to infer types automatically to get static guarantees without writing types. Existing inference techniques rely mostly… ▽ More Recently, dynamically typed languages, such as Python, have gained unprecedented popularity. Although these languages alleviate the need for mandatory type annotations, types still play a critical role in program understanding and preventing runtime errors. An attractive option is to infer types automatically to get static guarantees without writing types. Existing inference techniques rely mostly on static ty** tools such as PyType for direct type inference; more recently, neural type inference has been proposed. However, neural type inference is data hungry, and depends on collecting labeled data based on static ty**. Such tools, however, are poor at inferring user defined types. Furthermore, type annotation by developers in these languages is quite sparse. In this work, we propose novel techniques for generating high quality types using 1) information retrieval techniques that work on well documented libraries to extract types and 2) usage patterns by analyzing a large repository of programs. Our results show that these techniques are more precise and address the weaknesses of static tools, and can be useful for generating a large labeled dataset for type inference by machine learning methods. F1 scores are 0.52-0.58 for our techniques, compared to static ty** tools which are at 0.06, and we use them to generate over 37,000 types for over 700 modules. △ Less

Submitted 6 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2111.00083 [pdf, other]

A Scalable AutoML Approach Based on Graph Neural Networks

Authors: Mossad Helali, Essam Mansour, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas

Abstract: AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which, (1) builds a database of datasets and corresponding pipelines b… ▽ More AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which, (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on its content instead of metadata-based features, (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 126 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems. △ Less

Submitted 14 July, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

Comments: 14 pages, 9 figures. Accepted in VLDB22

arXiv:2109.07452 [pdf, other]

Can Machines Read Coding Manuals Yet? -- A Benchmark for Building Better Language Models for Code Understanding

Authors: Ibrahim Abdelaziz, Julian Dolby, Jamie McCusker, Kavitha Srinivas

Abstract: Code understanding is an increasingly important application of Artificial Intelligence. A fundamental aspect of understanding code is understanding text about code, e.g., documentation and forum discussions. Pre-trained language models (e.g., BERT) are a popular approach for various NLP tasks, and there are now a variety of benchmarks, such as GLUE, to help improve the development of such models f… ▽ More Code understanding is an increasingly important application of Artificial Intelligence. A fundamental aspect of understanding code is understanding text about code, e.g., documentation and forum discussions. Pre-trained language models (e.g., BERT) are a popular approach for various NLP tasks, and there are now a variety of benchmarks, such as GLUE, to help improve the development of such models for natural language understanding. However, little is known about how well such models work on textual artifacts about code, and we are unaware of any systematic set of downstream tasks for such an evaluation. In this paper, we derive a set of benchmarks (BLANCA - Benchmarks for LANguage models on Coding Artifacts) that assess code understanding based on tasks such as predicting the best answer to a question in a forum post, finding related forum posts, or predicting classes related in a hierarchy from class documentation. We evaluate the performance of current state-of-the-art language models on these tasks and show that there is a significant improvement on each task from fine tuning. We also show that multi-task training over BLANCA tasks helps build better language models for code understanding. △ Less

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2105.12655 [pdf, other]

CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

Authors: Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss

Abstract: Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs,… ▽ More Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs, motivating researchers to leverage AI techniques to improve software development efficiency. Thus, the fast-emerging research area of AI for Code has garnered new interest and gathered momentum. In this paper, we present a large-scale dataset CodeNet, consisting of over 14 million code samples and about 500 million lines of code in 55 different programming languages, which is aimed at teaching AI to code. In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques. Additionally, CodeNet provides sample input and output test sets for 98.5% of the code samples, which can be used as an oracle for determining code correctness and potentially guide reinforcement learning for code quality improvements. As a usability feature, we provide several pre-processing tools in CodeNet to transform source code into representations that can be readily used as inputs into machine learning models. Results of code classification and code similarity experiments using the CodeNet dataset are provided as a reference. We hope that the scale, diversity and rich, high-quality annotations of CodeNet will offer unprecedented research opportunities at the intersection of AI and Software Engineering. △ Less

Submitted 29 August, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

Comments: 22 pages including references

arXiv:2002.09440 [pdf, other]

A Toolkit for Generating Code Knowledge Graphs

Authors: Ibrahim Abdelaziz, Julian Dolby, Jamie McCusker, Kavitha Srinivas

Abstract: Knowledge graphs have been proven extremely useful in powering diverse applications in semantic search and natural language understanding. In this paper, we present GraphGen4Code, a toolkit to build code knowledge graphs that can similarly power various applications such as program search, code understanding, bug detection, and code automation. GraphGen4Code uses generic techniques to capture code… ▽ More Knowledge graphs have been proven extremely useful in powering diverse applications in semantic search and natural language understanding. In this paper, we present GraphGen4Code, a toolkit to build code knowledge graphs that can similarly power various applications such as program search, code understanding, bug detection, and code automation. GraphGen4Code uses generic techniques to capture code semantics with the key nodes in the graph representing classes, functions, and methods. Edges indicate function usage (e.g., how data flows through function calls, as derived from program analysis of real code), and documentation about functions (e.g., code documentation, usage documentation, or forum discussions such as StackOverflow). Our toolkit uses named graphs in RDF to model graphs per program, or can output graphs as JSON. We show the scalability of the toolkit by applying it to 1.3 million Python files drawn from GitHub, 2,300 Python modules, and 47 million forum posts. This results in an integrated code graph with over 2 billion triples. We make the toolkit to build such graphs as well as the sample extraction of the 2 billion triples graph publicly available to the community for use. △ Less

Submitted 27 September, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

arXiv:1809.01604 [pdf, other]

Merging datasets through deep learning

Authors: Kavitha Srinivas, Abraham Gale, Julian Dolby

Abstract: Merging datasets is a key operation for data analytics. A frequent requirement for merging is joining across columns that have different surface forms for the same entity (e.g., the name of a person might be represented as "Douglas Adams" or "Adams, Douglas"). Similarly, ontology alignment can require recognizing distinct surface forms of the same entity, especially when ontologies are independent… ▽ More Merging datasets is a key operation for data analytics. A frequent requirement for merging is joining across columns that have different surface forms for the same entity (e.g., the name of a person might be represented as "Douglas Adams" or "Adams, Douglas"). Similarly, ontology alignment can require recognizing distinct surface forms of the same entity, especially when ontologies are independently developed. However, data management systems are currently limited to performing merges based on string equality, or at best using string similarity. We propose an approach to performing merges based on deep learning models. Our approach depends on (a) creating a deep learning model that maps surface forms of an entity into a set of vectors such that alternate forms for the same entity are closest in vector space, (b) indexing these vectors using a nearest neighbors algorithm to find the forms that can be potentially joined together. To build these models, we had to adapt techniques from metric learning due to the characteristics of the data; specifically we describe novel sample selection techniques and loss functions that work for this problem. To evaluate our approach, we used Wikidata as ground truth and built models from datasets with approximately 1.1M people's names (200K identities) and 130K company names (70K identities). We developed models that allow for joins with precision@1 of .75-.81 and recall of .74-.81. We make the models available for aligning people or companies across multiple datasets. △ Less

Submitted 5 September, 2018; originally announced September 2018.

arXiv:1805.04058 [pdf, other]

doi 10.1145/3211346.3211349

Ariadne: Analysis for Machine Learning Program

Authors: Julian Dolby, Avraham Shinnar, Allison Allain, Jenna Reinen

Abstract: Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, this dynamic language has less support for error detection at code creation time than tools li… ▽ More Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, this dynamic language has less support for error detection at code creation time than tools like Eclipse. This is especially problematic for machine learning: given its statistical nature, code with subtle errors may run and produce results that look plausible but are meaningless. This can vitiate scientific results. We report on Ariadne: applying a static framework, WALA, to machine learning code that uses TensorFlow. We have created static analysis for Python, a type system for tracking tensors---Tensorflow's core data structures---and a data flow analysis to track their usage. We report on how it was built and present some early results. △ Less

Submitted 10 May, 2018; originally announced May 2018.

arXiv:1801.08928 [pdf, other]

Automatically Extracting Web API Specifications from HTML Documentation

Authors: **qiu Yang, Erik Wittern, Annie T. T. Ying, Julian Dolby, Lin Tan

Abstract: Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily relies on manual efforts by third parties. In this paper, we propose an automatic approach and an associa… ▽ More Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily relies on manual efforts by third parties. In this paper, we propose an automatic approach and an associated tool called D2Spec for extracting specifications from web API documentation pages. Given a seed online documentation page on an API, D2Spec first crawls all documentation pages on the API, and then uses a set of machine learning techniques to extract the base URL, path templates, and HTTP methods, which collectively describe the endpoints of an API. We evaluated whether D2Spec can accurately extract endpoints from documentation on 120 web APIs. The results showed that D2Spec achieved a precision of 87.5% in identifying base URLs, a precision of 81.3% and a recall of 80.6% in generating path templates, and a precision of 84.4% and a recall of 76.2% in extracting HTTP methods. In addition, we found that D2Spec was useful when applied to APIs with pre-existing API specifications: D2Spec revealed many inconsistencies between web API documentation and their corresponding publicly available specifications. Thus, D2Spec can be used by web API providers to keep documentation and specifications in synchronization. △ Less

Submitted 26 January, 2018; originally announced January 2018.

arXiv:1705.06629 [pdf, other]

Who you gonna call? Analyzing Web Requests in Android Applications

Authors: Marianna Rapoport, Philippe Suter, Erik Wittern, Ondřej Lhoták, Julian Dolby

Abstract: Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point… ▽ More Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point to possible negative effects like security and privacy violations, or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30, 000 Android applications, and compare the performance with a simpler constant extraction analysis. Finally, we present a discussion of the advantages and limitations of dynamic and static analyses when extracting URLs, as we compare the data extracted by Stringoid from the same 20 applications with the dynamically collected data. △ Less

Submitted 18 May, 2017; v1 submitted 18 May, 2017; originally announced May 2017.

arXiv:1705.06586 [pdf, other]

Opportunities in Software Engineering Research for Web API Consumption

Authors: Erik Wittern, Annie Ying, Yunhui Zheng, Jim A. Laredo, Julian Dolby, Christopher C. Young, Aleksander A. Slominski

Abstract: Nowadays, invoking third party code increasingly involves calling web services via their web APIs, as opposed to the more traditional scenario of downloading a library and invoking the library's API. However, there are also new challenges for developers calling these web APIs. In this paper, we highlight a broad set of these challenges and argue for resulting opportunities for software engineering… ▽ More Nowadays, invoking third party code increasingly involves calling web services via their web APIs, as opposed to the more traditional scenario of downloading a library and invoking the library's API. However, there are also new challenges for developers calling these web APIs. In this paper, we highlight a broad set of these challenges and argue for resulting opportunities for software engineering research to support developers in consuming web APIs. We outline two specific research threads in this context: (1) web API specification curation, which enables us to know the signatures of web APIs, and (2) static analysis that is capable of extracting URLs, HTTP methods etc. of web API calls. Furthermore, we present new work on how we combine (1) and (2) to provide IDE support for application developers consuming web APIs. As web APIs are used broadly, research in supporting the consumption of web APIs offers exciting opportunities. △ Less

Submitted 18 May, 2017; originally announced May 2017.

Comments: Erik Wittern and Annie Ying are both first authors

arXiv:1702.03906 [pdf, other]

Statically Checking Web API Requests in JavaScript

Authors: Erik Wittern, Annie T. T. Ying, Yunhui Zheng, Julian Dolby, Jim A. Laredo

Abstract: Many JavaScript applications perform HTTP requests to web APIs, relying on the request URL, HTTP method, and request data to be constructed correctly by string operations. Traditional compile-time error checking, such as calling a non-existent method in Java, are not available for checking whether such requests comply with the requirements of a web API. In this paper, we propose an approach to sta… ▽ More Many JavaScript applications perform HTTP requests to web APIs, relying on the request URL, HTTP method, and request data to be constructed correctly by string operations. Traditional compile-time error checking, such as calling a non-existent method in Java, are not available for checking whether such requests comply with the requirements of a web API. In this paper, we propose an approach to statically check web API requests in JavaScript. Our approach first extracts a request's URL string, HTTP method, and the corresponding request data using an inter-procedural string analysis, and then checks whether the request conforms to given web API specifications. We evaluated our approach by checking whether web API requests in JavaScript files mined from GitHub are consistent or inconsistent with publicly available API specifications. From the 6575 requests in scope, our approach determined whether the request's URL and HTTP method was consistent or inconsistent with web API specifications with a precision of 96.0%. Our approach also correctly determined whether extracted request data was consistent or inconsistent with the data requirements with a precision of 87.9% for payload data and 99.9% for query data. In a systematic analysis of the inconsistent cases, we found that many of them were due to errors in the client code. The here proposed checker can be integrated with code editors or with continuous integration tools to warn programmers about code containing potentially erroneous requests. △ Less

Submitted 15 February, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

Comments: International Conference on Software Engineering, 2017

Showing 1–16 of 16 results for author: Dolby, J