-
An Information-Theoretic Perspective on Overfitting and Underfitting
Authors:
Daniel Bashir,
George D. Montanez,
Sonia Sehra,
Pedro Sandoval Segura,
Julius Lauw
Abstract:
We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a si…
▽ More
We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.
△ Less
Submitted 6 November, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Specification and Automated Analysis of Inter-Parameter Dependencies in Web APIs
Authors:
Alberto Martin-Lopez,
Sergio Segura,
Carlos Müller,
Antonio Ruiz-Cortés
Abstract:
Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and int…
▽ More
Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and interact with services without human intervention. In this article, we present an approach for the specification and automated analysis of inter-parameter dependencies in web APIs. We first present a domain-specific language, called Inter-parameter Dependency Language (IDL), for the specification of dependencies among input parameters in web services. Then, we propose a map** to translate an IDL document into a constraint satisfaction problem (CSP), enabling the automated analysis of IDL specifications using standard CSP-based reasoning operations. Specifically, we present a catalogue of nine analysis operations on IDL documents allowing to compute, for example, whether a given request satisfies all the dependencies of the service. Finally, we present a tool suite including an editor, a parser, an OAS extension, a constraint programming-aided library, and a test suite supporting IDL specifications and their analyses. Together, these contributions pave the way for a new range of specification-driven applications in areas such as code generation and testing.
△ Less
Submitted 7 May, 2020;
originally announced May 2020.
-
The Labeling Distribution Matrix (LDM): A Tool for Estimating Machine Learning Algorithm Capacity
Authors:
Pedro Sandoval Segura,
Julius Lauw,
Daniel Bashir,
Kinjal Shah,
Sonia Sehra,
Dominique Macias,
George Montanez
Abstract:
Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capac…
▽ More
Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capacity of learning algorithms. The method attempts to characterize the diversity of possible outputs by an algorithm for different training datasets, using this to measure algorithm flexibility and responsiveness to data. We test the method on several supervised learning algorithms, and find that while the results are not conclusive, the LDM does allow us to gain potentially valuable insight into the prediction behavior of algorithms. We also introduce the Label Recorder as an additional tool for estimating algorithm capacity, with more promising initial results.
△ Less
Submitted 6 January, 2020; v1 submitted 22 December, 2019;
originally announced December 2019.
-
Towards the Automation of Metamorphic Testing in Model Transformations
Authors:
Javier Troya,
Sergio Segura,
Antonio Ruiz-Cortés
Abstract:
Model transformations are the cornerstone of Model-Driven Engineering, and provide the essential mechanisms for manipulating and transforming models. Checking whether the output of a model transformation is correct is a manual and error-prone task, this is referred to as the oracle problem in the software testing literature. The correctness of the model transformation program is crucial for the pr…
▽ More
Model transformations are the cornerstone of Model-Driven Engineering, and provide the essential mechanisms for manipulating and transforming models. Checking whether the output of a model transformation is correct is a manual and error-prone task, this is referred to as the oracle problem in the software testing literature. The correctness of the model transformation program is crucial for the proper generation of its output, so it should be tested. Metamorphic testing is a testing technique to alleviate the oracle problem consisting on exploiting the relations between different inputs and outputs of the program under test, so-called metamorphic relations. In this paper we give an insight into our approach to generically define metamorphic relations for model transformations, which can be automatically instantiated given any specific model transformation.
△ Less
Submitted 30 April, 2018;
originally announced April 2018.