Search | arXiv e-print repository

arXiv:2010.06076 [pdf, ps, other]

An Information-Theoretic Perspective on Overfitting and Underfitting

Authors: Daniel Bashir, George D. Montanez, Sonia Sehra, Pedro Sandoval Segura, Julius Lauw

Abstract: We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a si… ▽ More We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization. △ Less

Submitted 6 November, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: Accepted for presentation at The 33rd Australasian Joint Conference on Artificial Intelligence (AJCAI 2020), November 29-30, 2020

arXiv:2005.03320 [pdf, other]

Specification and Automated Analysis of Inter-Parameter Dependencies in Web APIs

Authors: Alberto Martin-Lopez, Sergio Segura, Carlos Müller, Antonio Ruiz-Cortés

Abstract: Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and int… ▽ More Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and interact with services without human intervention. In this article, we present an approach for the specification and automated analysis of inter-parameter dependencies in web APIs. We first present a domain-specific language, called Inter-parameter Dependency Language (IDL), for the specification of dependencies among input parameters in web services. Then, we propose a map** to translate an IDL document into a constraint satisfaction problem (CSP), enabling the automated analysis of IDL specifications using standard CSP-based reasoning operations. Specifically, we present a catalogue of nine analysis operations on IDL documents allowing to compute, for example, whether a given request satisfies all the dependencies of the service. Finally, we present a tool suite including an editor, a parser, an OAS extension, a constraint programming-aided library, and a test suite supporting IDL specifications and their analyses. Together, these contributions pave the way for a new range of specification-driven applications in areas such as code generation and testing. △ Less

Submitted 7 May, 2020; originally announced May 2020.

ACM Class: H.0; D.2

arXiv:1912.10597 [pdf, other]

doi 10.5220/0009178209800986

The Labeling Distribution Matrix (LDM): A Tool for Estimating Machine Learning Algorithm Capacity

Authors: Pedro Sandoval Segura, Julius Lauw, Daniel Bashir, Kinjal Shah, Sonia Sehra, Dominique Macias, George Montanez

Abstract: Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capac… ▽ More Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capacity of learning algorithms. The method attempts to characterize the diversity of possible outputs by an algorithm for different training datasets, using this to measure algorithm flexibility and responsiveness to data. We test the method on several supervised learning algorithms, and find that while the results are not conclusive, the LDM does allow us to gain potentially valuable insight into the prediction behavior of algorithms. We also introduce the Label Recorder as an additional tool for estimating algorithm capacity, with more promising initial results. △ Less

Submitted 6 January, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: Accepted to 12th International Conference on Agents and Artificial Intelligence (ICAART 2020), 7 pages including references

arXiv:1804.11121 [pdf, other]

Towards the Automation of Metamorphic Testing in Model Transformations

Authors: Javier Troya, Sergio Segura, Antonio Ruiz-Cortés

Abstract: Model transformations are the cornerstone of Model-Driven Engineering, and provide the essential mechanisms for manipulating and transforming models. Checking whether the output of a model transformation is correct is a manual and error-prone task, this is referred to as the oracle problem in the software testing literature. The correctness of the model transformation program is crucial for the pr… ▽ More Model transformations are the cornerstone of Model-Driven Engineering, and provide the essential mechanisms for manipulating and transforming models. Checking whether the output of a model transformation is correct is a manual and error-prone task, this is referred to as the oracle problem in the software testing literature. The correctness of the model transformation program is crucial for the proper generation of its output, so it should be tested. Metamorphic testing is a testing technique to alleviate the oracle problem consisting on exploiting the relations between different inputs and outputs of the program under test, so-called metamorphic relations. In this paper we give an insight into our approach to generically define metamorphic relations for model transformations, which can be automatically instantiated given any specific model transformation. △ Less

Submitted 30 April, 2018; originally announced April 2018.

Comments: Jornadas de Ingeniería del Software y Bases de Datos (JISBD) 2016

Showing 1–4 of 4 results for author: Segura, S