Search | arXiv e-print repository

Regularization and Optimization in Model-Based Clustering

Authors: Raphael Araujo Sampaio, Joaquim Dias Garcia, Marcus Poggi, Thibaut Vidal

Abstract: Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estima… ▽ More Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estimating a quadratic number of parameters per cluster to represent the covariance matrices. This poses two main issues: (i) the underlying optimization problems are challenging due to their larger number of local minima, and (ii) their solutions can overfit the data. In this work, we design search strategies that circumvent both issues. We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting. Through extensive computational analyses, we observe that optimization or regularization in isolation does not substantially improve cluster recovery. However, combining these techniques permits a completely new level of performance previously unachieved by k-means algorithm variants, unraveling vastly different cluster structures. These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration. To facilitate such applications, we provide open-source code as well as Julia packages (UnsupervisedClustering.jl and RegularizedCovarianceMatrices.jl) implementing the proposed techniques. △ Less

Submitted 5 February, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

arXiv:2207.08854 [pdf, other]

A Pattern-based deadlock-freedom analysis strategy for concurrent systems

Authors: Pedro Antonino, Augusto Sampaio, Jim Woodcock

Abstract: Local analysis has long been recognised as an effective tool to combat the state-space explosion problem. In this work, we propose a method that systematises the use of local analysis in the verification of deadlock freedom for concurrent and distributed systems. It combines a strategy for system decomposition with the verification of the decomposed subsystems via adherence to behavioural patterns… ▽ More Local analysis has long been recognised as an effective tool to combat the state-space explosion problem. In this work, we propose a method that systematises the use of local analysis in the verification of deadlock freedom for concurrent and distributed systems. It combines a strategy for system decomposition with the verification of the decomposed subsystems via adherence to behavioural patterns. At the core of our work, we have a number of CSP refinement expressions that allows the user of our method to automatically verify all the behavioural restrictions that we impose. We also propose a prototype tool to support our method. Finally, we demonstrate the practical impact our method can have by analysing how it fares when applied to some examples. △ Less

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2205.07529 [pdf, other]

Specification is Law: Safe Creation and Upgrade of Ethereum Smart Contracts

Authors: Pedro Antonino, Juliandson Ferreira, Augusto Sampaio, A. W. Roscoe

Abstract: Smart contracts are the building blocks of the "code is law" paradigm: the smart contract's code indisputably describes how its assets are to be managed - once it is created, its code is typically immutable. Faulty smart contracts present the most significant evidence against the practicality of this paradigm; they are well-documented and resulted in assets worth vast sums of money being compromis… ▽ More Smart contracts are the building blocks of the "code is law" paradigm: the smart contract's code indisputably describes how its assets are to be managed - once it is created, its code is typically immutable. Faulty smart contracts present the most significant evidence against the practicality of this paradigm; they are well-documented and resulted in assets worth vast sums of money being compromised. To address this issue, the Ethereum community proposed (i) tools and processes to audit/analyse smart contracts, and (ii) design patterns implementing a mechanism to make contract code mutable. Individually, (i) and (ii) only partially address the challenges raised by the "code is law" paradigm. In this paper, we combine elements from (i) and (ii) to create a systematic framework that moves away from "code is law" and gives rise to a new "specification is law" paradigm. It allows contracts to be created and upgraded but only if they meet a corresponding formal specification. The framework is centered around \emph{a trusted deployer}: an off-chain service that formally verifies and enforces this notion of conformance. We have prototyped this framework, and investigated its applicability to contracts implementing two widely used Ethereum standards: the ERC20 Token Standard and ERC1155 Multi Token Standard, with promising results. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2005.10295 [pdf, other]

doi 10.1016/j.jss.2020.110598

A refinement checking based strategy for component-based systems evolution

Authors: José Dihego, Augusto Sampaio, Marcel Oliveira

Abstract: We propose inheritance and refinement relations for a CSP-based component model (BRIC), which supports a constructive design based on composition rules that preserve classical concurrency properties such as deadlock freedom. The proposed relations allow extension of functionality, whilst preserving behavioural properties. A notion of extensibility is defined on top of a behavioural relation called… ▽ More We propose inheritance and refinement relations for a CSP-based component model (BRIC), which supports a constructive design based on composition rules that preserve classical concurrency properties such as deadlock freedom. The proposed relations allow extension of functionality, whilst preserving behavioural properties. A notion of extensibility is defined on top of a behavioural relation called convergence, which distinguishes inputs from outputs and the context where they are communicated, allowing extensions to reuse existing events with different purposes. We mechanise the strategy for extensibility verification using the FDR4 tool, and illustrate our results with an autonomous healthcare robot case study. △ Less

Submitted 20 May, 2020; originally announced May 2020.

Comments: The Journal of Systems and Software

arXiv:1911.04279 [pdf, other]

Community Detection for Power Systems Network Aggregation Considering Renewable Variability

Authors: Raphael Araujo Sampaio, Gerson Couto Oliveira, Luiz Carlos da Costa Jr., Joaquim Dias Garcia

Abstract: The increasing penetration of variable renewable energy (VRE) has brought significant challenges for power systems planning and operation. These highly variable sources are typically distributed in the grid; therefore, a detailed representation of transmission bottlenecks is fundamental to approximate the impact of the transmission network on the dispatch with VRE resources. The fine grain tempora… ▽ More The increasing penetration of variable renewable energy (VRE) has brought significant challenges for power systems planning and operation. These highly variable sources are typically distributed in the grid; therefore, a detailed representation of transmission bottlenecks is fundamental to approximate the impact of the transmission network on the dispatch with VRE resources. The fine grain temporal scale of short term and day-ahead dispatch, taking into account the network constraints, also mandatory for mid-term planning studies, combined with the high variability of the VRE has brought the need to represent these uncertainties in stochastic optimization models while taking into account the transmission system. These requirements impose a computational burden to solve the planning and operation models. We propose a methodology based on community detection to aggregate the network representation, capable of preserving the locational marginal price (LMP) differences in multiple VRE scenarios, and describe a real-world operational planning study. The optimal expected cost solution considering aggregated networks is compared with the full network representation. Both representations were embedded in an operation model relying on Stochastic Dual Dynamic Programming (SDDP) to deal with the random variables in a multi-stage problem. △ Less

Submitted 8 November, 2019; originally announced November 2019.

arXiv:1604.00527 [pdf, ps, other]

A Decomposition Approach to Solve The Quay Crane Scheduling Problem

Authors: Afonso Sampaio, Sebastián Urrutia, Johan Oppen

Abstract: In this work we propose a decomposition approach to solve the quay crane scheduling problem. This is an important maritime transportation problem faced in container terminals where quay cranes are used to handle cargo. The objective is to determine a sequence of loading and unloading operations for each crane in order to minimize the completion time. We solve a mixed integer programming formulatio… ▽ More In this work we propose a decomposition approach to solve the quay crane scheduling problem. This is an important maritime transportation problem faced in container terminals where quay cranes are used to handle cargo. The objective is to determine a sequence of loading and unloading operations for each crane in order to minimize the completion time. We solve a mixed integer programming formulation for the quay crane scheduling problem, decomposing it into a vehicle routing problem and a corresponding scheduling problem. The routing sub-problem is solved by minimizing the longest crane completion time without taking crane interference into account. This solution provides a lower bound for the makespan of the whole problem and is sent to the scheduling sub-problem, where a completion time for each task and the makespan are determined. This scheme resembles Benders' decomposition and, in particular, the scheme underlying combinatorial Benders' cuts. We evaluate the proposed approach by solving instances from the literature and comparing the results with other available methods. △ Less

Submitted 2 April, 2016; originally announced April 2016.

Comments: 24 pages, 4 figures, 4 tables. Work presented at EURO 2015

Showing 1–6 of 6 results for author: Sampaio, A