-
Domain Reasoning in TopKAT
Authors:
Cheng Zhang,
Arthur Azevedo de Amorim,
Marco Gaboardi
Abstract:
TopKAT is the algebraic theory of Kleene algebra with tests (KAT) extended with a top element. Compared to KAT, one pleasant feature of TopKAT is that, in relational models, the top element allows us to express the domain and codomain of a relation. This enables several applications in program logics, such as proving under-approximate specifications or reachability properties of imperative program…
▽ More
TopKAT is the algebraic theory of Kleene algebra with tests (KAT) extended with a top element. Compared to KAT, one pleasant feature of TopKAT is that, in relational models, the top element allows us to express the domain and codomain of a relation. This enables several applications in program logics, such as proving under-approximate specifications or reachability properties of imperative programs. However, while TopKAT inherits many pleasant features of KATs, such as having a decidable equational theory, it is incomplete with respect to relational models. In other words, there are properties that hold true of all relational TopKATs but cannot be proved with the axioms of TopKAT. This issue is potentially worrisome for program-logic applications, in which relational models play a key role.
In this paper, we further investigate the completeness properties of TopKAT with respect to relational models. We show that TopKAT is complete with respect to (co)domain comparison of KAT terms, but incomplete when comparing the (co)domain of arbitrary TopKAT terms. Since the encoding of under-approximate specifications in TopKAT hinges on this type of formula, the aforementioned incompleteness results have a limited impact when using TopKAT to reason about such specifications.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Compositional Expected Cost Analysis of Functional Probabilistic Programs
Authors:
Pedro H. Azevedo de Amorim
Abstract:
Reasoning about resources used during the execution of programs, such as time, is one of the fundamental questions in computer science. When programming with probabilistic primitives, however, different samples may result in different resource usage, making the cost of a program not a single number but a distribution instead.
The expected cost is an important metric used to quantify the efficien…
▽ More
Reasoning about resources used during the execution of programs, such as time, is one of the fundamental questions in computer science. When programming with probabilistic primitives, however, different samples may result in different resource usage, making the cost of a program not a single number but a distribution instead.
The expected cost is an important metric used to quantify the efficiency of probabilistic programs. In this work we introduce $\mathbf{cert}$, a call-by-push-value (CBPV) metalanguage extended with primitives for probability, cost and unbounded recursion, and give it denotational semantics for reasoning about the average cost of programs. We justify the validity of the semantics by presenting case-studies ranging from randomized algorithms to stochastic processes and showing how the semantics captures their intended cost.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
SECOMP: Formally Secure Compilation of Compartmentalized C Programs
Authors:
Jérémy Thibault,
Roberto Blanco,
Dongjae Lee,
Sven Argo,
Arthur Azevedo de Amorim,
Aïna Linn Georges,
Catalin Hritcu,
Andrew Tolmach
Abstract:
Undefined behavior in C often causes devastating security vulnerabilities. One practical mitigation is compartmentalization, which allows developers to structure large programs into mutually distrustful compartments with clearly specified privileges and interactions. In this paper we introduce SECOMP, a compiler for compartmentalized C code that comes with machine-checked proofs guaranteeing that…
▽ More
Undefined behavior in C often causes devastating security vulnerabilities. One practical mitigation is compartmentalization, which allows developers to structure large programs into mutually distrustful compartments with clearly specified privileges and interactions. In this paper we introduce SECOMP, a compiler for compartmentalized C code that comes with machine-checked proofs guaranteeing that the scope of undefined behavior is restricted to the compartments that encounter it and become dynamically compromised. These guarantees are formalized as the preservation of safety properties against adversarial contexts, a secure compilation criterion similar to full abstraction, and this is the first time such a strong criterion is proven for a mainstream programming language. To achieve this we extend the languages of the CompCert verified C compiler with isolated compartments that can only interact via procedure calls and returns, as specified by cross-compartment interfaces. We adapt the passes and optimizations of CompCert as well as their correctness proofs to this compartment-aware setting. We then use compiler correctness as an ingredient in a larger secure compilation proof that involves several proof engineering novelties, needed to scale formally secure compilation up to a C compiler.
△ Less
Submitted 1 July, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Pipelines and Beyond: Graph Types for ADTs with Futures
Authors:
Francis Rinaldi,
june wunder,
Arthur Aevedo De Amorim,
Stefan K. Muller
Abstract:
Parallel programs are frequently modeled as dependency or cost graphs, which can be used to detect various bugs, or simply to visualize the parallel structure of the code. However, such graphs reflect just one particular execution and are typically constructed in a post-hoc manner. Graph types, which were introduced recently to mitigate this problem, can be assigned statically to a program by a ty…
▽ More
Parallel programs are frequently modeled as dependency or cost graphs, which can be used to detect various bugs, or simply to visualize the parallel structure of the code. However, such graphs reflect just one particular execution and are typically constructed in a post-hoc manner. Graph types, which were introduced recently to mitigate this problem, can be assigned statically to a program by a type system and compactly represent the family of all graphs that could result from the program. Unfortunately, prior work is restricted in its treatment of futures, an increasingly common and especially dynamic form of parallelism. In short, each instance of a future must be statically paired with a vertex name. Previously, this led to the restriction that futures could not be placed in collections or be used to construct data structures. Doing so is not a niche exercise: such structures form the basis of numerous algorithms that use forms of pipelining to achieve performance not attainable without futures. All but the most limited of these examples are out of reach of prior graph type systems. In this paper, we propose a graph type system that allows for almost arbitrary combinations of futures and recursive data types. We do so by indexing datatypes with a type-level vertex structure, a codata structure that supplies unique vertex names to the futures in a data structure. We prove the soundness of the system in a parallel core calculus annotated with vertex structures and associated operations. Although the calculus is annotated, this is merely for convenience in defining the type system. We prove that it is possible to annotate arbitrary recursive types with vertex structures, and show using a prototype inference engine that these annotations can be inferred from OCaml-like source code for several complex parallel algorithms.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Modular Hardware Design with Timeline Types
Authors:
Rachit Nigam,
Pedro Henrique Azevedo De Amorim,
Adrian Sampson
Abstract:
Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints -- which cycle a signal arrives, when an input is read -- and structural constraints -- how often a multiplier accepts new inputs -- are fundamental to hardware interfaces. Existing har…
▽ More
Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints -- which cycle a signal arrives, when an input is read -- and structural constraints -- how often a multiplier accepts new inputs -- are fundamental to hardware interfaces. Existing hardware design languages do not provide a way to encode these constraints; a user must read documentation, build scripts, or in the worst case, a module's implementation to understand how to use it. We present Filament, a language for modular hardware design that supports the specification and enforcement of timing and structural constraints for statically scheduled pipelines. Filament uses timeline types, which describe the intervals of clock-cycle time when a given signal is available or required. Filament enables safe composition of hardware modules, ensures that the resulting designs are correctly pipelined, and predictably lowers them to efficient hardware.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Robust human position estimation in cooperative robotic cells
Authors:
António Amorim,
Diana Guimarães,
Tiago Mendonça,
Pedro Neto,
Paulo Costa,
António Paulo Moreira
Abstract:
Robots are increasingly present in our lives, sharing the workspace and tasks with human co-workers. However, existing interfaces for human-robot interaction / cooperation (HRI/C) have limited levels of intuitiveness to use and safety is a major concern when humans and robots share the same workspace. Many times, this is due to the lack of a reliable estimation of the human pose in space which is…
▽ More
Robots are increasingly present in our lives, sharing the workspace and tasks with human co-workers. However, existing interfaces for human-robot interaction / cooperation (HRI/C) have limited levels of intuitiveness to use and safety is a major concern when humans and robots share the same workspace. Many times, this is due to the lack of a reliable estimation of the human pose in space which is the primary input to calculate the human-robot minimum distance (required for safety and collision avoidance) and HRI/C featuring machine learning algorithms classifying human behaviours / gestures. Each sensor type has its own characteristics resulting in problems such as occlusions (vision) and drift (inertial) when used in an isolated fashion. In this paper, it is proposed a combined system that merges the human tracking provided by a 3D vision sensor with the pose estimation provided by a set of inertial measurement units (IMUs) placed in human body limbs. The IMUs compensate the gaps in occluded areas to have tracking continuity. To mitigate the lingering effects of the IMU offset we propose a continuous online calculation of the offset value. Experimental tests were designed to simulate human motion in a human-robot collaborative environment where the robot moves away to avoid unexpected collisions with de human. Results indicate that our approach is able to capture the human\textsc's position, for example the forearm, with a precision in the millimetre range and robustness to occlusions.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
Separated and Shared Effects in Higher-Order Languages
Authors:
Pedro H. Azevedo de Amorim,
Justin Hsu
Abstract:
Effectful programs interact in ways that go beyond simple input-output, making compositional reasoning challenging. Existing work has shown that when such programs are ``separate'', i.e., when programs do not interfere with each other, it can be easier to reason about them. While reasoning about separated resources has been well-studied, there has been little work on reasoning about separated effe…
▽ More
Effectful programs interact in ways that go beyond simple input-output, making compositional reasoning challenging. Existing work has shown that when such programs are ``separate'', i.e., when programs do not interfere with each other, it can be easier to reason about them. While reasoning about separated resources has been well-studied, there has been little work on reasoning about separated effects, especially for functional, higher-order programming languages. We propose two higher-order languages that can reason about sharing and separation in effectful programs. Our first language $λ_{\text{INI}}$ has a linear type system and probabilistic semantics, where the two product types capture independent and possibly-dependent pairs. Our second language $λ_{\text{INI}}^2$ is two-level, stratified language, inspired by Benton's linear-non-linear (LNL) calculus. We motivate this language with a probabilistic model, but we also provide a general categorical semantics and exhibit a range of concrete models beyond probabilistic programming. We prove soundness theorems for all of our languages; our general soundness theorem for our categorical models of $λ_{\text{INI}}^2$ uses a categorical gluing construction.
△ Less
Submitted 2 March, 2023;
originally announced March 2023.
-
Comparative analysis of deep learning approaches for AgNOR-stained cytology samples interpretation
Authors:
João Gustavo Atkinson Amorim,
André Victória Matias,
Allan Cerentini,
Luiz Antonio Buschetto Macarini,
Alexandre Sherlley Onofre,
Fabiana Botelho Onofre,
Aldo von Wangenheim
Abstract:
Cervical cancer is a public health problem, where the treatment has a better chance of success if detected early. The analysis is a manual process which is subject to a human error, so this paper provides a way to analyze argyrophilic nucleolar organizer regions (AgNOR) stained slide using deep learning approaches. Also, this paper compares models for instance and semantic detection approaches. Ou…
▽ More
Cervical cancer is a public health problem, where the treatment has a better chance of success if detected early. The analysis is a manual process which is subject to a human error, so this paper provides a way to analyze argyrophilic nucleolar organizer regions (AgNOR) stained slide using deep learning approaches. Also, this paper compares models for instance and semantic detection approaches. Our results show that the semantic segmentation using U-Net with ResNet-18 or ResNet-34 as the backbone have similar results, and the best model shows an IoU for nucleus, cluster, and satellites of 0.83, 0.92, and 0.99 respectively. For instance segmentation, the Mask R-CNN using ResNet-50 performs better in the visual inspection and has a 0.61 of the IoU metric. We conclude that the instance segmentation and semantic segmentation models can be used in combination to make a cascade model able to select a nucleus and subsequently segment the nucleus and its respective nucleolar organizer regions (NORs).
△ Less
Submitted 19 October, 2022;
originally announced October 2022.
-
On Pitts' Relational Properties of Domains
Authors:
Arthur Azevedo de Amorim
Abstract:
Andrew Pitts' framework of relational properties of domains is a powerful method for defining predicates or relations on domains, with applications ranging from reasoning principles for program equivalence to proofs of adequacy connecting denotational and operational semantics. Its main appeal is handling recursive definitions that are not obviously well-founded: as long as the corresponding domai…
▽ More
Andrew Pitts' framework of relational properties of domains is a powerful method for defining predicates or relations on domains, with applications ranging from reasoning principles for program equivalence to proofs of adequacy connecting denotational and operational semantics. Its main appeal is handling recursive definitions that are not obviously well-founded: as long as the corresponding domain is also defined recursively, and its recursion pattern lines up appropriately with the definition of the relations, the framework can guarantee their existence. Pitts' original development used the Knaster-Tarski fixed-point theorem as a key ingredient. In these notes, I show how his construction can be seen as an instance of other key fixed-point theorems: the inverse limit construction, the Banach fixed-point theorem and the Kleene fixed-point theorem. The connection underscores how Pitts' construction is intimately tied to the methods for constructing the base recursive domains themselves, and also to techniques based on guarded recursion, or step-indexing, that have become popular in the last two decades.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
Distribution Theoretic Semantics for Non-Smooth Differentiable Programming
Authors:
Pedro H. Azevedo de Amorim,
Christopher Lam
Abstract:
With the wide spread of deep learning and gradient descent inspired optimization algorithms, differentiable programming has gained traction. Nowadays it has found applications in many different areas as well, such as scientific computing, robotics, computer graphics and others. One of its notoriously difficult problems consists in interpreting programs that are not differentiable everywhere.
In…
▽ More
With the wide spread of deep learning and gradient descent inspired optimization algorithms, differentiable programming has gained traction. Nowadays it has found applications in many different areas as well, such as scientific computing, robotics, computer graphics and others. One of its notoriously difficult problems consists in interpreting programs that are not differentiable everywhere.
In this work we define $λ_δ$, a core calculus for non-smooth differentiable programs and define its semantics using concepts from distribution theory, a well-established area of functional analysis. We also show how $λ_δ$ presents better equational properties than other existing semantics and use our semantics to reason about a simplified ray tracing algorithm. Further, we relate our semantics to existing differentiable languages by providing translations to and from other existing differentiable semantic models. Finally, we provide a proof-of-concept implementation in PyTorch of the novel constructions in this paper.
△ Less
Submitted 12 July, 2022;
originally announced July 2022.
-
Bunched Fuzz: Sensitivity for Vector Metrics
Authors:
june wunder,
Arthur Azevedo de Amorim,
Patrick Baillot,
Marco Gaboardi
Abstract:
Program sensitivity measures the distance between the outputs of a program when run on two related inputs. This notion, which plays a key role in areas such as data privacy and optimization, has been the focus of several program analysis techniques introduced in recent years. Among the most successful ones, we can highlight type systems inspired by linear logic, as pioneered by Reed and Pierce in…
▽ More
Program sensitivity measures the distance between the outputs of a program when run on two related inputs. This notion, which plays a key role in areas such as data privacy and optimization, has been the focus of several program analysis techniques introduced in recent years. Among the most successful ones, we can highlight type systems inspired by linear logic, as pioneered by Reed and Pierce in the Fuzz programming language. In Fuzz, each type is equipped with its own distance, and sensitivity analysis boils down to type checking. In particular, Fuzz features two product types, corresponding to two different notions of distance: the tensor product combines the distances of each component by adding them, while the with product takes their maximum.
In this work, we show that these products can be generalized to arbitrary $L^p$ distances, metrics that are often used in privacy and optimization. The original Fuzz products, tensor and with, correspond to the special cases $L^1$ and $L^\infty$. To ease the handling of such products, we extend the Fuzz type system with bunches -- as in the logic of bunched implications -- where the distances of different groups of variables can be combined using different $L^p$ distances. We show that our extension can be used to reason about quantitative properties of probabilistic programs.
△ Less
Submitted 1 February, 2023; v1 submitted 3 February, 2022;
originally announced February 2022.
-
A Higher-Order Language for Markov Kernels and Linear Operators
Authors:
Pedro H. Azevedo de Amorim
Abstract:
Much work has been done to give semantics to probabilistic programming languages. In recent years, most of the semantics used to reason about probabilistic programs fall in two categories: semantics based on Markov kernels and semantics based on linear operators.
Both styles of semantics have found numerous applications in reasoning about probabilistic programs, but they each have their strength…
▽ More
Much work has been done to give semantics to probabilistic programming languages. In recent years, most of the semantics used to reason about probabilistic programs fall in two categories: semantics based on Markov kernels and semantics based on linear operators.
Both styles of semantics have found numerous applications in reasoning about probabilistic programs, but they each have their strengths and weaknesses. Though it is believed that there is a connection between them there are no languages that can handle both styles of programming.
In this work we address these questions by defining a two-level calculus and its categorical semantics which makes it possible to program with both kinds of semantics. From the logical side of things we see this language as an alternative resource interpretation of linear logic, where the resource being kept track of is sampling instead of variable use.
△ Less
Submitted 2 March, 2023; v1 submitted 31 January, 2022;
originally announced February 2022.
-
On Incorrectness Logic and Kleene Algebra with Top and Tests
Authors:
Cheng Zhang,
Arthur Azevedo de Amorim,
Marco Gaboardi
Abstract:
Kleene algebra with tests (KAT) is a foundational equational framework for reasoning about programs, which has found applications in program transformations, networking and compiler optimizations, among many other areas. In his seminal work, Kozen proved that KAT subsumes propositional Hoare logic, showing that one can reason about the (partial) correctness of while programs by means of the equati…
▽ More
Kleene algebra with tests (KAT) is a foundational equational framework for reasoning about programs, which has found applications in program transformations, networking and compiler optimizations, among many other areas. In his seminal work, Kozen proved that KAT subsumes propositional Hoare logic, showing that one can reason about the (partial) correctness of while programs by means of the equational theory of KAT. In this work, we investigate the support that KAT provides for reasoning about incorrectness, instead, as embodied by Ohearn's recently proposed incorrectness logic. We show that KAT cannot directly express incorrectness logic. The main reason for this limitation can be traced to the fact that KAT cannot express explicitly the notion of codomain, which is essential to express incorrectness triples. To address this issue, we study Kleene Algebra with Top and Tests (TopKAT), an extension of KAT with a top element. We show that TopKAT is powerful enough to express a codomain operation, to express incorrectness triples, and to prove all the rules of incorrectness logic sound. This shows that one can reason about the incorrectness of while-like programs by means of the equational theory of TopKAT.
△ Less
Submitted 4 August, 2022; v1 submitted 17 August, 2021;
originally announced August 2021.
-
What is the State of the Art of Computer Vision-Assisted Cytology? A Systematic Literature Review
Authors:
André Victória Matias,
João Gustavo Atkinson Amorim,
Luiz Antonio Buschetto Macarini,
Allan Cerentini,
Alexandre Sherlley Casimiro Onofre,
Fabiana Botelho de Miranda Onofre,
Felipe Perozzo Daltoé,
Marcelo Ricardo Stemmer,
Aldo von Wangenheim
Abstract:
Cytology is a low-cost and non-invasive diagnostic procedure employed to support the diagnosis of a broad range of pathologies. Computer Vision technologies, by automatically generating quantitative and objective descriptions of examinations' contents, can help minimize the chances of misdiagnoses and shorten the time required for analysis. To identify the state-of-art of computer vision technique…
▽ More
Cytology is a low-cost and non-invasive diagnostic procedure employed to support the diagnosis of a broad range of pathologies. Computer Vision technologies, by automatically generating quantitative and objective descriptions of examinations' contents, can help minimize the chances of misdiagnoses and shorten the time required for analysis. To identify the state-of-art of computer vision techniques currently applied to cytology, we conducted a Systematic Literature Review. We analyzed papers published in the last 5 years. The initial search was executed in September 2020 and resulted in 431 articles. After applying the inclusion/exclusion criteria, 157 papers remained, which we analyzed to build a picture of the tendencies and problems present in this research area, highlighting the computer vision methods, staining techniques, evaluation metrics, and the availability of the used datasets and computer code. As a result, we identified that the most used methods in the analyzed works are deep learning-based (70 papers), while fewer works employ classic computer vision only (101 papers). The most recurrent metric used for classification and object detection was the accuracy (33 papers and 5 papers), while for segmentation it was the Dice Similarity Coefficient (38 papers). Regarding staining techniques, Papanicolaou was the most employed one (130 papers), followed by H&E (20 papers) and Feulgen (5 papers). Twelve of the datasets used in the papers are publicly available, with the DTU/Herlev dataset being the most used one. We conclude that there still is a lack of high-quality datasets for many types of stains and most of the works are not mature enough to be applied in a daily clinical diagnostic routine. We also identified a growing tendency towards adopting deep learning-based approaches as the methods of choice.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
First-Order Logic for Flow-Limited Authorization
Authors:
Andrew K. Hirsch,
Pedro H. Azevedo de Amorim,
Ethan Cecchetti,
Ross Tate,
Owen Arden
Abstract:
We present the Flow-Limited Authorization First-Order Logic (FLAFOL), a logic for reasoning about authorization decisions in the presence of information-flow policies. We formalize the FLAFOL proof system, characterize its proof-theoretic properties, and develop its security guarantees. In particular, FLAFOL is the first logic to provide a non-interference guarantee while supporting all connective…
▽ More
We present the Flow-Limited Authorization First-Order Logic (FLAFOL), a logic for reasoning about authorization decisions in the presence of information-flow policies. We formalize the FLAFOL proof system, characterize its proof-theoretic properties, and develop its security guarantees. In particular, FLAFOL is the first logic to provide a non-interference guarantee while supporting all connectives of first-order logic. Furthermore, this guarantee is the first to combine the notions of non-interference from both authorization logic and information-flow systems. All theorems in this paper are proven in Coq.
△ Less
Submitted 28 January, 2020;
originally announced January 2020.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Probabilistic Relational Reasoning via Metrics
Authors:
Arthur Azevedo de Amorim,
Marco Gaboardi,
Justin Hsu,
Shin-ya Katsumata
Abstract:
The Fuzz programming language [Reed and Pierce, 2010] uses an elegant linear type system combined with a monad-like type to express and reason about probabilistic sensitivity properties, most notably $ε$-differential privacy. We show how to extend Fuzz to capture more general relational properties of probabilistic programs, with approximate, or $(ε, δ)$-differential privacy serving as a leading ex…
▽ More
The Fuzz programming language [Reed and Pierce, 2010] uses an elegant linear type system combined with a monad-like type to express and reason about probabilistic sensitivity properties, most notably $ε$-differential privacy. We show how to extend Fuzz to capture more general relational properties of probabilistic programs, with approximate, or $(ε, δ)$-differential privacy serving as a leading example. Our technical contributions are threefold. First, we introduce the categorical notion of comonadic lifting of a monad to model composition properties of probabilistic divergences. Then, we show how to express relational properties in terms of sensitivity properties via an adjunction we call the path construction. Finally, we instantiate our semantics to model the terminating fragment of Fuzz extended with types carrying information about other divergences between distributions.
△ Less
Submitted 18 April, 2019; v1 submitted 13 July, 2018;
originally announced July 2018.
-
When Good Components Go Bad: Formally Secure Compilation Despite Dynamic Compromise
Authors:
Carmine Abate,
Arthur Azevedo de Amorim,
Roberto Blanco,
Ana Nora Evans,
Guglielmo Fachini,
Catalin Hritcu,
Théo Laurent,
Benjamin C. Pierce,
Marco Stronati,
Jérémy Thibault,
Andrew Tolmach
Abstract:
We propose a new formal criterion for evaluating secure compilation schemes for unsafe languages, expressing end-to-end security guarantees for software components that may become compromised after encountering undefined behavior---for example, by accessing an array out of bounds.
Our criterion is the first to model dynamic compromise in a system of mutually distrustful components with clearly s…
▽ More
We propose a new formal criterion for evaluating secure compilation schemes for unsafe languages, expressing end-to-end security guarantees for software components that may become compromised after encountering undefined behavior---for example, by accessing an array out of bounds.
Our criterion is the first to model dynamic compromise in a system of mutually distrustful components with clearly specified privileges. It articulates how each component should be protected from all the others---in particular, from components that have encountered undefined behavior and become compromised. Each component receives secure compilation guarantees---in particular, its internal invariants are protected from compromised components---up to the point when this component itself becomes compromised, after which we assume an attacker can take complete control and use this component's privileges to attack other components. More precisely, a secure compilation chain must ensure that a dynamically compromised component cannot break the safety properties of the system at the target level any more than an arbitrary attacker-controlled component (with the same interface and privileges, but without undefined behaviors) already could at the source level.
To illustrate the model, we construct a secure compilation chain for a small unsafe language with buffers, procedures, and components, targeting a simple abstract machine with built-in compartmentalization. We give a machine-checked proof in Coq that this compiler satisfies our secure compilation criterion. Finally, we show that the protection guarantees offered by the compartmentalized abstract machine can be achieved at the machine-code level using either software fault isolation or a tag-based reference monitor.
△ Less
Submitted 29 November, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Formally Secure Compilation of Unsafe Low-Level Components (Extended Abstract)
Authors:
Guglielmo Fachini,
Catalin Hritcu,
Marco Stronati,
Ana Nora Evans,
Théo Laurent,
Arthur Azevedo de Amorim,
Benjamin C. Pierce,
Andrew Tolmach
Abstract:
We propose a new formal criterion for secure compilation, providing strong security guarantees for components written in unsafe, low-level languages with C-style undefined behavior. Our criterion goes beyond recent proposals, which protect the trace properties of a single component against an adversarial context, to model dynamic compromise in a system of mutually distrustful components. Each comp…
▽ More
We propose a new formal criterion for secure compilation, providing strong security guarantees for components written in unsafe, low-level languages with C-style undefined behavior. Our criterion goes beyond recent proposals, which protect the trace properties of a single component against an adversarial context, to model dynamic compromise in a system of mutually distrustful components. Each component is protected from all the others until it receives an input that triggers an undefined behavior, causing it to become compromised and attack the remaining uncompromised components. To illustrate this model, we demonstrate a secure compilation chain for an unsafe language with buffers, procedures, and components, compiled to a simple RISC abstract machine with built-in compartmentalization. The protection guarantees offered by this abstract machine can be achieved at the machine-code level using either software fault isolation or tag-based reference monitoring. We are working on machine-checked proofs showing that this compiler satisfies our secure compilation criterion.
△ Less
Submitted 31 October, 2017; v1 submitted 19 October, 2017;
originally announced October 2017.
-
The Meaning of Memory Safety
Authors:
Arthur Azevedo de Amorim,
Catalin Hritcu,
Benjamin C. Pierce
Abstract:
We give a rigorous characterization of what it means for a programming language to be memory safe, capturing the intuition that memory safety supports local reasoning about state. We formalize this principle in two ways. First, we show how a small memory-safe language validates a noninterference property: a program can neither affect nor be affected by unreachable parts of the state. Second, we ex…
▽ More
We give a rigorous characterization of what it means for a programming language to be memory safe, capturing the intuition that memory safety supports local reasoning about state. We formalize this principle in two ways. First, we show how a small memory-safe language validates a noninterference property: a program can neither affect nor be affected by unreachable parts of the state. Second, we extend separation logic, a proof system for heap-manipulating programs, with a memory-safe variant of its frame rule. The new rule is stronger because it applies even when parts of the program are buggy or malicious, but also weaker because it demands a stricter form of separation between parts of the program state. We also consider a number of pragmatically motivated variations on memory safety and the reasoning principles they support. As an application of our characterization, we evaluate the security of a previously proposed dynamic monitor for memory safety of heap-allocated data.
△ Less
Submitted 6 April, 2018; v1 submitted 20 May, 2017;
originally announced May 2017.
-
A Semantic Account of Metric Preservation
Authors:
Arthur Azevedo de Amorim,
Marco Gaboardi,
Justin Hsu,
Shin-ya Katsumata,
Ikram Cherigui
Abstract:
Program sensitivity measures how robust a program is to small changes in its input, and is a fundamental notion in domains ranging from differential privacy to cyber-physical systems. A natural way to formalize program sensitivity is in terms of metrics on the input and output spaces, requiring that an $r$-sensitive function map inputs that are at distance $d$ to outputs that are at distance at mo…
▽ More
Program sensitivity measures how robust a program is to small changes in its input, and is a fundamental notion in domains ranging from differential privacy to cyber-physical systems. A natural way to formalize program sensitivity is in terms of metrics on the input and output spaces, requiring that an $r$-sensitive function map inputs that are at distance $d$ to outputs that are at distance at most $r \cdot d$. Program sensitivity is thus an analogue of Lipschitz continuity for programs.
Reed and Pierce introduced Fuzz, a functional language with a linear type system that can express program sensitivity. They show soundness operationally, in the form of a metric preservation property. Inspired by their work, we study program sensitivity and metric preservation from a denotational point of view. In particular, we introduce metric CPOs, a novel semantic structure for reasoning about computation on metric spaces, by endowing CPOs with a compatible notion of distance. This structure is useful for reasoning about metric properties of programs, and specifically about program sensitivity. We demonstrate metric CPOs by giving a model for the deterministic fragment of Fuzz.
△ Less
Submitted 23 October, 2022; v1 submitted 1 February, 2017;
originally announced February 2017.
-
Beyond Good and Evil: Formalizing the Security Guarantees of Compartmentalizing Compilation
Authors:
Yannis Juglaret,
Catalin Hritcu,
Arthur Azevedo de Amorim,
Boris Eng,
Benjamin C. Pierce
Abstract:
Compartmentalization is good security-engineering practice. By breaking a large software system into mutually distrustful components that run with minimal privileges, restricting their interactions to conform to well-defined interfaces, we can limit the damage caused by low-level attacks such as control-flow hijacking. When used to defend against such attacks, compartmentalization is often impleme…
▽ More
Compartmentalization is good security-engineering practice. By breaking a large software system into mutually distrustful components that run with minimal privileges, restricting their interactions to conform to well-defined interfaces, we can limit the damage caused by low-level attacks such as control-flow hijacking. When used to defend against such attacks, compartmentalization is often implemented cooperatively by a compiler and a low-level compartmentalization mechanism. However, the formal guarantees provided by such compartmentalizing compilation have seen surprisingly little investigation.
We propose a new security property, secure compartmentalizing compilation (SCC), that formally characterizes the guarantees provided by compartmentalizing compilation and clarifies its attacker model. We reconstruct our property by starting from the well-established notion of fully abstract compilation, then identifying and lifting three important limitations that make standard full abstraction unsuitable for compartmentalization. The connection to full abstraction allows us to prove SCC by adapting established proof techniques; we illustrate this with a compiler from a simple unsafe imperative language with procedures to a compartmentalized abstract machine.
△ Less
Submitted 15 April, 2017; v1 submitted 14 February, 2016;
originally announced February 2016.
-
Towards a Fully Abstract Compiler Using Micro-Policies: Secure Compilation for Mutually Distrustful Components
Authors:
Yannis Juglaret,
Catalin Hritcu,
Arthur Azevedo de Amorim,
Benjamin C. Pierce,
Antal Spector-Zabusky,
Andrew Tolmach
Abstract:
Secure compilation prevents all low-level attacks on compiled code and allows for sound reasoning about security in the source language. In this work we propose a new attacker model for secure compilation that extends the well-known notion of full abstraction to ensure protection for mutually distrustful components. We devise a compiler chain (compiler, linker, and loader) and a novel security mon…
▽ More
Secure compilation prevents all low-level attacks on compiled code and allows for sound reasoning about security in the source language. In this work we propose a new attacker model for secure compilation that extends the well-known notion of full abstraction to ensure protection for mutually distrustful components. We devise a compiler chain (compiler, linker, and loader) and a novel security monitor that together defend against this strong attacker model. The monitor is implemented using a recently proposed, generic tag-based protection framework called micro-policies, which comes with hardware support for efficient caching and with a formal verification methodology. Our monitor protects the abstractions of a simple object-oriented language---class isolation, the method call discipline, and type safety---against arbitrary low-level attackers.
△ Less
Submitted 2 October, 2015;
originally announced October 2015.
-
A Verified Information-Flow Architecture
Authors:
Arthur Azevedo de Amorim,
Nathan Collins,
André DeHon,
Delphine Demange,
Catalin Hritcu,
David Pichardie,
Benjamin C. Pierce,
Randy Pollack,
Andrew Tolmach
Abstract:
SAFE is a clean-slate design for a highly secure computer system, with pervasive mechanisms for tracking and limiting information flows. At the lowest level, the SAFE hardware supports fine-grained programmable tags, with efficient and flexible propagation and combination of tags as instructions are executed. The operating system virtualizes these generic facilities to present an information-flow…
▽ More
SAFE is a clean-slate design for a highly secure computer system, with pervasive mechanisms for tracking and limiting information flows. At the lowest level, the SAFE hardware supports fine-grained programmable tags, with efficient and flexible propagation and combination of tags as instructions are executed. The operating system virtualizes these generic facilities to present an information-flow abstract machine that allows user programs to label sensitive data with rich confidentiality policies. We present a formal, machine-checked model of the key hardware and software mechanisms used to dynamically control information flow in SAFE and an end-to-end proof of noninterference for this model.
We use a refinement proof methodology to propagate the noninterference property of the abstract machine down to the concrete machine level. We use an intermediate layer in the refinement chain that factors out the details of the information-flow control policy and devise a code generator for compiling such information-flow policies into low-level monitor code. Finally, we verify the correctness of this generator using a dedicated Hoare logic that abstracts from low-level machine instructions into a reusable set of verified structured code generators.
△ Less
Submitted 6 March, 2016; v1 submitted 22 September, 2015;
originally announced September 2015.
-
Really Natural Linear Indexed Type Checking
Authors:
Arthur Azevedo de Amorim,
Emilio Jesús Gallego Arias,
Marco Gaboardi,
Justin Hsu
Abstract:
Recent works have shown the power of linear indexed type systems for enforcing complex program properties. These systems combine linear types with a language of type-level indices, allowing more fine-grained analyses. Such systems have been fruitfully applied in diverse domains, including implicit complexity and differential privacy. A natural way to enhance the expressiveness of this approach is…
▽ More
Recent works have shown the power of linear indexed type systems for enforcing complex program properties. These systems combine linear types with a language of type-level indices, allowing more fine-grained analyses. Such systems have been fruitfully applied in diverse domains, including implicit complexity and differential privacy. A natural way to enhance the expressiveness of this approach is by allowing the indices to depend on runtime information, in the spirit of dependent types. This approach is used in DFuzz, a language for differential privacy. The DFuzz type system relies on an index language supporting real and natural number arithmetic over constants and variables. Moreover, DFuzz uses a subty** mechanism to make types more flexible. By themselves, linearity, dependency, and subty** each require delicate handling when performing type checking or type inference; their combination increases this challenge substantially, as the features can interact in non-trivial ways. In this paper, we study the type-checking problem for DFuzz. We show how we can reduce type checking for (a simple extension of) DFuzz to constraint solving over a first-order theory of naturals and real numbers which, although undecidable, can often be handled in practice by standard numeric solvers.
△ Less
Submitted 16 March, 2015;
originally announced March 2015.
-
Testing Noninterference, Quickly
Authors:
Catalin Hritcu,
Leonidas Lampropoulos,
Antal Spector-Zabusky,
Arthur Azevedo de Amorim,
Maxime Dénès,
John Hughes,
Benjamin C. Pierce,
Dimitrios Vytiniotis
Abstract:
Information-flow control mechanisms are difficult both to design and to prove correct. To reduce the time wasted on doomed proof attempts due to broken definitions, we advocate modern random testing techniques for finding counterexamples during the design process. We show how to use QuickCheck, a property-based random-testing tool, to guide the design of increasingly complex information-flow abstr…
▽ More
Information-flow control mechanisms are difficult both to design and to prove correct. To reduce the time wasted on doomed proof attempts due to broken definitions, we advocate modern random testing techniques for finding counterexamples during the design process. We show how to use QuickCheck, a property-based random-testing tool, to guide the design of increasingly complex information-flow abstract machines, leading up to a sophisticated register machine with a novel and highly permissive flow-sensitive dynamic enforcement mechanism that is sound in the presence of first-class public labels. We find that both sophisticated strategies for generating well-distributed random programs and readily falsifiable formulations of noninterference properties are critically important for efficient testing. We propose several approaches and evaluate their effectiveness on a collection of injected bugs of varying subtlety. We also present an effective technique for shrinking large counterexamples to minimal, easily comprehensible ones. Taken together, our best methods enable us to quickly and automatically generate simple counterexamples for more than 45 bugs. Moreover, we show how testing guides the discovery of the sophisticated invariants needed for the noninterference proof of our most complex machine.
△ Less
Submitted 25 July, 2015; v1 submitted 1 September, 2014;
originally announced September 2014.
-
HEP@Home - A distributed computing system based on BOINC
Authors:
Antonio Amorim,
Jaime Villate,
Pedro Andrade
Abstract:
Project SETI@HOME has proven to be one of the biggest successes of distributed computing during the last years. With a quite simple approach SETI manages to process large volumes of data using a vast amount of distributed computer power.
To extend the generic usage of this kind of distributed computing tools, BOINC is being developed. In this paper we propose HEP@HOME, a BOINC version tailored…
▽ More
Project SETI@HOME has proven to be one of the biggest successes of distributed computing during the last years. With a quite simple approach SETI manages to process large volumes of data using a vast amount of distributed computer power.
To extend the generic usage of this kind of distributed computing tools, BOINC is being developed. In this paper we propose HEP@HOME, a BOINC version tailored to the specific requirements of the High Energy Physics (HEP) community.
The HEP@HOME will be able to process large amounts of data using virtually unlimited computing power, as BOINC does, and it should be able to work according to HEP specifications. In HEP the amounts of data to be analyzed or reconstructed are of central importance. Therefore, one of the design principles of this tool is to avoid data transfer. This will allow scientists to run their analysis applications and taking advantage of a large number of CPUs. This tool also satisfies other important requirements in HEP, namely, security, fault-tolerance and monitoring.
△ Less
Submitted 7 October, 2004;
originally announced October 2004.
-
Grid-Brick Event Processing Framework in GEPS
Authors:
Antonio Amorim,
Luis Pedro,
Han Fei,
Nuno Almeida,
Paulo Trezentos,
Jaime E. Villate
Abstract:
Experiments like ATLAS at LHC involve a scale of computing and data management that greatly exceeds the capability of existing systems, making it necessary to resort to Grid-based Parallel Event Processing Systems (GEPS). Traditional Grid systems concentrate the data in central data servers which have to be accessed by many nodes each time an analysis or processing job starts. These systems requ…
▽ More
Experiments like ATLAS at LHC involve a scale of computing and data management that greatly exceeds the capability of existing systems, making it necessary to resort to Grid-based Parallel Event Processing Systems (GEPS). Traditional Grid systems concentrate the data in central data servers which have to be accessed by many nodes each time an analysis or processing job starts. These systems require very powerful central data servers and make little use of the distributed disk space that is available in commodity computers. The Grid-Brick system, which is described in this paper, follows a different approach. The data storage is split among all grid nodes having each one a piece of the whole information. Users submit queries and the system will distribute the tasks through all the nodes and retrieve the result, merging them together in the Job Submit Server. The main advantage of using this system is the huge scalability it provides, while its biggest disadvantage appears in the case of failure of one of the nodes. A workaround for this problem involves data replication or backup.
△ Less
Submitted 14 June, 2003;
originally announced June 2003.
-
An on-line Integrated Bookkee**: electronic run log book and Meta-Data Repository for ATLAS
Authors:
M. Barczyc,
D. Burckhart-Chromek,
M. Caprini,
J. Da Silva Conceicao,
M. Dobson,
J. Flammer,
R. Jones,
A. Kazarov,
S. Kolos,
D. Liko,
L. Mapelli,
I. Soloviev,
R. Hart NIKHEF,
A. Amorim,
D. Klose,
J. Lima,
L. Lucio,
L. Pedro,
H. Wolters,
E. Badescu NIPNE,
I. Alexandrov,
V. Kotov,
M. Mineev JINR,
Yu. Ryabov PNPI
Abstract:
In the context of the ATLAS experiment there is growing evidence of the importance of different kinds of Meta-data including all the important details of the detector and data acquisition that are vital for the analysis of the acquired data. The Online BookKeeper (OBK) is a component of ATLAS online software that stores all information collected while running the experiment, including the Meta-d…
▽ More
In the context of the ATLAS experiment there is growing evidence of the importance of different kinds of Meta-data including all the important details of the detector and data acquisition that are vital for the analysis of the acquired data. The Online BookKeeper (OBK) is a component of ATLAS online software that stores all information collected while running the experiment, including the Meta-data associated with the event acquisition, triggering and storage. The facilities for acquisition of control data within the on-line software framework, together with a full functional Web interface, make the OBK a powerful tool containing all information needed for event analysis, including an electronic log book.
In this paper we explain how OBK plays a role as one of the main collectors and managers of Meta-data produced on-line, and we'll also focus on the Web facilities already available. The usage of the web interface as an electronic run logbook is also explained, together with the future extensions.
We describe the technology used in OBK development and how we arrived at the present level explaining the previous experience with various DBMS technologies. The extensive performance evaluations that have been performed and the usage in the production environment of the ATLAS test beams are also analysed.
△ Less
Submitted 13 June, 2003;
originally announced June 2003.
-
Experience with the Open Source based implementation for ATLAS Conditions Data Management System
Authors:
A. Amorim,
J. Lima,
C. Oliveira,
L. Pedro,
N. Barros
Abstract:
Conditions Data in high energy physics experiments is frequently seen as every data needed for reconstruction besides the event data itself. This includes all sorts of slowly evolving data like detector alignment, calibration and robustness, and data from detector control system. Also, every Conditions Data Object is associated with a time interval of validity and a version. Besides that, quite…
▽ More
Conditions Data in high energy physics experiments is frequently seen as every data needed for reconstruction besides the event data itself. This includes all sorts of slowly evolving data like detector alignment, calibration and robustness, and data from detector control system. Also, every Conditions Data Object is associated with a time interval of validity and a version. Besides that, quite often is useful to tag collections of Conditions Data Objects altogether. These issues have already been investigated and a data model has been proposed and used for different implementations based in commercial DBMSs, both at CERN and for the BaBar experiment. The special case of the ATLAS complex trigger that requires online access to calibration and alignment data poses new challenges that have to be met using a flexible and customizable solution more in the line of Open Source components. Motivated by the ATLAS challenges we have developed an alternative implementation, based in an Open Source RDBMS. Several issues were investigated land will be described in this paper:
-The best way to map the conditions data model into the relational database concept considering what are foreseen as the most frequent queries.
-The clustering model best suited to address the scalability problem. -Extensive tests were performed and will be described.
The very promising results from these tests are attracting the attention from the HEP community and driving further developments.
△ Less
Submitted 13 June, 2003; v1 submitted 30 May, 2003;
originally announced June 2003.