-
Report on the "The Future of the Shell" Panel at HotOS 2021
Authors:
Michael Greenberg,
Konstantinos Kallas,
Nikos Vasilakis,
Stephen Kell
Abstract:
This document summarizes the challenges and possible research directions around the shell and its ecosystem, collected during and after the HotOS21 Panel on the future of the shell. The goal is to create a snapshot of what a number of researchers from various disciplines -- connected to the shell to varying degrees -- think about its future. We hope that this document will serve as a reference for future research on the shell and its ecosystem.
Submitted 22 September, 2021;
originally announced September 2021.
-
Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat
Authors:
Jiasi Shen,
Martin Rinard,
Nikos Vasilakis
Abstract:
We present KumQuat, a system for automatically generating data parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that promotes efficient inference of correct combiners.
We evaluate KumQuat on 70 benchmark scripts that together have a total of 427 stages. KumQuat synthesizes a correct combiner for 113 of the 121 unique commands that appear in these benchmark scripts. The synthesis times vary between 39 seconds and 331 seconds with a median of 60 seconds. We present experimental results that show that these combiners enable the effective parallelization of our benchmark scripts.
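The split, execute, and combine structure described in this abstract can be illustrated with a minimal Python sketch. It is not KumQuat's implementation: the combiners are written by hand rather than synthesized, and the "commands" are hypothetical pure-Python stand-ins (`count_lines` for something like `wc -l`, `grep_error` for something like `grep error`). All function names here are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_stream(lines, n):
    """Split a list of lines into at most n contiguous chunks, keeping order."""
    size = max(1, -(-len(lines) // n))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_parallel(command, combine, lines, n=4):
    """Run `command` on each chunk in parallel, then merge the partial outputs."""
    chunks = split_stream(lines, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in chunk order, which order-sensitive
        # combiners such as concatenation rely on.
        partials = list(pool.map(command, chunks))
    return combine(partials)


# Hypothetical stand-ins for shell commands (assumptions, not real tools).
def count_lines(chunk):
    return len(chunk)


def grep_error(chunk):
    return [line for line in chunk if "error" in line]


# Hand-written combiners; KumQuat would synthesize these automatically.
def combine_sum(partials):
    return sum(partials)


def combine_concat(partials):
    return [line for part in partials for line in part]


if __name__ == "__main__":
    data = ["ok 1", "error 2", "ok 3", "error 4", "error 5"]
    print(run_parallel(count_lines, combine_sum, data, n=2))
    print(run_parallel(grep_error, combine_concat, data, n=2))
```

The point of the sketch is that different commands need different combiners: a counting command merges by addition, while a filtering command merges by order-preserving concatenation.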
Submitted 22 August, 2021; v1 submitted 30 December, 2020;
originally announced December 2020.
-
An Order-Aware Dataflow Model for Parallel Unix Pipelines
Authors:
Shivam Handa,
Konstantinos Kallas,
Nikos Vasilakis,
Martin Rinard
Abstract:
We present a dataflow model for modelling parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in the resulting parallelization. We use this model to capture the semantics of transformations that exploit data parallelism available in Unix shell computations and prove their correctness. We additionally formalize the translations from the Unix shell to the dataflow model and from the dataflow model back to a parallel shell script. We implement our model and transformations as the compiler and optimization passes of a system parallelizing shell pipelines, and use it to evaluate the speedup achieved on 47 pipelines.
Submitted 5 July, 2021; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Mir: Automated Quantifiable Privilege Reduction Against Dynamic Library Compromise in JavaScript
Authors:
Nikos Vasilakis,
Cristian-Alexandru Staicu,
Grigoris Ntousakis,
Konstantinos Kallas,
Ben Karel,
André DeHon,
Michael Pradel
Abstract:
Third-party libraries ease the development of large-scale software systems. However, they often execute with significantly more privilege than needed to complete their task. This additional privilege is often exploited at runtime via dynamic compromise, even when these libraries are not actively malicious. Mir addresses this problem by introducing a fine-grained read-write-execute (RWX) permission model at the boundaries of libraries. Every field of an imported library is governed by a set of permissions, which developers can express when importing libraries. To enforce these permissions during program execution, Mir transforms libraries and their context to add runtime checks. As permissions can overwhelm developers, Mir's permission inference generates default permissions by analyzing how libraries are used by their consumers. Applied to 50 popular libraries, Mir's prototype for JavaScript demonstrates that the RWX permission model combines simplicity with power: it is simple enough to automatically infer 99.33% of required permissions, it is expressive enough to defend against 16 real threats, it is efficient enough to be usable in practice (1.93% overhead), and it enables a novel quantification of privilege reduction.
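To make the idea of per-field read-write-execute permissions at a library boundary concrete, here is a small sketch in Python. Mir itself targets JavaScript and works by transforming libraries; this sketch instead wraps an object in a proxy, and every name in it (`Guarded`, `AccessDenied`, the permission strings) is an assumption for illustration, not Mir's API.

```python
from types import SimpleNamespace


class AccessDenied(Exception):
    """Raised when a field is used without the required permission."""


class Guarded:
    """Proxy that checks 'r', 'w', 'x' permissions per field of `target`."""

    def __init__(self, target, perms):
        # Bypass our own __setattr__ so the proxy can store its state.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_perms", perms)

    def __getattr__(self, name):
        # Fields with no entry get no permissions (deny by default).
        perms = self._perms.get(name, "")
        if "r" not in perms:
            raise AccessDenied(f"read of {name!r} not permitted")
        value = getattr(self._target, name)
        if not callable(value):
            return value

        def checked_call(*args, **kwargs):
            # Calling a field additionally requires the execute permission.
            if "x" not in perms:
                raise AccessDenied(f"call of {name!r} not permitted")
            return value(*args, **kwargs)

        return checked_call

    def __setattr__(self, name, value):
        if "w" not in self._perms.get(name, ""):
            raise AccessDenied(f"write of {name!r} not permitted")
        setattr(self._target, name, value)


if __name__ == "__main__":
    # Hypothetical "library" with one function and one data field.
    lib = SimpleNamespace(parse=lambda s: s.split(","), version="1.0")
    guarded = Guarded(lib, {"parse": "rx", "version": "r"})
    print(guarded.parse("a,b"))
    print(guarded.version)
    try:
        guarded.version = "2.0"
    except AccessDenied as err:
        print("denied:", err)
```

In this sketch the permission table is written by hand; the abstract describes inferring such defaults automatically from how consumers use the library.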
Submitted 1 January, 2021; v1 submitted 31 October, 2020;
originally announced November 2020.
-
PaSh: Light-touch Data-Parallel Shell Processing
Authors:
Nikos Vasilakis,
Konstantinos Kallas,
Konstantinos Mamouras,
Achilleas Benetopoulos,
Lazar Cvetković
Abstract:
This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script -- one that adds POSIX constructs to explicitly guide parallelism coupled with PaSh-provided Unix-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU commands -- two large and commonly used groups -- guides the annotation language and optimized aggregator library that PaSh uses. Finally, PaSh's extensive evaluation over 44 unmodified Unix scripts shows significant speedups (0.89--61.1×, avg: 6.7×) stemming from the combination of its program transformations and runtime primitives.
Submitted 3 April, 2021; v1 submitted 18 July, 2020;
originally announced July 2020.