Search | arXiv e-print repository

Reusable Formal Verification of DAG-based Consensus Protocols

Authors: Nathalie Bertrand, Pranav Ghorpade, Sasha Rubin, Bernhard Scholz, Pavle Subotic

Abstract: DAG-based consensus protocols are being adoption by blockchain companies to decrease energy footprints and improve security. A DAG-based consensus protocol collaboratively constructs a partial order of blocks of transactions and produces linearly ordered blocks. The ubiquity and strategic importance of blockchains call for formal proof of the correctness of key components, namely, consensus protoc… ▽ More DAG-based consensus protocols are being adoption by blockchain companies to decrease energy footprints and improve security. A DAG-based consensus protocol collaboratively constructs a partial order of blocks of transactions and produces linearly ordered blocks. The ubiquity and strategic importance of blockchains call for formal proof of the correctness of key components, namely, consensus protocols. This paper presents a safety-proven formal specification of two DAG-based protocols. Our specification highlights several dissemination, DAG construction, and ordering variations that can be combined to express the two protocols. The formalization requires a refinement approach for modeling the consensus. In an abstract model, we first show the safety of DAG-based consensus on leader blocks and then further refine the specification to encompass all blocks for all processes. The TLA+ specification for a given protocol consists of 492-732 lines, and the proof system TLAPS verifies 2025-2294 obligations in 6-8 minutes. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2205.09325 [pdf, other]

Cloudprofiler: TSC-based inter-node profiling and high-throughput data ingestion for cloud streaming workloads

Authors: Shinhyung Yang, Jiun Jeong, Bernhard Scholz, Bernd Burgstaller

Abstract: To conduct real-time analytics computations, big data stream processing engines are required to process unbounded data streams at millions of events per second. However, current streaming engines exhibit low throughput and high tuple processing latency. Performance engineering is complicated by the fact that streaming engines constitute complex distributed systems consisting of multiple nodes in t… ▽ More To conduct real-time analytics computations, big data stream processing engines are required to process unbounded data streams at millions of events per second. However, current streaming engines exhibit low throughput and high tuple processing latency. Performance engineering is complicated by the fact that streaming engines constitute complex distributed systems consisting of multiple nodes in the cloud. A profiling technique is required that is capable of measuring time durations at high accuracy across nodes. Standard clock synchronization techniques such as the network time protocol (NTP) are limited to millisecond accuracy, and hence cannot be used. We propose a profiling technique that relates the time-stamp counters (TSCs) of nodes to measure the duration of events in a streaming framework. The precision of the TSC relation determines the accuracy of the measured duration. The TSC relation is conducted in quiescent periods of the network to achieve accuracy in the tens of microseconds. We propose a throughput-controlled data generator to reliably determine the sustainable throughput of a streaming engine. To facilitate high-throughput data ingestion, we propose a concurrent object factory that moves the deserialization overhead of incoming data tuples off the critical path of the streaming framework. The evaluation of the proposed techniques within the Apache Storm streaming framework on the Google Compute Engine public cloud shows that data ingestion increases from $700$ $\text{k}$ to $4.68$ $\text{M}$ tuples per second, and that time durations can be profiled at a measurement accuracy of $92$ $μ\text{s}$, which is three orders of magnitude higher than the accuracy of NTP, and one order of magnitude higher than prior work. △ Less

Submitted 10 August, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.07421 [pdf, other]

Julia Cloud Matrix Machine: Dynamic Matrix Language Acceleration on Multicore Clusters in the Cloud

Authors: Jay Hwan Lee, Yeonsoo Kim, Younghyun Ryu, Wasuwee Sodsong, Hyunjun Jeon, **sik Park, Bernd Burgstaller, Bernhard Scholz

Abstract: In emerging scientific computing environments, matrix computations of increasing size and complexity are increasingly becoming prevalent. However, contemporary matrix language implementations are insufficient in their support for efficient utilization of cloud computing resources, particularly on the user side. We thus developed an extension of the Julia high-performance computation language such… ▽ More In emerging scientific computing environments, matrix computations of increasing size and complexity are increasingly becoming prevalent. However, contemporary matrix language implementations are insufficient in their support for efficient utilization of cloud computing resources, particularly on the user side. We thus developed an extension of the Julia high-performance computation language such that matrix computations are automatically parallelized in the cloud, where users are separated from directly interacting with complex explicitly-parallel computations. We implement lazy evaluation semantics combined with directed graphs to optimize matrix operations on the fly while dynamic simulation finds the optimal tile size and schedule for a given cluster of cloud nodes. A time model prediction of the cluster's performance capacity is constructed to enable simulations. Automatic configuration of communication and worker processes on the cloud networks allow for the framework to automatically scale up for clusters of heterogeneous nodes. Our framework's experimental evaluation comprises eleven benchmarks on an fourteen node (564 CPUs) cluster in the AWS public cloud, revealing speedups of up to a factor of 5.1, with an average 74.39% of the upper bound for speedups. △ Less

Submitted 8 December, 2023; v1 submitted 15 May, 2022; originally announced May 2022.

arXiv:2108.11212 [pdf, ps, other]

The Choice Construct in the Souffle Language

Authors: Xiaowen Hu, Joshua Karp, David Zhao, Abdul Zreika, Xi Wu, Bernhard Scholz

Abstract: Datalog has become a popular implementation language for solving large-scale, real-world problems, including bug finders, network analysis tools, and disassemblers. These applications express complex behaviour with hundreds of relations and rules that often require a non-deterministic choice for tuples in relations to express worklist algorithms. This work is an experience report that describes th… ▽ More Datalog has become a popular implementation language for solving large-scale, real-world problems, including bug finders, network analysis tools, and disassemblers. These applications express complex behaviour with hundreds of relations and rules that often require a non-deterministic choice for tuples in relations to express worklist algorithms. This work is an experience report that describes the implementation of a choice construct in the Datalog engine Souffle. With the choice construct, we can express worklist algorithms such as spanning trees in a few lines of code. We highlight the differences between rule-based choice as described in prior work, and relation-based choice introduced by this work. We show that a choice construct enables certain worklist algorithms to be computed up to 10kx faster than having no choice construct. △ Less

Submitted 25 August, 2021; originally announced August 2021.

Comments: This is an extended technical report of work that will be published at APLAS'21

arXiv:1912.12747 [pdf, ps, other]

Worst-Case Optimal Radix Triejoin

Authors: Alan Fekete, Brody Franks, Herbert Jordan, Bernhard Scholz

Abstract: Relatively recently, the field of join processing has been swayed by the discovery of a new class of multi-way join algorithms. The new algorithms join multiple relations simultaneously rather than perform a series of pairwise joins. The new join algorithms satisfy stronger worst-case runtime complexity guarantees than any of the existing approaches based on pairwise joins -- they are worst-case o… ▽ More Relatively recently, the field of join processing has been swayed by the discovery of a new class of multi-way join algorithms. The new algorithms join multiple relations simultaneously rather than perform a series of pairwise joins. The new join algorithms satisfy stronger worst-case runtime complexity guarantees than any of the existing approaches based on pairwise joins -- they are worst-case optimal in data complexity. These research efforts have resulted in a flurry of papers documenting theoretical and some practical contributions. However, there is still the quest of making the new worst-case optimal join algorithms truly practical in terms of (1) ease of implementation and (2) secondary index efficiency in terms of number of indexes created to answer a query. In this paper, we present a simple worst-case optimal multi-way join algorithm called the radix triejoin. Radix triejoin uses a binary encoding for reducing the domain of a database. Our main technical contribution is that domain reduction allows a bit-interleaving of attribute values that gives rise to a query-independent relation representation, permitting the computation of multiple queries over the same relations worst-case optimally without having to construct additional secondary indexes. We also generalise the core algorithm to conjunctive queries with inequality constraints and provide a new proof technique for the worst-case optimal join result. △ Less

Submitted 29 December, 2019; originally announced December 2019.

Comments: 12 pages, 4 figures

arXiv:1910.11143 [pdf, other]

The Economics of Smart Contracts

Authors: Kirk Baird, Seongho Jeong, Yeonsoo Kim, Bernd Burgstaller, Bernhard Scholz

Abstract: Ethereum is a distributed blockchain that can execute smart contracts, which inter-communicate and perform transactions automatically. The execution of smart contracts is paid in the form of gas, which is a monetary unit used in the Ethereum blockchain. The Ethereum Virtual Machine (EVM) provides the metering capability for smart contract execution. Instruction costs vary depending on the instruct… ▽ More Ethereum is a distributed blockchain that can execute smart contracts, which inter-communicate and perform transactions automatically. The execution of smart contracts is paid in the form of gas, which is a monetary unit used in the Ethereum blockchain. The Ethereum Virtual Machine (EVM) provides the metering capability for smart contract execution. Instruction costs vary depending on the instruction type and the approximate computational resources required to execute the instruction on the network. The cost of gas is adjusted using transaction fees to ensure adequate payment of the network. In this work, we highlight the "real" economics of smart contracts. We show that the actual costs of executing smart contracts are disproportionate to the computational costs and that this gap is continuously widening. We show that the gas cost-model of the underlying EVM instruction-set is wrongly modeled. Specifically, the computational cost for the SLOAD instruction increases with the length of the blockchain. Our proposed performance model estimates gas usage and execution time of a smart contract at a given block-height. The new gas-cost model incorporates the block-height to eliminate irregularities in the Ethereum gas calculations. Our findings are based on extensive experiments over the entire history of the EVM blockchain. △ Less

Submitted 22 October, 2019; originally announced October 2019.

ACM Class: D.2.8; D.2.11

arXiv:1907.05045 [pdf, ps, other]

Provenance for Large-scale Datalog

Authors: David Zhao, Pavle Subotic, Bernhard Scholz

Abstract: Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular, static program analysis and network analysis. The logic specifications which model analysis problems, process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug.… ▽ More Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular, static program analysis and network analysis. The logic specifications which model analysis problems, process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug. While the database community has proposed several data-provenance techniques that address the Declarative Debugging Challenge for Databases, in the cases of analysis problems, these state-of-the-art techniques do not scale. In this paper, we introduce a novel bottom-up Datalog evaluation strategy for debugging: our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and a new fixed-point semantics for semi-naive evaluation. A debugging query mechanism allows arbitrary provenance queries, constructing partial proof trees of tuples with minimal height. We integrate our technique into Souffle, a Datalog engine that synthesizes C++ code, and achieve high performance by using specialized parallel data structures. Experiments are conducted with DOOP/DaCapo, producing proof annotations for tens of millions of output tuples. We show that our method has a runtime overhead of 1.27x on average while being more flexible than existing state-of-the-art techniques. △ Less

Submitted 11 July, 2019; originally announced July 2019.

Comments: 28 pages, 18 figures, technical report

arXiv:1809.03981 [pdf, other]

Vandal: A Scalable Security Analysis Framework for Smart Contracts

Authors: Lexi Brent, Anton Jurisevic, Michael Kong, Eric Liu, Francois Gauthier, Vincent Gramoli, Ralph Holz, Bernhard Scholz

Abstract: The rise of modern blockchains has facilitated the emergence of smart contracts: autonomous programs that live and run on the blockchain. Smart contracts have seen a rapid climb to prominence, with applications predicted in law, business, commerce, and governance. Smart contracts are commonly written in a high-level language such as Ethereum's Solidity, and translated to compact low-level byteco… ▽ More The rise of modern blockchains has facilitated the emergence of smart contracts: autonomous programs that live and run on the blockchain. Smart contracts have seen a rapid climb to prominence, with applications predicted in law, business, commerce, and governance. Smart contracts are commonly written in a high-level language such as Ethereum's Solidity, and translated to compact low-level bytecode for deployment on the blockchain. Once deployed, the bytecode is autonomously executed, usually by a %Turing-complete virtual machine. As with all programs, smart contracts can be highly vulnerable to malicious attacks due to deficient programming methodologies, languages, and toolchains, including buggy compilers. At the same time, smart contracts are also high-value targets, often commanding large amounts of cryptocurrency. Hence, developers and auditors need security frameworks capable of analysing low-level bytecode to detect potential security vulnerabilities. In this paper, we present Vandal: a security analysis framework for Ethereum smart contracts. Vandal consists of an analysis pipeline that converts low-level Ethereum Virtual Machine (EVM) bytecode to semantic logic relations. Users of the framework can express security analyses in a declarative fashion: a security analysis is expressed in a logic specification written in the \souffle language. We conduct a large-scale empirical study for a set of common smart contract security vulnerabilities, and show the effectiveness and efficiency of Vandal. Vandal is both fast and robust, successfully analysing over 95\% of all 141k unique contracts with an average runtime of 4.15 seconds; outperforming the current state of the art tools---Oyente, EthIR, Mythril, and Rattle---under equivalent conditions. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Comments: 28 pages, 11 figures

arXiv:1711.01024 [pdf, other]

SPARK: Static Program Analysis Reasoning and Retrieving Knowledge

Authors: Wasuwee Sodsong, Bernhard Scholz, Sanjay Chawla

Abstract: Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is either secure or insecure based on symbolic rules that we… ▽ More Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is either secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The machine pipeline is two-staged consisting of a Recurrent Neural Networks (RNN) and an Extractor that converts an RNN to symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data. In this work, we discuss the limitations of existing techniques and possible improvements in the future. The results show that with sufficient training data and a fair distribution of program paths it is feasible to deducing symbolic security rules for the OpenJDK library with millions lines of code. △ Less

Submitted 3 November, 2017; originally announced November 2017.

arXiv:1709.03685 [pdf, other]

Optimal On The Fly Index Selection in Polynomial Time

Authors: Herbert Jordan, Bernhard Scholz, Pavle Subotić

Abstract: The index selection problem (ISP) is an important problem for accelerating the execution of relational queries, and it has received a lot of attention as a combinatorial knapsack problem in the past. Various solutions to this very hard problem have been provided. In contrast to existing literature, we change the underlying assumptions of the problem definition: we adapt the problem for systems tha… ▽ More The index selection problem (ISP) is an important problem for accelerating the execution of relational queries, and it has received a lot of attention as a combinatorial knapsack problem in the past. Various solutions to this very hard problem have been provided. In contrast to existing literature, we change the underlying assumptions of the problem definition: we adapt the problem for systems that store relations in memory, and use complex specification languages, e.g., Datalog. In our framework, we decompose complex queries into primitive searches that select tuples in a relation for which an equality predicate holds. A primitive search can be accelerated by an index exhibiting a worst-case run-time complexity of log-linear time in the size of the output result of the primitive search. However, the overheads associated with maintaining indexes are very costly in terms of memory and computing time. In this work, we present an optimal polynomial-time algorithm that finds the minimal set of indexes of a relation for a given set of primitive searches. An index may cover more than one primitive search due to the algebraic properties of the search predicate, which is a conjunction of equalities over the attributes of a relation. The index search space exhibits a exponential complexity in the number of attributes in a relation, and, hence brute-force algorithms searching for solutions in the index domain are infeasible. As a scaffolding for designing a polynomial-time algorithm, we build a partial order on search operations and use a constructive version of Dilworth's theorem. We show a strong relationship between chains of primitive searches (forming a partial order) and indexes. We demonstrate the effectiveness and efficiency of our algorithm for an in-memory Datalog compiler that is able to process relations with billions of entries in memory. △ Less

Submitted 12 September, 2017; originally announced September 2017.

arXiv:1608.05317 [pdf, ps, other]

doi 10.1007/s00023-018-0670-x

Rényi divergences as weighted non-commutative vector valued $L_p$-spaces

Authors: Mario Berta, Volkher B. Scholz, Marco Tomamichel

Abstract: We show that Araki and Masuda's weighted non-commutative vector valued $L_p$-spaces [Araki \& Masuda, Publ. Res. Inst. Math. Sci., 18:339 (1982)] correspond to an algebraic generalization of the sandwiched Rényi divergences with parameter $α= \frac{p}{2}$. Using complex interpolation theory, we prove various fundamental properties of these divergences in the setup of von Neumann algebras, includin… ▽ More We show that Araki and Masuda's weighted non-commutative vector valued $L_p$-spaces [Araki \& Masuda, Publ. Res. Inst. Math. Sci., 18:339 (1982)] correspond to an algebraic generalization of the sandwiched Rényi divergences with parameter $α= \frac{p}{2}$. Using complex interpolation theory, we prove various fundamental properties of these divergences in the setup of von Neumann algebras, including a data-processing inequality and monotonicity in $α$. We thereby also give new proofs for the corresponding finite-dimensional properties. We discuss the limiting cases $α\to \{\frac{1}{2},1,\infty\}$ leading to minus the logarithm of Uhlmann's fidelity, Umegaki's relative entropy, and the max-relative entropy, respectively. As a contribution that might be of independent interest, we derive a Riesz-Thorin theorem for Araki-Masuda $L_p$-spaces and an Araki-Lieb-Thirring inequality for states on von Neumann algebras. △ Less

Submitted 22 July, 2018; v1 submitted 18 August, 2016; originally announced August 2016.

Comments: v2: 20 pages, published version

Journal ref: Annales Henri Poincaré 19 (6), 1843-1867 (2018)

arXiv:1603.05970 [pdf, other]

doi 10.1103/PhysRevA.95.062343

Coherent-state constellations and polar codes for thermal Gaussian channels

Authors: Felipe Lacerda, Joseph M. Renes, Volkher B. Scholz

Abstract: Optical communication channels are ultimately quantum-mechanical in nature, and we must therefore look beyond classical information theory to determine their communication capacity as well as to find efficient encoding and decoding schemes of the highest rates. Thermal channels, which arise from linear coupling of the field to a thermal environment, are of particular practical relevance; their cla… ▽ More Optical communication channels are ultimately quantum-mechanical in nature, and we must therefore look beyond classical information theory to determine their communication capacity as well as to find efficient encoding and decoding schemes of the highest rates. Thermal channels, which arise from linear coupling of the field to a thermal environment, are of particular practical relevance; their classical capacity has been recently established, but their quantum capacity remains unknown. While the capacity sets the ultimate limit on reliable communication rates, it does not promise that such rates are achievable by practical means. Here we construct efficiently encodable codes for thermal channels which achieve the classical capacity and the so-called Gaussian coherent information for transmission of classical and quantum information, respectively. Our codes are based on combining polar codes with a discretization of the channel input into a finite "constellation" of coherent states. Encoding of classical information can be done using linear optics. △ Less

Submitted 4 July, 2017; v1 submitted 18 March, 2016; originally announced March 2016.

Comments: v3: published version; v2: vastly improved presentation. Figures! v1: 10 pages

Journal ref: Phys. Rev. A 95, 062343 (2017)

arXiv:1510.06743 [pdf, ps, other]

doi 10.4230/LIPIcs.TQC.2016.2

Quantum-proof multi-source randomness extractors in the Markov model

Authors: Rotem Arnon-Friedman, Christopher Portmann, Volkher B. Scholz

Abstract: Randomness extractors, widely used in classical and quantum cryptography and other fields of computer science, e.g., derandomization, are functions which generate almost uniform randomness from weak sources of randomness. In the quantum setting one must take into account the quantum side information held by an adversary which might be used to break the security of the extractor. In the case of see… ▽ More Randomness extractors, widely used in classical and quantum cryptography and other fields of computer science, e.g., derandomization, are functions which generate almost uniform randomness from weak sources of randomness. In the quantum setting one must take into account the quantum side information held by an adversary which might be used to break the security of the extractor. In the case of seeded extractors the presence of quantum side information has been extensively studied. For multi-source extractors one can easily see that high conditional min-entropy is not sufficient to guarantee security against arbitrary side information, even in the classical case. Hence, the interesting question is under which models of (both quantum and classical) side information multi-source extractors remain secure. In this work we suggest a natural model of side information, which we call the Markov model, and prove that any multi-source extractor remains secure in the presence of quantum side information of this type (albeit with weaker parameters). This improves on previous results in which more restricted models were considered and the security of only some types of extractors was shown. △ Less

Submitted 9 September, 2016; v1 submitted 22 October, 2015; originally announced October 2015.

Comments: 21 pages + appendix. v2: Improved comparison to previous work. Comments are welcome!

Journal ref: 11th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2016), volume 61 of LIPIcs, pages 2:1-2:34. Schloss Dagstuhl, 2016

arXiv:1412.0980 [pdf, ps, other]

doi 10.1109/TIT.2017.2754268

Approximate Degradable Quantum Channels

Authors: David Sutter, Volkher B. Scholz, Andreas Winter, Renato Renner

Abstract: Degradable quantum channels are an important class of completely positive trace-preserving maps. Among other properties, they offer a single-letter formula for the quantum and the private classical capacity and are characterized by the fact that a complementary channel can be obtained from the channel by applying a degrading channel. In this work we introduce the concept of approximate degradable… ▽ More Degradable quantum channels are an important class of completely positive trace-preserving maps. Among other properties, they offer a single-letter formula for the quantum and the private classical capacity and are characterized by the fact that a complementary channel can be obtained from the channel by applying a degrading channel. In this work we introduce the concept of approximate degradable channels, which satisfy this condition up to some finite $\varepsilon\geq0$. That is, there exists a degrading channel which upon composition with the channel is $\varepsilon$-close in the diamond norm to the complementary channel. We show that for any fixed channel the smallest such $\varepsilon$ can be efficiently determined via a semidefinite program. Moreover, these approximate degradable channels also approximately inherit all other properties of degradable channels. As an application, we derive improved upper bounds to the quantum and private classical capacity for certain channels of interest in quantum communication. △ Less

Submitted 17 October, 2017; v1 submitted 2 December, 2014; originally announced December 2014.

Comments: v3: minor changes, published version. v2: 21 pages, 2 figures, improved bounds on the capacity for approximate degradable channels based on [arXiv:1507.07775], an author added

MSC Class: 81P45

Journal ref: IEEE Transactions on Information Theory, vol. 63, no. 12, pages 7832 - 7844, 2017

arXiv:1212.0520 [pdf, other]

A modular framework for randomness extraction based on Trevisan's construction

Authors: Wolfgang Mauerer, Christopher Portmann, Volkher B. Scholz

Abstract: Informally, an extractor delivers perfect randomness from a source that may be far away from the uniform distribution, yet contains some randomness. This task is a crucial ingredient of any attempt to produce perfectly random numbers---required, for instance, by cryptographic protocols, numerical simulations, or randomised computations. Trevisan's extractor raised considerable theoretical interest… ▽ More Informally, an extractor delivers perfect randomness from a source that may be far away from the uniform distribution, yet contains some randomness. This task is a crucial ingredient of any attempt to produce perfectly random numbers---required, for instance, by cryptographic protocols, numerical simulations, or randomised computations. Trevisan's extractor raised considerable theoretical interest not only because of its data parsimony compared to other constructions, but particularly because it is secure against quantum adversaries, making it applicable to quantum key distribution. We discuss a modular, extensible and high-performance implementation of the construction based on various building blocks that can be flexibly combined to satisfy the requirements of a wide range of scenarios. Besides quantitatively analysing the properties of many combinations in practical settings, we improve previous theoretical proofs, and give explicit results for non-asymptotic cases. The self-contained description does not assume familiarity with extractors. △ Less

Submitted 3 December, 2012; originally announced December 2012.

Comments: 21 pages, 15 figures. Source code is available under GPLv2+. Comments welcome

arXiv:0910.2324 [pdf, ps, other]

Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture

Authors: Raymes Khoury, Bernd Burgstaller, Bernhard Scholz

Abstract: Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their scripting languages with matrix data types. Current implementations of matrix languages do not fully utilise high-performance, special-purpose chip architectures such as the IBM PowerXCell process… ▽ More Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their scripting languages with matrix data types. Current implementations of matrix languages do not fully utilise high-performance, special-purpose chip architectures such as the IBM PowerXCell processor (Cell), which is currently used in the fastest computer in the world. We present a new framework that extends Octave to harness the computational power of the Cell. With this framework the programmer is relieved of the burden of introducing explicit notions of parallelism. Instead the programmer uses a new matrix data-type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data-type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into sub-matrices), scheduled and executed on the SPEs. Thereby we exploit (1) data parallelism, (2) instruction level parallelism, (3) pipeline parallelism and (4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors. △ Less

Submitted 14 November, 2009; v1 submitted 13 October, 2009; originally announced October 2009.

Comments: 61 pages, 34 figures

ACM Class: D.3.4

Showing 1–16 of 16 results for author: Scholz, B