-
BEC: Bit-Level Static Analysis for Reliability against Soft Errors
Authors:
Yousun Ko,
Bernd Burgstaller
Abstract:
Soft errors are a type of transient digital signal corruption that occurs in digital hardware components such as the internal flip-flops of CPU pipelines, the register file, memory cells, and even internal communication buses. Soft errors are caused by environmental radioactivity, magnetic interference, lasers, and temperature fluctuations, either unintentionally, or as part of a deliberate attempt to compromise a system and expose confidential data.
We propose a bit-level error coalescing (BEC) static program analysis and its two use cases to understand and improve program reliability against soft errors. The BEC analysis tracks each bit corruption in the register file and classifies the effect of the corruption by its semantics at compile time. The usefulness of the proposed analysis is demonstrated in two scenarios, fault injection campaign pruning, and reliability-aware program transformation. Experimental results show that bit-level analysis pruned up to 30.04 % of exhaustive fault injection campaigns (13.71 % on average), without loss of accuracy. Program vulnerability was reduced by up to 13.11 % (4.94 % on average) through bit-level vulnerability-aware instruction scheduling. The analysis has been implemented within LLVM and evaluated on the RISC-V architecture.
To the best of our knowledge, the proposed BEC analysis is the first bit-level compiler analysis for program reliability against soft errors. The proposed method is generic and not limited to a specific computer architecture.
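One intuition for why a bit-level view can prune fault injections is sketched below. It is an illustration only, not the BEC analysis itself: the helper names `flip_bit`, `masked_bits`, and `use` are hypothetical, and the example assumes a register value whose only use is a bitwise AND with a constant mask. Under that assumption, flipping any bit that the mask clears cannot change the result, so those injections need not be run.

```python
def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Model a single-bit soft error in a register of the given width."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def masked_bits(mask: int, width: int = 32) -> list[int]:
    """Bit positions whose corruption cannot affect `x & mask` (hypothetical helper)."""
    return [b for b in range(width) if not (mask >> b) & 1]


def use(x: int, mask: int = 0x0F) -> int:
    """The only consumer of the register in this toy example."""
    return x & mask


if __name__ == "__main__":
    x = 0xA7
    for b in masked_bits(0x0F, width=8):
        # Corruptions in the upper nibble are masked by the AND.
        assert use(flip_bit(x, b, width=8)) == use(x)
    print("prunable bit positions:", masked_bits(0x0F, width=8))
```

In this toy setting, half of the eight possible single-bit injections into `x` are provably benign; the percentages reported in the abstract come from the authors' experiments and are not reproduced by this sketch.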
Submitted 11 January, 2024;
originally announced January 2024.
-
Information content in formal languages
Authors:
Bernhard Burgstaller
Abstract:
Motivated by creating physical theories, formal languages $S$ with variables are considered and a kind of distance between elements of the languages is defined by the formula $d(x,y)= \ell(x \nabla y) - \ell(x) \wedge \ell(y)$, where $\ell$ is a length function and $x \nabla y$ means the united theory of $x$ and $y$. Actually we mainly consider abstract abelian idempotent monoids $(S,\nabla)$ provided with length functions $\ell$. The set of length functions can be projected to another set of length functions such that the distance $d$ is actually a pseudometric and satisfies $d(x\nabla a,y\nabla b) \le d(x,y) + d(a,b)$. We also propose a "signed measure" on the set of Boolean expressions of elements in $S$, and a Banach-Mazur-like distance between abelian, idempotent monoids with length functions, or formal languages.
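The distance formula can be made concrete with a small sketch. The instance below is an assumption chosen for illustration and does not come from the paper: elements of $S$ are finite sets, $\nabla$ is set union (an abelian idempotent monoid with the empty set as unit), $\ell$ is cardinality, $\wedge$ is read as the minimum of two numbers, and the formula is parsed as $\ell(x \nabla y) - (\ell(x) \wedge \ell(y))$.

```python
def length(x: frozenset) -> int:
    """Assumed length function: the cardinality of a finite set."""
    return len(x)


def join(x: frozenset, y: frozenset) -> frozenset:
    """Assumed monoid operation (written as nabla in the abstract): set union."""
    return x | y


def distance(x: frozenset, y: frozenset) -> int:
    """d(x, y) = l(x nabla y) - min(l(x), l(y)) under the assumptions above."""
    return length(join(x, y)) - min(length(x), length(y))


if __name__ == "__main__":
    x = frozenset({"a", "b"})
    y = frozenset({"b", "c"})
    print(distance(x, y))  # union has 3 elements, the smaller length is 2
    print(distance(x, x))  # idempotence of union gives distance 0
```

Whether this particular $\ell$ already yields a pseudometric is not claimed here; the abstract states only that length functions can be projected to ones for which $d$ is a pseudometric.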
Submitted 16 November, 2023; v1 submitted 11 September, 2022;
originally announced September 2022.
-
Cloudprofiler: TSC-based inter-node profiling and high-throughput data ingestion for cloud streaming workloads
Authors:
Shinhyung Yang,
Jiun Jeong,
Bernhard Scholz,
Bernd Burgstaller
Abstract:
To conduct real-time analytics computations, big data stream processing engines are required to process unbounded data streams at millions of events per second. However, current streaming engines exhibit low throughput and high tuple processing latency. Performance engineering is complicated by the fact that streaming engines constitute complex distributed systems consisting of multiple nodes in the cloud. A profiling technique is required that is capable of measuring time durations at high accuracy across nodes. Standard clock synchronization techniques such as the network time protocol (NTP) are limited to millisecond accuracy, and hence cannot be used.
We propose a profiling technique that relates the time-stamp counters (TSCs) of nodes to measure the duration of events in a streaming framework. The precision of the TSC relation determines the accuracy of the measured duration. The TSC relation is conducted in quiescent periods of the network to achieve accuracy in the tens of microseconds. We propose a throughput-controlled data generator to reliably determine the sustainable throughput of a streaming engine. To facilitate high-throughput data ingestion, we propose a concurrent object factory that moves the deserialization overhead of incoming data tuples off the critical path of the streaming framework. The evaluation of the proposed techniques within the Apache Storm streaming framework on the Google Compute Engine public cloud shows that data ingestion increases from 700 k to 4.68 M tuples per second, and that time durations can be profiled at a measurement accuracy of 92 μs, which is three orders of magnitude higher than the accuracy of NTP, and one order of magnitude higher than prior work.
Submitted 10 August, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Julia Cloud Matrix Machine: Dynamic Matrix Language Acceleration on Multicore Clusters in the Cloud
Authors:
Jay Hwan Lee,
Yeonsoo Kim,
Younghyun Ryu,
Wasuwee Sodsong,
Hyunjun Jeon,
**sik Park,
Bernd Burgstaller,
Bernhard Scholz
Abstract:
In emerging scientific computing environments, matrix computations of increasing size and complexity are becoming prevalent. However, contemporary matrix language implementations are insufficient in their support for efficient utilization of cloud computing resources, particularly on the user side. We thus developed an extension of the Julia high-performance computation language such that matrix computations are automatically parallelized in the cloud, where users are separated from directly interacting with complex explicitly-parallel computations. We implement lazy evaluation semantics combined with directed graphs to optimize matrix operations on the fly while dynamic simulation finds the optimal tile size and schedule for a given cluster of cloud nodes. A time model prediction of the cluster's performance capacity is constructed to enable simulations. Automatic configuration of communication and worker processes on the cloud networks allows the framework to automatically scale up for clusters of heterogeneous nodes. Our framework's experimental evaluation comprises eleven benchmarks on a fourteen-node (564 CPUs) cluster in the AWS public cloud, revealing speedups of up to a factor of 5.1, with an average 74.39% of the upper bound for speedups.
Submitted 8 December, 2023; v1 submitted 15 May, 2022;
originally announced May 2022.
-
GLocal-K: Global and Local Kernels for Recommender Systems
Authors:
Soyeon Caren Han,
Taejun Lim,
Siqu Long,
Bernd Burgstaller,
Josiah Poon
Abstract:
Recommender systems typically operate on high-dimensional sparse user-item matrices. Matrix completion is a very challenging task to predict one's interest based on millions of other users having each seen a small subset of thousands of items. We propose a Global-Local Kernel-based matrix completion framework, named GLocal-K, that aims to generalise and represent a high-dimensional sparse user-item matrix entry into a low dimensional space with a small number of important features. Our GLocal-K can be divided into two major stages. First, we pre-train an auto encoder with the local kernelised weight matrix, which transforms the data from one space into the feature space by using a 2d-RBF kernel. Then, the pre-trained auto encoder is fine-tuned with the rating matrix, produced by a convolution-based global kernel, which captures the characteristics of each item. We apply our GLocal-K model under the extreme low-resource setting, which includes only a user-item rating matrix, with no side information. Our model outperforms the state-of-the-art baselines on three collaborative filtering benchmarks: ML-100K, ML-1M, and Douban.
Submitted 27 August, 2021;
originally announced August 2021.
-
A kind of $KK$-theory for rings
Authors:
Bernhard Burgstaller
Abstract:
A group equivariant $KK$-theory for rings will be defined and studied in analogy to Kasparov's $KK$-theory for $C^*$-algebras. It is a kind of linearization of the category of rings by allowing addition of homomorphisms, imposing also homotopy invariance, invertibility of matrix corner embeddings, and allowing morphisms which are the opposite split of split exact sequences. We demonstrate the potential of this theory by proving for example equivalence induced by Morita equivalence and a Green-Julg isomorphism in this framework.
Submitted 4 July, 2021;
originally announced July 2021.
-
Aspects of equivariant $KK$-theory in its generators and relations picture
Authors:
Bernhard Burgstaller
Abstract:
We give a new, relatively short and conceptual proof of the universal property of $KK^G$-theory with respect to stability, homotopy invariance and split-exactness for $G$ a locally compact group, a locally compact (not necessarily Hausdorff) groupoid, or a countable inverse semigroup. Morphisms in the generators and relations picture of $KK^G$-theory are brought to a particularly simple form.
Submitted 6 December, 2019;
originally announced December 2019.
-
The Economics of Smart Contracts
Authors:
Kirk Baird,
Seongho Jeong,
Yeonsoo Kim,
Bernd Burgstaller,
Bernhard Scholz
Abstract:
Ethereum is a distributed blockchain that can execute smart contracts, which inter-communicate and perform transactions automatically. The execution of smart contracts is paid in the form of gas, which is a monetary unit used in the Ethereum blockchain. The Ethereum Virtual Machine (EVM) provides the metering capability for smart contract execution. Instruction costs vary depending on the instruction type and the approximate computational resources required to execute the instruction on the network. The cost of gas is adjusted using transaction fees to ensure adequate payment of the network. In this work, we highlight the "real" economics of smart contracts. We show that the actual costs of executing smart contracts are disproportionate to the computational costs and that this gap is continuously widening. We show that the gas cost-model of the underlying EVM instruction-set is wrongly modeled. Specifically, the computational cost for the SLOAD instruction increases with the length of the blockchain. Our proposed performance model estimates gas usage and execution time of a smart contract at a given block-height. The new gas-cost model incorporates the block-height to eliminate irregularities in the Ethereum gas calculations. Our findings are based on extensive experiments over the entire history of the EVM blockchain.
Submitted 22 October, 2019;
originally announced October 2019.
-
Some remarks in $C^*$- and $K$-theory
Authors:
Bernhard Burgstaller
Abstract:
This note consists of three unrelated remarks. First, we demonstrate how, roughly speaking, $*$-homomorphisms between matrix stable $C^*$-algebras are exactly the uniformly continuous $*$-preserving group homomorphisms between their general linear groups. Second, using the Cuntz picture in $KK$-theory we bring morphisms in $KK$-theory represented by generators and relations to a particularly simple form. Third, we show that for an inverse semigroup its associated groupoid is Hausdorff if and only if the inverse semigroup is $E$-continuous.
Submitted 9 May, 2019;
originally announced May 2019.
-
Safe Non-blocking Synchronization in Ada 202x
Authors:
Johann Blieberger,
Bernd Burgstaller
Abstract:
The mutual-exclusion property of locks stands in the way of scalability of parallel programs on many-core architectures. Locks do not allow progress guarantees, because a task may fail inside a critical section and keep holding a lock that blocks other tasks from accessing shared data. With non-blocking synchronization, the drawbacks of locks are avoided by synchronizing access to shared data by atomic read-modify-write operations. To incorporate non-blocking synchronization in Ada 202x, programmers must be able to reason about the behavior and performance of tasks in the absence of protected objects and rendezvous. We therefore extend Ada's memory model by synchronized types, which support the expression of memory ordering operations at a sufficient level of detail. To mitigate the complexity associated with non-blocking synchronization, we propose concurrent objects as a novel high-level language construct. Entities of a concurrent object execute in parallel, due to a fine-grained, optimistic synchronization mechanism. Synchronization is framed by the semantics of concurrent entry execution. The programmer is only required to label shared data accesses in the code of concurrent entries. Labels constitute memory-ordering operations expressed through attributes. To the best of our knowledge, this is the first approach to provide a non-blocking synchronization construct as a first-class citizen of a high-level programming language. We illustrate the use of concurrent objects by several examples.
Submitted 18 June, 2018; v1 submitted 27 March, 2018;
originally announced March 2018.
-
Semigroup homomorphisms on matrix algebras
Authors:
Bernhard Burgstaller
Abstract:
We explore the connection between ring homomorphisms and semigroup homomorphisms on matrix algebras over rings or $C^*$-algebras.
Submitted 23 January, 2017;
originally announced January 2017.
-
A note on a certain Baum--Connes map for inverse semigroups
Authors:
Bernhard Burgstaller
Abstract:
Let $G$ denote a countable inverse semigroup. We construct a kind of a Baum--Connes map $K(\tilde A \rtimes G) \rightarrow K(A \rtimes G)$ by a categorial approach via localization of triangulated categories, developed by R. Meyer and R. Nest for groups $G$. We allow the coefficient algebras $A$ to be in a special class of algebras called fibered $G$-algebras. This note continues and fixes our preprint "Attempts to define a Baum--Connes map via localization of categories for inverse semigroups".
Submitted 7 September, 2016;
originally announced September 2016.
-
The generators and relations picture of $KK$-theory
Authors:
Bernhard Burgstaller
Abstract:
This is half an overview article since what we describe here is essentially known. We describe $KK$-theory by generators and relations in a formal sum of formal products of $*$-homomorphisms and some synthetical morphisms. What comes out is a category. The Kasparov product is then just the composition of morphisms. This description may be interesting to anyone who wants a quick and elementary definition of $KK$-theory. The description could also be used for other categories of algebras than $C^*$-algebras endowed with group actions, for example, $C^*$-algebras equipped with an action by a semigroup, a category et cetera.
Submitted 1 September, 2016; v1 submitted 9 February, 2016;
originally announced February 2016.
-
Efficient Construction of Simultaneous Deterministic Finite Automata on Multicores Using Rabin Fingerprints
Authors:
Minyoung Jung,
Bernd Burgstaller,
Johann Blieberger
Abstract:
In this paper, we propose several optimizations for the SFA construction algorithm, which greatly reduce the in-memory footprint and the processing steps required to construct an SFA. We introduce fingerprints as a space- and time-efficient way to represent SFA states. To compute fingerprints, we apply the Barrett reduction algorithm and accelerate it using recent additions to the x86 instruction set architecture. We exploit fingerprints to introduce hashing for further optimizations. Our parallel SFA construction algorithm is nonblocking and utilizes instruction-level, data-level, and task-level parallelism of coarse-, medium- and fine-grained granularity. We adapt static workload distributions and align the SFA data-structures with the constraints of multicore memory hierarchies, to increase the locality of memory accesses and facilitate HW prefetching. We conduct experiments on the PROSITE protein database for FAs of up to 702 FA states to evaluate performance and effectiveness of our proposed optimizations. Evaluations have been conducted on a 4 CPU (64 cores) AMD Opteron 6378 system and a 2 CPU (28 cores, 2 hyperthreads per core) Intel Xeon E5-2697 v3 system. The observed speedups over the sequential baseline algorithm are up to 118541x on the AMD system and 2113968x on the Intel system.
Submitted 25 September, 2017; v1 submitted 31 December, 2015;
originally announced December 2015.
-
Inverse semigroup equivariant $KK$-theory and $C^*$-extensions
Authors:
Bernhard Burgstaller
Abstract:
In this note we extend the classical result by G. G. Kasparov that the Kasparov groups $KK_1(A,B)$ can be identified with the extension groups $\mbox{Ext}(A,B)$ to the inverse semigroup equivariant setting. More precisely, we show that $KK_G^1(A,B) \cong \mbox{Ext}_G(A \otimes {\cal K}_G,B \otimes {\cal K}_G)$ for every countable, $E$-continuous inverse semigroup $G$. For locally compact second countable groups $G$ this was proved by K. Thomsen, and technically this note presents an adaption of his proof.
Submitted 12 August, 2015;
originally announced August 2015.
-
Attempts to define a Baum--Connes map via localization of categories for inverse semigroups
Authors:
Bernhard Burgstaller
Abstract:
Meyer and Nest showed that the Baum--Connes map is equivalent to a map on $K$-theory of two different crossed products. This approach is strongly categorial in method since its basis is to regard Kasparov's theory $KK^G$ as a triangulated category. We have tried to translate this approach to the realm of inverse semigroup equivariant $C^*$-algebras but can prove the existence of a Baum--Connes map only under some unverified additional assumptions, which we however strongly motivate. Some of our results may be of independent interest, for example Bott periodicity, the definition of induction functors, the definition of a completely novel compatible $L^2(G)$-space, a Cuntz picture of $KK^G$, and the verification that $KK^G$ is a triangulated category.
Submitted 12 July, 2017; v1 submitted 28 June, 2015;
originally announced June 2015.
-
An elementary Green imprimitivity theorem for inverse semigroups
Authors:
Bernhard Burgstaller
Abstract:
A Morita equivalence similar to that found by Green for crossed products by groups will be established for crossed products by inverse semigroups. More precisely, let $G$ be an inverse semigroup, $H$ a finite sub-inverse semigroup of $G$ and $A$ a $G$-algebra or a $H$-algebra. Then the crossed product $A \rtimes H$ is Morita equivalent to a certain crossed product $B \rtimes G$.
Submitted 12 July, 2017; v1 submitted 7 May, 2014;
originally announced May 2014.
-
The universal property of inverse semigroup equivariant $KK$-theory
Authors:
Bernhard Burgstaller
Abstract:
Higson proved that every homotopy invariant, stable and split exact functor from the category of $C^*$-algebras to an additive category factors through Kasparov's $KK$-theory. By adapting a group equivariant generalization of this result by Thomsen, we generalize Higson's result to the inverse semigroup equivariant setting.
Submitted 12 May, 2017; v1 submitted 7 May, 2014;
originally announced May 2014.
-
A Green--Julg isomorphism for inverse semigroups
Authors:
Bernhard Burgstaller
Abstract:
For every finite unital inverse semigroup $S$ and $S$-$C^*$-algebra $A$ we establish an isomorphism between $KK^S(\mathbb{C},A)$ and $K(A \rtimes S)$. This extends the classical Green--Julg isomorphism from finite groups to finite inverse semigroups.
Submitted 7 May, 2014;
originally announced May 2014.
-
Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures
Authors:
Wasuwee Sodsong,
**gun Hong,
Seongwook Chung,
Yeongkyu Lim,
Shin-Dug Kim,
Bernd Burgstaller
Abstract:
With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed on a daily basis. Desktops, tablets and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Despite the fact that these platforms are heterogeneous multicores, no approach exists yet that is capable of joining forces of a system's CPU and GPU for JPEG decoding. In this paper we introduce a novel JPEG decoding scheme for heterogeneous architectures consisting of a CPU and an OpenCL-programmable GPU. We employ an offline profiling step to determine the performance of a system's CPU and GPU with respect to JPEG decoding. For a given JPEG image, our performance model uses (1) the CPU and GPU performance characteristics, (2) the image entropy and (3) the width and height of the image to balance the JPEG decoding workload on the underlying hardware. Our run-time partitioning and scheduling scheme exploits task, data and pipeline parallelism by scheduling the non-parallelizable entropy decoding task on the CPU, whereas inverse discrete cosine transformations (IDCTs), color conversions and upsampling are conducted on both the CPU and the GPU. Our kernels have been optimized for GPU memory hierarchies. We have implemented the proposed method in the context of the libjpeg-turbo library, which is an industrial-strength JPEG encoding and decoding engine. Libjpeg-turbo's hand-optimized SIMD routines for ARM and x86 constitute a competitive yardstick for the comparison to the proposed approach. Retro-fitting our method with libjpeg-turbo provided insights on the software-engineering aspects of re-engineering legacy code for heterogeneous multicores.
Submitted 12 May, 2014; v1 submitted 20 November, 2013;
originally announced November 2013.
-
Equivariant $KK$-theory of $r$-discrete groupoids and inverse semigroups
Authors:
Bernhard Burgstaller
Abstract:
For an $r$-discrete Hausdorff groupoid ${\cal G}$ and an inverse semigroup $S$ of slices of ${\cal G}$ there is an isomorphism between ${\cal G}$-equivariant $KK$-theory and compatible $S$-equivariant $KK$-theory. We use it to define descent homomorphisms for $S$, and indicate a Baum--Connes map for inverse semigroups. Also findings by Khoshkam and Skandalis for crossed products by inverse semigroups are reflected in $KK$-theory.
Submitted 21 November, 2012;
originally announced November 2012.
-
A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments
Authors:
Yousun Ko,
Minyoung Jung,
Yo-Sub Han,
Bernd Burgstaller
Abstract:
We present techniques to parallelize membership tests for Deterministic Finite Automata (DFAs). Our method searches arbitrary regular expressions by matching multiple bytes in parallel using speculation. We partition the input string into chunks, match chunks in parallel, and combine the matching results. Our parallel matching algorithm exploits structural DFA properties to minimize the speculative overhead. Unlike previous approaches, our speculation is failure-free, i.e., (1) sequential semantics are maintained, and (2) speed-downs are avoided altogether. On architectures with a SIMD gather-operation for indexed memory loads, our matching operation is fully vectorized. The proposed load-balancing scheme uses an off-line profiling step to determine the matching capacity of each participating processor. Based on matching capacities, DFA matches are load-balanced on inhomogeneous parallel architectures such as cloud computing environments. We evaluated our speculative DFA membership test for a representative set of benchmarks from the Perl-compatible Regular Expression (PCRE) library and the PROSITE protein database. Evaluation was conducted on a 4 CPU (40 cores) shared-memory node of the Intel Manycore Testing Lab (Intel MTL), on the Intel AVX2 SDE simulator for 8-way fully vectorized SIMD execution, and on a 20-node (288 cores) cluster on the Amazon EC2 computing cloud.
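The partition, match, and combine steps can be sketched as follows. This is a simplified illustration rather than the authors' algorithm: it speculates over every DFA state for every chunk and runs the chunks in a plain loop, whereas the abstract describes reducing the speculative overhead through structural DFA properties and executing chunks in parallel. The function names and the dictionary-based transition table are assumptions made for the example.

```python
def chunk_mapping(delta, states, chunk):
    """For each possible start state, compute the state reached after reading chunk."""
    mapping = {}
    for start in states:
        cur = start
        for symbol in chunk:
            cur = delta[cur][symbol]
        mapping[start] = cur
    return mapping


def speculative_match(delta, states, start, accepting, text, n_chunks):
    """Membership test that matches chunks independently and then combines them."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division, at least 1
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Each mapping depends only on its own chunk, so these calls are independent
    # and could be distributed across cores or nodes.
    mappings = [chunk_mapping(delta, states, c) for c in chunks]
    # Combine: thread the true start state through the per-chunk mappings.
    state = start
    for mapping in mappings:
        state = mapping[state]
    return state in accepting


if __name__ == "__main__":
    # Toy DFA over {a, b} accepting strings with an even number of 'a'.
    delta = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
    print(speculative_match(delta, [0, 1], 0, {0}, "abba", 3))
    print(speculative_match(delta, [0, 1], 0, {0}, "ab", 3))
```

Because every chunk records the outcome for each candidate start state, the combined result always equals the sequential one, which mirrors the abstract's point that sequential semantics are maintained.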
Submitted 22 July, 2013; v1 submitted 18 October, 2012;
originally announced October 2012.
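The chunk-and-combine idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the simplest form of speculation, in which every chunk is matched from every possible start state, and it does not include the structural DFA optimizations, SIMD vectorization, or load balancing the abstract mentions. The names `run_chunk`, `speculative_match`, and the example automaton `DELTA` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_chunk(delta, states, chunk):
    """Map every possible start state to the state reached after reading chunk."""
    mapping = {}
    for s in states:
        cur = s
        for ch in chunk:
            cur = delta[cur][ch]
        mapping[s] = cur
    return mapping

def speculative_match(delta, start, accepting, text, n_chunks=4):
    """Match chunks independently, then combine the per-chunk state mappings."""
    states = list(delta)
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        mappings = list(pool.map(lambda c: run_chunk(delta, states, c), chunks))
    # Sequential combine step: thread the real start state through the mappings.
    state = start
    for m in mappings:
        state = m[state]
    return state in accepting

# Hypothetical example DFA: accepts binary strings with an even number of '1's.
DELTA = {0: {'0': 0, '1': 1}, 1: {'0': 1, '1': 0}}
```

Because each chunk's mapping covers all start states, the combined result always equals the sequential result, at the cost of redundant work proportional to the number of states.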
-
On certain properties of Cuntz--Krieger type algebras
Authors:
Bernhard Burgstaller
Abstract:
The note presents a further study of the class of Cuntz--Krieger type algebras. A necessary and sufficient condition is identified that ensures that the algebra is purely infinite, the ideal structure is studied, and nuclearity is proved by presenting the algebra as a crossed product of an AF-algebra by an abelian group. The results are applied to examples of Cuntz--Krieger type algebras, such as higher rank semigraph $C^*$-algebras and higher rank Exel-Laca algebras.
Submitted 18 November, 2011;
originally announced November 2011.
-
On freely generated semigraph $C^*$-algebras
Authors:
Bernhard Burgstaller
Abstract:
For special universal $C^*$-algebras associated to $k$-semigraphs we present the universal representations of these algebras, prove a Cuntz--Krieger uniqueness theorem, and compute the $K$-theory. These $C^*$-algebras seem to be the most universal Cuntz--Krieger like algebras naturally associated to $k$-semigraphs. For instance, the Toeplitz Cuntz algebra is a proper quotient of such an algebra.
Submitted 20 June, 2013; v1 submitted 18 November, 2011;
originally announced November 2011.
-
A Cuntz--Krieger uniqueness theorem for semigraph $C^*$-algebras
Authors:
Bernhard Burgstaller
Abstract:
Higher rank semigraph algebras are introduced by mixing concepts of ultragraph algebras and higher rank graph algebras. This yields a kind of higher rank generalisation of ultragraph algebras. We prove Cuntz--Krieger uniqueness theorems for cancelling semigraph algebras and aperiodic full semigraph algebras.
Submitted 17 November, 2011;
originally announced November 2011.
-
A descent homomorphism for semimultiplicative sets
Authors:
Bernhard Burgstaller
Abstract:
We define and provide some basic analysis of various types of crossed products by semimultiplicative sets, and then prove a $KK$-theoretical descent homomorphism for semimultiplicative sets in accord with the descent homomorphism for discrete groups.
Submitted 17 November, 2011;
originally announced November 2011.
-
Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture
Authors:
Raymes Khoury,
Bernd Burgstaller,
Bernhard Scholz
Abstract:
Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their scripting languages with matrix data types. Current implementations of matrix languages do not fully utilise high-performance, special-purpose chip architectures such as the IBM PowerXCell processor (Cell), which is currently used in the fastest computer in the world.
We present a new framework that extends Octave to harness the computational power of the Cell. With this framework the programmer is relieved of the burden of introducing explicit notions of parallelism. Instead the programmer uses a new matrix data-type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data-type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into sub-matrices), scheduled and executed on the SPEs. Thereby we exploit (1) data parallelism, (2) instruction level parallelism, (3) pipeline parallelism and (4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.
Submitted 14 November, 2009; v1 submitted 13 October, 2009;
originally announced October 2009.
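The lazy-evaluation step described in the abstract above (matrix operations recorded as a trace, converted to a data dependence graph, then scheduled) can be sketched in a few lines. This is an illustrative assumption of how such a data type might look, not the authors' Octave extension: the class `LazyMatrix` and the helpers `const`, `schedule`, and `evaluate` are hypothetical, evaluation is sequential, and the sketch omits the lowering into sub-matrices and the dispatch to SPEs.

```python
class LazyMatrix:
    """Hypothetical lazy matrix node: records operations instead of computing them."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "matmul"
        self.inputs = inputs  # operand nodes this node depends on
        self.value = value    # nested lists; set for constants or after evaluation

    def __add__(self, other):
        return LazyMatrix("add", (self, other))

    def __matmul__(self, other):
        return LazyMatrix("matmul", (self, other))

def const(rows):
    """Wrap concrete data as a leaf of the dependence graph."""
    return LazyMatrix("const", value=rows)

def schedule(root):
    """Return graph nodes in dependence order (operands before their users)."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dep in node.inputs:
            visit(dep)
        order.append(node)
    visit(root)
    return order

def evaluate(root):
    """Execute the scheduled operations one by one and return the root's value."""
    for node in schedule(root):
        if node.op == "add":
            a, b = (n.value for n in node.inputs)
            node.value = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
        elif node.op == "matmul":
            a, b = (n.value for n in node.inputs)
            node.value = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                          for row in a]
    return root.value

A = const([[1, 2], [3, 4]])
B = const([[0, 1], [1, 0]])
C = (A @ B) + A  # nothing is computed until evaluate(C) is called
```

Deferring execution this way gives a scheduler the whole graph before any work starts, which is what makes it possible to find independent operations that could run in parallel.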