-
Children's Overtrust and Shifting Perspectives of Generative AI
Authors:
Jaemarie Solyst,
Ellia Yang,
Shixian Xie,
Jessica Hammer,
Amy Ogan,
Motahhare Eslami
Abstract:
The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to explore middle school girls' (N = 26) attitudes and reasoning about how genAI works. We focused on girls who are often disproportionately impacted by algorithmic bias. We found that: (1) middle school girls were initially overtrusting of genAI, (2) deliberate exposure to the limitations and mistakes of generative AI shifted this overtrust to disillusionment about genAI capabilities, though they were still optimistic for future possibilities of genAI, and (3) their ideas about school policy were nuanced. This work informs how children think about genAI like ChatGPT and its integration in learning settings.
Submitted 29 June, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI
Authors:
Naweiluo Zhou,
Florent Dufour,
Vinzent Bode,
Peter Zinterhof,
Nicolay J Hammer,
Dieter Kranzlmüller
Abstract:
Cloud computing provisions computer resources in a cost-effective way based on demand. Therefore, it has become a viable solution for big data analytics and artificial intelligence, which have been widely adopted in various scientific domains. Data security in certain fields such as biomedical research remains a major concern when moving workflows to the cloud, because cloud environments are generally outsourced and therefore more exposed to risks. We present a secure cloud architecture and describe how it enables workflow packaging and scheduling while keeping its data, logic and computation secure in transit, in use and at rest.
Submitted 28 May, 2023;
originally announced May 2023.
-
Practical Aspects of Membership Problem of Watson-Crick Context-free Grammars
Authors:
Jan Hammer,
Zbyněk Křivka
Abstract:
This paper focuses on Watson-Crick languages inspired by DNA computing, their models, and algorithms for deciding the language membership. It analyzes a recently introduced algorithm called WK-CYK and introduces a state space search algorithm that is based on regular Breadth-first search but uses a number of optimizations and heuristics to be efficient in practical use and able to analyze longer inputs. The key parts are the heuristics for pruning the state space (detecting dead ends) and heuristics for choosing the most promising branches to continue the search.
These two algorithms have been tested with 20 different Watson-Crick grammars (40 including their Chomsky normal form versions). While WK-CYK is able to decide the language membership in a reasonable time for inputs of roughly 30-50 symbols and its performance is very consistent for all kinds of grammars and inputs, the state space search is usually (89-98 % of cases) more efficient and able to do the computation for inputs with lengths of hundreds or even thousands of symbols. Thus, the state space search has the potential to be a good tool for practical Watson-Crick membership testing and is a good basis for improving the efficiency of the algorithm in the future.
Submitted 8 September, 2022;
originally announced September 2022.
-
Cloud, Fog or Edge: Where to Compute?
Authors:
Dragi Kimovski,
Roland Mathá,
Josef Hammer,
Narges Mehran,
Hermann Hellwagner,
Radu Prodan
Abstract:
The computing continuum extends the high-performance cloud data centers with energy-efficient and low-latency devices close to the data sources located at the edge of the network. However, the heterogeneity of the computing continuum raises multiple challenges related to application management. These include where to offload an application - from the cloud to the edge - to meet its computation and communication requirements. To support these decisions, we provide in this article a detailed performance and carbon footprint analysis of a selection of use case applications with complementary resource requirements across the computing continuum over a real-life evaluation testbed.
Submitted 25 January, 2021;
originally announced January 2021.
-
Audience and Streamer Participation at Scale on Twitch
Authors:
Claudia Flores-Saviaga,
Jessica Hammer,
Juan Pablo Flores,
Joseph Seering,
Stuart Reeves,
Saiph Savage
Abstract:
Large-scale streaming platforms such as Twitch are becoming increasingly popular, but detailed audience-streamer interaction dynamics remain unexplored at scale. In this paper, we perform a mixed-methods study on a dataset with over 12 million audience chat messages and 45 hours of streaming video to understand audience participation and streamer performance on Twitch. We uncover five types of streams based on size and audience participation styles: Clique Streams, small streams with close streamer-audience interactions; Rising Streamers, mid-range streams using custom technology and moderators to formalize their communities; Chatter-boxes, mid-range streams with established conversational dynamics; Spotlight Streamers, large streams that engage large numbers of viewers while still retaining a sense of community; and Professionals, massive streams with stadium-style audiences. We discuss challenges and opportunities emerging for streamers and audiences from each style and conclude by providing data-backed design implications that empower streamers, audiences, live streaming platforms, and game designers.
Submitted 30 November, 2020;
originally announced December 2020.
-
Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
Authors:
Jan Laukemann,
Julian Hammer,
Georg Hager,
Gerhard Wellein
Abstract:
Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and Execution-Cache-Memory (ECM) models. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction including the detection of loop-carried dependencies, which turns it into a versatile cross-architecture modeling tool. We show runtime predictions for code on Intel Cascade Lake, AMD Zen, and Marvell ThunderX2 micro-architectures based on machine models from available documentation and semi-automatic benchmarking. The predictions are compared with actual measurements.
Submitted 21 October, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT
Authors:
Julian Hornich,
Julian Hammer,
Georg Hager,
Thomas Gruber,
Gerhard Wellein
Abstract:
Stencil algorithms have been receiving considerable interest in HPC research for decades. The techniques used to approach multi-core stencil performance modeling and engineering span basic runtime measurements, elaborate performance models, detailed hardware counter analysis, and thorough scaling behavior evaluation. Due to the plurality of approaches and stencil patterns, we set out to develop a generalizable methodology for reproducible measurements accompanied by state-of-the-art performance models. Our open-source toolchain and collected results are publicly available in the "Intranode Stencil Performance Evaluation Collection" (INSPECT). We present the underlying methodologies, models and tools involved in gathering and documenting the performance behavior of a collection of typical stencil patterns across multiple architectures and hardware configuration options. Our aim is to endow performance-aware application developers with reproducible baseline performance data and validated models to initiate a well-defined process of performance assessment and optimization.
Submitted 2 July, 2019; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Exploiting the Space Filling Curve Ordering of Particles in the Neighbour Search of Gadget3
Authors:
Antonio Ragagnin,
Nikola Tchipev,
Michael Bader,
Klaus Dolag,
Nicolay J. Hammer
Abstract:
Gadget3 is nowadays one of the most frequently used high performing parallel codes for cosmological hydrodynamical simulations. Recent analyses have shown that the Neighbour Search process of Gadget3 is one of the most time-consuming parts. Thus, a considerable speedup can be expected from improvements of the underlying algorithms. In this work we propose a novel approach for speeding up the Neighbour Search which takes advantage of the space-filling-curve particle ordering. Instead of performing Neighbour Search for all particles individually, nearby active particles can be grouped and one single Neighbour Search can be performed to obtain a common superset of neighbours. Thus, with this approach we reduce the number of searches. On the other hand, tree walks are performed within a larger searching radius. There is an optimal size of grouping that maximizes the speedup, which we found by numerical experiments. We tested the algorithm within the boxes of the Magneticum project. As a result we obtained a speedup of $1.65$ in the Density and of $1.30$ in the Hydrodynamics computation, respectively, and a total speedup of $1.34$.
Submitted 23 October, 2018;
originally announced October 2018.
-
Pushing the Limits of Encrypted Databases with Secure Hardware
Authors:
Panagiotis Antonopoulos,
Arvind Arasu,
Ken Eguro,
Joachim Hammer,
Raghav Kaushik,
Donald Kossmann,
Ravi Ramamurthy,
Jakub Szymaszek
Abstract:
Encrypted databases have been studied for more than 10 years and are quickly emerging as a critical technology for the cloud. The current state of the art is to use property-preserving encrypting techniques (e.g., deterministic encryption) to protect the confidentiality of the data and support query processing at the same time. Unfortunately, these techniques have many limitations. Recently, trusted computing platforms (e.g., Intel SGX) have emerged as an alternative to implement encrypted databases. This paper demonstrates some vulnerabilities and the limitations of this technology, but it also shows how to make best use of it in order to improve on confidentiality, functionality, and performance.
Submitted 7 September, 2018;
originally announced September 2018.
-
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
Authors:
Jan Laukemann,
Julian Hammer,
Johannes Hofmann,
Georg Hager,
Gerhard Wellein
Abstract:
An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel Skylake and AMD Zen micro-architectures. To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements. Finally we give an outlook on how the method may be generalized to new architectures.
Submitted 10 October, 2018; v1 submitted 4 September, 2018;
originally announced September 2018.
-
Cross-paradigm pretraining of convolutional networks improves intracranial EEG decoding
Authors:
Joos Behncke,
Robin Tibor Schirrmeister,
Martin Völker,
Jiří Hammer,
Petr Marusič,
Andreas Schulze-Bonhage,
Wolfram Burgard,
Tonio Ball
Abstract:
When it comes to the classification of brain signals in real-life applications, the training and the prediction data are often described by different distributions. Furthermore, diverse data sets, e.g., recorded from various subjects or tasks, can even exhibit distinct feature spaces. The fact that data that have to be classified are often only available in small amounts reinforces the need for techniques to generalize learned information, as performances of brain-computer interfaces (BCIs) are enhanced by increasing quantity of available data. In this paper, we apply transfer learning to a framework based on deep convolutional neural networks (deep ConvNets) to prove the transferability of learned patterns in error-related brain signals across different tasks. The experiments described in this paper demonstrate the usefulness of transfer learning, especially improving performances when only little data can be used to distinguish between erroneous and correct realization of a task. This effect could be delimited from a transfer of merely general brain signal characteristics, underlining the transfer of error-specific information. Furthermore, we could extract similar patterns in time-frequency analyses in identical channels, leading to selective high signal correlations between the two different paradigms. Classification on the intracranial data yields median accuracies of up to $(81.50 \pm 9.49)\,\%$. Decoding on only $10\%$ of the data without pre-training reaches performances of $(54.76 \pm 3.56)\,\%$, compared to $(64.95 \pm 0.79)\,\%$ with pre-training.
Submitted 20 July, 2018; v1 submitted 20 June, 2018;
originally announced June 2018.
-
Intracranial Error Detection via Deep Learning
Authors:
Martin Völker,
Jiří Hammer,
Robin T. Schirrmeister,
Joos Behncke,
Lukas D. J. Fiederer,
Andreas Schulze-Bonhage,
Petr Marusič,
Wolfram Burgard,
Tonio Ball
Abstract:
Deep learning techniques have revolutionized the field of machine learning and were recently successfully applied to various classification problems in noninvasive electroencephalography (EEG). However, these methods were so far only rarely evaluated for use in intracranial EEG. We employed convolutional neural networks (CNNs) to classify and characterize the error-related brain response as measured in 24 intracranial EEG recordings. Decoding accuracies of CNNs were significantly higher than those of a regularized linear discriminant analysis. Using time-resolved deep decoding, it was possible to classify errors in various regions in the human brain, and further to decode errors over 200 ms before the actual erroneous button press, e.g., in the precentral gyrus. Moreover, deeper networks performed better than shallower networks in distinguishing correct from error trials in all-channel decoding. In single recordings, up to 100 % decoding accuracy was achieved. Visualization of the networks' learned features indicated that multivariate decoding on an ensemble of channels yields related, albeit non-redundant information compared to single-channel decoding. In summary, here we show the usefulness of deep learning for both intracranial error decoding and mapping of the spatio-temporal structure of the human error processing network.
Submitted 2 November, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
Authors:
Julian Hammer,
Jan Eitzinger,
Georg Hager,
Gerhard Wellein
Abstract:
Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline Model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describe the single-core performance and saturation behavior on a multicore chip. We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis allows the prediction of optimal spatial blocking factors for loop nests. Together with the models it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil we demonstrate the usefulness and predictive power of the Kerncraft tool.
Submitted 13 January, 2017;
originally announced February 2017.
-
Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures
Authors:
Fabio Baruffa,
Luigi Iapichino,
Nicolay J. Hammer,
Vasileios Karakasis
Abstract:
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution time and improved threading scalability both on Intel Xeon® ($2.6 \times$ on Ivy Bridge) and Xeon Phi™ ($13.7 \times$ on Knights Corner) systems. First few tests of the optimised code result in $19.1 \times$ faster execution on second generation Xeon Phi (Knights Landing), thus demonstrating the portability of the devised optimisation solutions to upcoming architectures.
Submitted 10 May, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
Authors:
Julian Hammer,
Georg Hager,
Jan Eitzinger,
Gerhard Wellein
Abstract:
Analytic performance models are essential for understanding the performance characteristics of loop kernels, which consume a major part of CPU cycles in computational science. Starting from a validated performance model one can infer the relevant hardware bottlenecks and promising optimization opportunities. Unfortunately, analytic performance modeling is often tedious even for experienced developers since it requires in-depth knowledge about the hardware and how it interacts with the software. We present the "Kerncraft" tool, which eases the construction of analytic performance models for streaming kernels and stencil loop nests. Starting from the loop source code, the problem size, and a description of the underlying hardware, Kerncraft can ideally predict the single-core performance and scaling behavior of loops on multicore processors using the Roofline or the Execution-Cache-Memory (ECM) model. We describe the operating principles of Kerncraft with its capabilities and limitations, and we show how it may be used to quickly gain insights by accelerated analytic modeling.
Submitted 5 November, 2015; v1 submitted 12 September, 2015;
originally announced September 2015.