-
Reproducibility, Replicability, and Repeatability: A survey of reproducible research with a focus on high performance computing
Authors:
Benjamin A. Antunes,
David R. C. Hill
Abstract:
Reproducibility is widely acknowledged as a fundamental principle in scientific research. Currently, the scientific community grapples with numerous challenges associated with reproducibility, often referred to as the "reproducibility crisis." This crisis has permeated numerous scientific disciplines. In this study, we examined the factors in scientific practices that might contribute to this lack of reproducibility. Significant focus is placed on the prevalent integration of computation in research, which can sometimes function as a black box in published papers. Our study primarily focuses on high-performance computing (HPC), which presents unique reproducibility challenges. This paper provides a comprehensive review of these concerns and potential solutions. Furthermore, we discuss the critical role of reproducible research in advancing science and identifying persisting issues within the field of HPC.
Submitted 12 February, 2024;
originally announced February 2024.
-
Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations
Authors:
Benjamin Antunes,
David R. C Hill
Abstract:
Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning technologies because they are interesting for numerous methods. The field of machine learning holds the potential for substantial advancements across various domains, as exemplified by recent breakthroughs in Large Language Models (LLMs). However, despite the growing interest, persistent concerns include issues related to reproducibility and energy consumption. Reproducibility is crucial for robust scientific inquiry and explainability, while energy efficiency underscores the imperative to conserve finite global resources. This study investigates whether the leading Pseudo-Random Number Generators (PRNGs) employed in machine learning languages, libraries, and frameworks uphold statistical quality and numerical reproducibility when compared to the original C implementation of the respective PRNG algorithms. Additionally, we aim to evaluate the time efficiency and energy consumption of various implementations. Our experiments encompass Python, NumPy, TensorFlow, and PyTorch, utilizing the Mersenne Twister, PCG, and Philox algorithms. Remarkably, we verified that the temporal performance of machine learning technologies closely aligns with that of C-based implementations, with instances of achieving even superior performances. On the other hand, it is noteworthy that ML technologies consumed only 10% more energy than their C-implementation counterparts. However, while statistical quality was found to be comparable, numerical reproducibility across different platforms for identical seeds and algorithms was not achieved.
Submitted 10 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
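The reproducibility question raised in this abstract, whether identical seeds and algorithms yield identical numbers, can be illustrated with a minimal sketch. This is not the authors' benchmark: it uses only Python's standard `random` module (a Mersenne Twister implementation), and the helper name `stream_fingerprint`, the default sample count, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import random
import struct


def stream_fingerprint(seed: int, count: int = 1000) -> str:
    """Hash the first `count` 32-bit outputs of a seeded generator.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    digest = hashlib.sha256()
    for _ in range(count):
        # Pack each output in a fixed (little-endian) byte order so the
        # hash does not depend on the endianness of the host platform.
        digest.update(struct.pack("<I", rng.getrandbits(32)))
    return digest.hexdigest()


if __name__ == "__main__":
    # Run on two machines (or two library versions) and compare the output.
    print(stream_fingerprint(12345))
```

Comparing the printed fingerprint across machines gives a quick bit-for-bit check. A mismatch against another library does not by itself mean a different algorithm is in use; it may reflect a different seeding procedure or output conversion.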
-
Identifying Quality Mersenne Twister Streams For Parallel Stochastic Simulations
Authors:
Benjamin Antunes,
Claude Mazel,
David R. C Hill
Abstract:
The Mersenne Twister (MT) is a pseudo-random number generator (PRNG) widely used in High Performance Computing for parallel stochastic simulations. We aim to assess the quality of common parallelization techniques used to generate large streams of MT pseudo-random numbers. We compare three techniques: sequence splitting, random spacing and MT indexed sequence. The TestU01 Big Crush battery is used to evaluate the quality of 4096 streams for each technique on three different hardware configurations. Surprisingly, all techniques exhibited almost 30% of defects with no technique showing better quality than the others. While all 106 Big Crush tests showed failures, the failure rate was limited to a small number of tests (maximum of 6 tests failed per stream, resulting in over 94% success rate). Using 33 CPU years of computation, we identified high-quality streams, which are provided. They can be used for sensitive parallel simulations such as nuclear medicine and precise high-energy physics applications.
Submitted 30 January, 2024;
originally announced January 2024.
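Two of the parallelization techniques named in this abstract can be sketched with Python's standard `random` module, which is a Mersenne Twister. The sketch below is only an illustration of the general ideas, not the authors' procedure: the function names, the 64-bit seed width, and the block lengths are assumptions, and the third technique (MT indexed sequence) is omitted.

```python
import random


def sequence_splitting(seed: int, n_streams: int, block_len: int) -> list[list[float]]:
    """Cut one MT sequence into contiguous, non-overlapping blocks.

    Stream k receives draws k*block_len .. (k+1)*block_len - 1 of a
    single generator, so the streams never overlap by construction.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(block_len)] for _ in range(n_streams)]


def random_spacing(master_seed: int, n_streams: int, block_len: int) -> list[list[float]]:
    """Give each stream its own generator, seeded from a master generator.

    The streams start at pseudo-randomly chosen states; overlap is not
    excluded, only made unlikely by the very long period of MT.
    """
    master = random.Random(master_seed)
    streams = []
    for _ in range(n_streams):
        rng = random.Random(master.getrandbits(64))  # assumed seed width
        streams.append([rng.random() for _ in range(block_len)])
    return streams


if __name__ == "__main__":
    print(sequence_splitting(42, n_streams=2, block_len=3))
    print(random_spacing(42, n_streams=2, block_len=3))
```

The success rate quoted in the abstract follows from the worst case of 6 failed tests out of 106 per stream: 100/106 ≈ 94.3%.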
-
SmoothHess: ReLU Network Feature Interactions via Stein's Lemma
Authors:
Max Torop,
Aria Masoomi,
Davin Hill,
Kivanc Kose,
Stratis Ioannidis,
Jennifer Dy
Abstract:
Several recent methods for interpretability model feature interactions by looking at the Hessian of a neural network. This poses a challenge for ReLU networks, which are piecewise-linear and thus have a zero Hessian almost everywhere. We propose SmoothHess, a method of estimating second-order interactions through Stein's Lemma. In particular, we estimate the Hessian of the network convolved with a Gaussian through an efficient sampling algorithm, requiring only network gradient calls. SmoothHess is applied post-hoc, requires no modifications to the ReLU network architecture, and the extent of smoothing can be controlled explicitly. We provide a non-asymptotic bound on the sample complexity of our estimation procedure. We validate the superior ability of SmoothHess to capture interactions on benchmark datasets and a real-world medical spirometry dataset.
Submitted 1 November, 2023;
originally announced November 2023.
-
Towards quantum enhanced adversarial robustness in machine learning
Authors:
Maxwell T. West,
Shu-Lok Tsang,
Jia S. Low,
Charles D. Hill,
Christopher Leckie,
Lloyd C. L. Hollenberg,
Sarah M. Erfani,
Muhammad Usman
Abstract:
Machine learning algorithms are powerful tools for data driven tasks such as image classification and feature detection, however their vulnerability to adversarial examples - input samples manipulated to fool the algorithm - remains a serious challenge. The integration of machine learning with quantum computing has the potential to yield tools offering not only better accuracy and computational efficiency, but also superior robustness against adversarial attacks. Indeed, recent work has employed quantum mechanical phenomena to defend against adversarial attacks, spurring the rapid development of the field of quantum adversarial machine learning (QAML) and potentially yielding a new source of quantum advantage. Despite promising early results, there remain challenges towards building robust real-world QAML tools. In this review we discuss recent progress in QAML and identify key challenges. We also suggest future research directions which could determine the route to practicality for QAML approaches as quantum computing hardware scales up and noise levels are reduced.
Submitted 22 June, 2023;
originally announced June 2023.
-
Explanations of Black-Box Models based on Directional Feature Interactions
Authors:
Aria Masoomi,
Davin Hill,
Zhonghui Xu,
Craig P Hersh,
Edwin K. Silverman,
Peter J. Castaldi,
Stratis Ioannidis,
Jennifer Dy
Abstract:
As machine learning algorithms are deployed ubiquitously to a variety of domains, it is imperative to make these often black-box models transparent. Several recent works explain black-box models by capturing the most influential features for prediction per instance; such explanation methods are univariate, as they characterize importance per feature. We extend univariate explanation to a higher order; this enhances explainability, as bivariate methods can capture feature interactions in black-box models, represented as a directed graph. Analyzing this graph enables us to discover groups of features that are equally important (i.e., interchangeable), while the notion of directionality allows us to identify the most influential features. We apply our bivariate method on Shapley value explanations, and experimentally demonstrate the ability of directional explanations to discover feature interactions. We show the superiority of our method against state-of-the-art on CIFAR10, IMDB, Census, Divorce, Drug, and gene data.
Submitted 15 April, 2023;
originally announced April 2023.
-
A Subset of the CERN Virtual Machine File System: Fast Delivering of Complex Software Stacks for Supercomputing Resources
Authors:
Alexandre F Boyer,
Christophe Haen,
Federico Stagni,
David R C Hill
Abstract:
Delivering a reproducible environment along with complex and up-to-date software stacks on thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM-File System (CVMFS) has been designed to help various communities to deploy software on worldwide distributed computing infrastructures by decoupling the software from the Operating System. However, the installation of this file system depends on a collaboration with system administrators of the remote resources and an HTTP connectivity to fetch dependencies from external sources. Supercomputers, which offer tremendous computing power, generally have more restrictive policies than grid sites and do not easily provide the mandatory conditions to exploit CVMFS. Different solutions have been developed to tackle the issue, but they are often specific to a scientific community and do not deal with the problem in its globality. In this paper, we provide a generic utility to assist any community in the installation of complex software dependencies on supercomputers with no external connectivity. The approach consists in capturing dependencies of applications of interest, building a subset of dependencies, testing it in a given environment, and deploying it to a remote computing resource. We test this proposal with a real use case by exporting Gauss, a Monte Carlo simulation program from the LHCb experiment, to Mare Nostrum, one of the top supercomputers of the world. We provide steps to encapsulate the minimum required files and deliver a light and easy-to-update subset of CVMFS: 12.4 Gigabytes instead of 5.2 Terabytes for the whole LHCb repository.
Submitted 29 March, 2023;
originally announced March 2023.
-
GASP -- A Genetic Algorithm for State Preparation
Authors:
Floyd M. Creevey,
Charles D. Hill,
Lloyd C. L. Hollenberg
Abstract:
The efficient preparation of quantum states is an important step in the execution of many quantum algorithms. In the noisy intermediate-scale quantum (NISQ) computing era, this is a significant challenge given quantum resources are scarce and typically only low-depth quantum circuits can be implemented on physical devices. We present a genetic algorithm for state preparation (GASP) which generates relatively low-depth quantum circuits for initialising a quantum computer in a specified quantum state. The method uses a basis set of R_x, R_y, R_z, and CNOT gates and a genetic algorithm to systematically generate circuits to synthesize the target state to the required fidelity. GASP can produce more efficient circuits of a given accuracy with lower depth and gate counts than other methods. This variability of the required accuracy facilitates overall higher accuracy on implementation, as error accumulation in high-depth circuits can be avoided. We directly compare the method to the state initialisation technique based on an exact synthesis technique implemented in IBM Qiskit, simulated with noise and implemented on physical IBM Quantum devices. Results achieved by GASP outperform Qiskit's exact general circuit synthesis method on a variety of states such as Gaussian states and W-states, and consistently show the method reduces the number of gates required for the quantum circuits to generate these quantum states to the required accuracy.
Submitted 21 February, 2023;
originally announced February 2023.
-
Geometry of Score Based Generative Models
Authors:
Sandesh Ghimire,
**yang Liu,
Armand Comas,
Davin Hill,
Aria Masoomi,
Octavia Camps,
Jennifer Dy
Abstract:
In this work, we look at Score-based generative models (also called diffusion generative models) from a geometric perspective. From a new viewpoint, we prove that both the forward and backward processes of adding noise and generating from noise are Wasserstein gradient flows in the space of probability measures. We are the first to prove this connection. Our understanding of Score-based (and Diffusion) generative models has matured and become more complete by drawing ideas from different fields like Bayesian inference, control theory, stochastic differential equations, and the Schrödinger bridge. However, many open questions and challenges remain. One problem, for example, is how to decrease the sampling time. We demonstrate that looking from a geometric perspective enables us to answer many of these questions and provide new interpretations of some known results. Furthermore, the geometric perspective enables us to devise an intuitive geometric solution to the problem of faster sampling. By augmenting traditional score-based generative models with a projection step, we show that we can generate high quality images with significantly fewer sampling steps.
Submitted 8 February, 2023;
originally announced February 2023.
-
Divide and Compose with Score Based Generative Models
Authors:
Sandesh Ghimire,
Armand Comas,
Davin Hill,
Aria Masoomi,
Octavia Camps,
Jennifer Dy
Abstract:
While score based generative models, or diffusion models, have found success in image synthesis, they are often coupled with text data or image labels to be able to manipulate and conditionally generate images. Even though manipulation of images by changing the text prompt is possible, our understanding of the text embedding and our ability to modify it to edit images is quite limited. Towards the direction of having more control over image manipulation and conditional generation, we propose to learn image components in an unsupervised manner so that we can compose those components to generate and manipulate images in an informed manner. Taking inspiration from energy based models, we interpret different score components as the gradient of different energy functions. We show how score based learning allows us to learn interesting components and we can visualize them through generation. We also show how this novel decomposition allows us to compose, generate and modify images in interesting ways akin to dreaming. We make our code available at https://github.com/sandeshgh/Score-based-disentanglement
Submitted 4 February, 2023;
originally announced February 2023.
-
Boundary-Aware Uncertainty for Feature Attribution Explainers
Authors:
Davin Hill,
Aria Masoomi,
Max Torop,
Sandesh Ghimire,
Jennifer Dy
Abstract:
Post-hoc explanation methods have become a critical tool for understanding black-box classifiers in high-stakes applications. However, high-performing classifiers are often highly nonlinear and can exhibit complex behavior around the decision boundary, leading to brittle or misleading local explanations. Therefore there is an impending need to quantify the uncertainty of such explanation methods in order to understand when explanations are trustworthy. In this work we propose the Gaussian Process Explanation UnCertainty (GPEC) framework, which generates a unified uncertainty estimate combining decision boundary-aware uncertainty with explanation function approximation uncertainty. We introduce a novel geodesic-based kernel, which captures the complexity of the target black-box decision boundary. We show theoretically that the proposed kernel similarity increases with decision boundary complexity. The proposed framework is highly flexible; it can be used with any black-box classifier and feature attribution method. Empirical results on multiple tabular and image datasets show that the GPEC uncertainty estimate improves understanding of explanations as compared to existing methods.
Submitted 4 March, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
A kernel-based quantum random forest for improved classification
Authors:
Maiyuren Srikumar,
Charles D. Hill,
Lloyd C. L. Hollenberg
Abstract:
The emergence of Quantum Machine Learning (QML) to enhance traditional classical learning methods has seen various limitations to its realisation. There is therefore an imperative to develop quantum models with unique model hypotheses to attain expressional and computational advantage. In this work we extend the linear quantum support vector machine (QSVM) with kernel function computed through quantum kernel estimation (QKE), to form a decision tree classifier constructed from a decision directed acyclic graph of QSVM nodes - the ensemble of which we term the quantum random forest (QRF). To limit overfitting, we further extend the model to employ a low-rank Nyström approximation to the kernel matrix. We provide generalisation error bounds on the model and theoretical guarantees to limit errors due to finite sampling on the Nyström-QKE strategy. In doing so, we show that we can achieve lower sampling complexity when compared to QKE. We numerically illustrate the effect of varying model hyperparameters and finally demonstrate that the QRF is able to obtain superior performance over QSVMs, while also requiring fewer kernel estimations.
Submitted 19 February, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Bayesian regularization of empirical MDPs
Authors:
Samarth Gupta,
Daniel N. Hill,
Lexing Ying,
Inderjit Dhillon
Abstract:
In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
Submitted 20 September, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions
Authors:
Zulqarnain Khan,
Davin Hill,
Aria Masoomi,
Joshua Bone,
Jennifer Dy
Abstract:
Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. As crucial diagnostics tools, it is important that these explainers themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
Submitted 16 April, 2024; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion
Authors:
Adam Block,
Rahul Kidambi,
Daniel N. Hill,
Thorsten Joachims,
Inderjit S. Dhillon
Abstract:
Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To overcome this limitation, we propose a new approach that explicitly optimizes the query suggestions for downstream retrieval performance. We formulate this as a problem of ranking a set of rankings, where each query suggestion is represented by the downstream item ranking it produces. We then present a learning method that ranks query suggestions by the quality of their item rankings. The algorithm is based on a counterfactual learning approach that is able to leverage feedback on the items (e.g., clicks, purchases) to evaluate query suggestions through an unbiased estimator, thus avoiding the assumption that users write or select optimal queries. We establish theoretical support for the proposed approach and provide learning-theoretic guarantees. We also present empirical results on publicly available datasets, and demonstrate real-world applicability using data from an online shopping store.
Submitted 22 April, 2022;
originally announced April 2022.
-
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
Authors:
Huaicheng Li,
Daniel S. Berger,
Stanko Novakovic,
Lisa Hsu,
Dan Ernst,
Pantea Zardoshti,
Monish Shah,
Samir Rajadnya,
Scott Lee,
Ishwar Agarwal,
Mark D. Hill,
Marcus Fontoura,
Ricardo Bianchini
Abstract:
Public cloud providers seek to meet stringent performance requirements and low hardware cost. A key driver of performance and cost is main memory. Memory pooling promises to improve DRAM utilization and thereby reduce costs. However, pooling is challenging under cloud performance requirements. This paper proposes Pond, the first memory pooling system that both meets cloud performance goals and significantly reduces DRAM cost. Pond builds on the Compute Express Link (CXL) standard for load/store access to pool memory and two key insights. First, our analysis of cloud production traces shows that pooling across 8-16 sockets is enough to achieve most of the benefits. This enables a small-pool design with low access latency. Second, it is possible to create machine learning models that can accurately predict how much local and pool memory to allocate to a virtual machine (VM) to resemble same-NUMA-node memory performance. Our evaluation with 158 workloads shows that Pond reduces DRAM costs by 7% with performance within 1-5% of same-NUMA-node VM allocations.
Submitted 21 October, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Constant Delay Lattice Train Schedules
Authors:
Jean-Lou De Carufel,
Darryl Hill,
Anil Maheshwari,
Sasanka Roy,
Luís Fernando Schultz Xavier da Silveira
Abstract:
The following geometric vehicle scheduling problem has been considered: given continuous curves $f_1, \ldots, f_n : \mathbb{R} \rightarrow \mathbb{R}^2$, find non-negative delays $t_1, \ldots, t_n$ minimizing $\max \{ t_1, \ldots, t_n \}$ such that, for every distinct $i$ and $j$ and every time $t$, $| f_j (t - t_j) - f_i (t - t_i) | > \ell$, where $\ell$ is a given safety distance. We study a variant of this problem where we consider trains (rods) of fixed length $\ell$ that move at constant speed and sets of train lines (tracks), each of which consisting of an axis-parallel line-segment with endpoints in the integer lattice $\mathbb{Z}^d$ and of a direction of movement (towards $\infty$ or $- \infty$). We are interested in upper bounds on the maximum delay we need to introduce on any line to avoid collisions, but more specifically on universal upper bounds that apply no matter the set of train lines. We show small universal constant upper bounds for $d = 2$ and any given $\ell$ and also for $d = 3$ and $\ell = 1$. Through clique searching, we are also able to show that several of these upper bounds are tight.
Submitted 9 July, 2021;
originally announced July 2021.
-
Improved Spanning on Theta-5
Authors:
Prosenjit Bose,
Darryl Hill,
Aurélien Ooms
Abstract:
We show an upper bound of $\frac{\sin\left(\frac{3\pi}{10}\right)}{\sin\left(\frac{2\pi}{5}\right)-\sin\left(\frac{3\pi}{10}\right)} < 5.70$ on the spanning ratio of $\Theta_5$-graphs, improving on the previous best known upper bound of $9.96$ [Bose, Morin, van Renssen, and Verdonschot. The Theta-5-graph is a spanner. Computational Geometry, 2015.]
Submitted 2 June, 2021;
originally announced June 2021.
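As a quick numerical sanity check of the spanning-ratio bound quoted in the abstract above, the following minimal sketch evaluates the closed-form expression with Python's standard `math` module. The function name `theta5_bound` is illustrative and does not come from the paper.

```python
import math


def theta5_bound() -> float:
    """Evaluate sin(3*pi/10) / (sin(2*pi/5) - sin(3*pi/10))."""
    numerator = math.sin(3 * math.pi / 10)
    denominator = math.sin(2 * math.pi / 5) - numerator
    return numerator / denominator


if __name__ == "__main__":
    # Print the value so it can be compared with the stated bound of 5.70.
    print(f"{theta5_bound():.4f}")
```

The computed value should fall just below 5.70, in line with the inequality stated in the abstract.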
-
A National Discovery Cloud: Preparing the US for Global Competitiveness in the New Era of 21st Century Digital Transformation
Authors:
Ian Foster,
Daniel Lopresti,
Bill Gropp,
Mark D. Hill,
Katie Schuman
Abstract:
The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each development has major implications for how societies function and compete; together, they represent a change in technological foundations of society as profound as the telegraph or electrification. Societies that embrace these changes will lead in the 21st Century; those that do not will decline in prosperity and influence. Nowhere is this stark choice more evident than in research and education, the two sectors that produce the innovations that power the future and prepare a workforce able to exploit those innovations, respectively. In this article, we introduce these developments and suggest steps that the US government might take to prepare the research and education system for their implications.
Submitted 19 April, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy
Authors:
Rajat Sen,
Alexander Rakhlin,
Lexing Ying,
Rahul Kidambi,
Dean Foster,
Daniel Hill,
Inderjit Dhillon
Abstract:
Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms. We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step. In the extreme setting, where the total number of arms can be in the millions, we propose a practically-motivated arm hierarchy model that induces a certain structure in mean rewards to ensure statistical and computational efficiency. The hierarchical structure allows for an exponential reduction in the number of relevant arms for each context, thus resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log (|\mathcal{F}|T)})$. Finally, we implement our algorithm using a hierarchical linear function class and show superior performance with respect to well-known benchmarks on simulated bandit feedback experiments using extreme multi-label classification datasets. On a dataset with three million arms, our reduction scheme has an average inference time of only 7.9 milliseconds, which is a 100x improvement.
Submitted 15 February, 2021;
originally announced February 2021.
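The abstract above mentions the Inverse Gap Weighting (IGW) strategy. As a rough illustration only, the sketch below implements the commonly described single-arm form of IGW, in which each non-greedy arm receives probability inversely proportional to its predicted reward gap. It is not the paper's top-$k$ procedure; the function name `igw_probabilities` and the exploration parameter `gamma` are assumptions made for this example.

```python
from typing import List


def igw_probabilities(predicted_rewards: List[float], gamma: float) -> List[float]:
    """Return a sampling distribution over arms using inverse gap weighting.

    predicted_rewards: estimated reward of each arm (hypothetical model output).
    gamma: assumed exploration parameter; larger values favor the greedy arm.
    """
    num_arms = len(predicted_rewards)
    best = max(range(num_arms), key=lambda a: predicted_rewards[a])
    probs = [0.0] * num_arms
    for arm in range(num_arms):
        if arm != best:
            gap = predicted_rewards[best] - predicted_rewards[arm]
            # Arms with a larger gap to the best arm are sampled less often.
            probs[arm] = 1.0 / (num_arms + gamma * gap)
    # The greedy arm receives all remaining probability mass.
    probs[best] = 1.0 - sum(probs)
    return probs


if __name__ == "__main__":
    print(igw_probabilities([0.9, 0.5, 0.1], gamma=10.0))
```

Because every non-greedy arm gets at most `1 / num_arms`, the greedy arm always keeps at least `1 / num_arms` of the mass, so the result is a valid distribution.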
-
Advancing Computing's Foundation of US Industry & Society
Authors:
Thomas M. Conte,
Ian T. Foster,
William Gropp,
Mark D. Hill
Abstract:
While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlocks new capabilities. For example, recent successes in AI/ML required the synergy of improved algorithms and hardware architectures (e.g., general-purpose graphics processing units). However, unlike in the 20th Century and early 2000s, tomorrow's performance aspirations must be achieved without continued semiconductor scaling formerly provided by Moore's Law and Dennard Scaling. How will one deliver the next 100x improvement in capability at similar or less cost to enable great value? Can we make the next AI leap without 100x better hardware?
This whitepaper argues for a multipronged effort to develop new computing approaches beyond Moore's Law to advance the foundation that computing provides to US industry, education, medicine, science, and government. This impact extends far beyond the IT industry itself, as IT is now central for providing value across society, for example in semi-autonomous vehicles, tele-education, health wearables, viral analysis, and efficient administration. Herein we draw upon considerable visioning work by CRA's Computing Community Consortium (CCC) and the IEEE Rebooting Computing Initiative (IEEE RCI), enabled by thought leader input from industry, academia, and the US government.
Submitted 4 January, 2021;
originally announced January 2021.
-
Session-Aware Query Auto-completion using Extreme Multi-label Ranking
Authors:
Nishant Yadav,
Rajat Sen,
Daniel N. Hill,
Arya Mazumdar,
Inderjit S. Dhillon
Abstract:
Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the user's intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the user's prefix. Such session-aware QACs can be generated by recent sequence-to-sequence deep learning models; however, these generative approaches often do not meet the stringent latency requirements of responding to each user keystroke. Moreover, these generative approaches pose the risk of showing nonsensical queries.
In this paper, we provide a solution to this problem: we take the novel approach of modeling session-aware QAC as an eXtreme Multi-Label Ranking (XMR) problem where the input is the previous query in the session and the user's current prefix, while the output space is the set of tens of millions of queries entered by users in the recent past. We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm. The proposed modifications yield a 3.9x improvement in terms of Mean Reciprocal Rank (MRR) over the baseline XMR approach on a public search logs dataset. We are able to maintain an inference latency of less than 10 ms while still using session context. When compared against baseline models of acceptable latency, we observed a 33% improvement in MRR for short prefixes of up to 3 characters. Moreover, our model yielded a statistically significant improvement of 2.81% over a production QAC system in terms of suggestion acceptance rate, when deployed on the search bar of an online shopping store as part of an A/B test.
Submitted 21 August, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Opportunities and Challenges for Next Generation Computing
Authors:
Gregory D. Hager,
Mark D. Hill,
Katherine Yelick
Abstract:
Computing has dramatically changed nearly every aspect of our lives, from business and agriculture to communication and entertainment. As a nation, we rely on computing in the design of systems for energy, transportation and defense; and computing fuels scientific discoveries that will improve our fundamental understanding of the world and help develop solutions to major challenges in health and the environment. Computing has changed our world, in part, because our innovations can run on computers whose performance and cost-performance have improved a million-fold over the last few decades. A driving force behind this has been a repeated doubling of the transistors per chip, dubbed Moore's Law. A concomitant enabler has been Dennard Scaling that has permitted these performance doublings at roughly constant power, but, as we will see, both trends face challenges. Consider for a moment the impact of these two trends over the past 30 years. A 1980s supercomputer (e.g. a Cray 2) was rated at nearly 2 Gflops and consumed nearly 200 kW of power. At the time, it was used for high performance and national-scale applications ranging from weather forecasting to nuclear weapons research. A computer of similar performance now fits in our pocket and consumes less than 10 watts. What would be the implications of a similar computing/power reduction over the next 30 years - that is, taking a petaflop-scale machine (e.g. the Cray XK7 which requires about 500 kW for 1 Pflop ($= 10^{15}$ operations/sec) performance) and repeating that process? What is possible with such a computer in your pocket? How would it change the landscape of high capacity computing? In the remainder of this paper, we articulate some opportunities and challenges for dramatic performance improvements from personal- to national-scale computing, and discuss some "out of the box" possibilities for achieving computing at this scale.
Submitted 31 July, 2020;
originally announced August 2020.
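The thought experiment in the abstract above can be worked through with simple arithmetic. The sketch below uses the rounded figures quoted there (about 2 Gflops at about 200 kW for the Cray 2, about 10 W for a pocket device of similar performance, and about 500 kW for 1 Pflop on the Cray XK7). The constant and function names are illustrative assumptions, and because the abstract gives these figures only as "nearly" or "less than", the results are approximations.

```python
def flops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency as floating-point operations per second per watt."""
    return flops / watts


# Rounded figures taken from the abstract (approximate by construction).
CRAY2_FLOPS = 2e9      # ~2 Gflops
CRAY2_WATTS = 200e3    # ~200 kW
POCKET_WATTS = 10.0    # <10 W for similar performance today
XK7_FLOPS = 1e15       # 1 Pflop
XK7_WATTS = 500e3      # ~500 kW

# Factor by which power dropped at constant performance over ~30 years.
power_reduction = CRAY2_WATTS / POCKET_WATTS

# Power a petaflop machine would need if the same reduction repeated.
projected_petaflop_watts = XK7_WATTS / power_reduction

if __name__ == "__main__":
    print(f"Cray 2 efficiency:  {flops_per_watt(CRAY2_FLOPS, CRAY2_WATTS):.3g} flops/W")
    print(f"Pocket efficiency:  {flops_per_watt(CRAY2_FLOPS, POCKET_WATTS):.3g} flops/W")
    print(f"Power reduction:    {power_reduction:.0f}x")
    print(f"Projected petaflop: {projected_petaflop_watts:.0f} W")
```

Under these rounded inputs, repeating the same reduction would put a petaflop-scale machine in the tens-of-watts range, which is the scenario the abstract asks the reader to imagine.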
-
MOD: Minimally Ordered Durable Datastructures for Persistent Memory
Authors:
Swapnil Haria,
Mark D. Hill,
Michael M. Swift
Abstract:
Persistent Memory (PM) makes possible recoverable applications that can preserve application progress across system reboots and power failures. Actual recoverability requires careful ordering of cacheline flushes, currently done in two extreme ways. On one hand, expert programmers have reasoned deeply about consistency and durability to create applications centered on a single custom-crafted durable datastructure. On the other hand, less-expert programmers have used software transaction memory (STM) to make atomic one or more updates, albeit at a significant performance cost due largely to ordered log updates.
In this work, we propose the middle ground of composable persistent datastructures called Minimally Ordered Durable (MOD) datastructures. MOD is a C++ library of several datastructures (currently map, set, stack, queue, and vector) that often perform better than STM and yet are relatively easy to use. They allow multiple updates to one or more datastructures to be atomic with respect to failure. Moreover, we provide a recipe to create more recoverable datastructures.
MOD is motivated by our analysis of real Intel Optane PM hardware showing that allowing unordered, overlapping flushes significantly improves performance. MOD reduces ordering by adapting existing techniques for out-of-place updates (like shadow paging) with space-reducing structural sharing (from functional programming). MOD exposes a Basic interface for single updates and a Composition interface for atomically performing multiple updates. Relative to the state-of-the-art Intel PMDK v1.5 STM, MOD improves map, set, stack, queue microbenchmark performance by 40%, and speeds up application benchmark performance by 38%.
Submitted 21 August, 2019;
originally announced August 2019.
-
A Zero Attention Model for Personalized Product Search
Authors:
Qingyao Ai,
Daniel N. Hill,
S. V. N. Vishwanathan,
W. Bruce Croft
Abstract:
Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previous studies show that purchase histories are useful for identifying the individual intent of each product search session, the effect of personalization on product search in practice, however, remains mostly unknown. In this paper, we formulate the problem of personalized product search and conduct large-scale experiments with search logs sampled from a commercial e-commerce search engine. Results from our preliminary analysis show that the potential of personalization depends on query characteristics, interactions between queries, and user purchase histories. Based on these observations, we propose a Zero Attention Model for product search that automatically determines when and how to personalize a user-query pair via a novel attention mechanism. Empirical results on commercial product search logs show that the proposed model not only significantly outperforms state-of-the-art personalized product retrieval models, but also provides important information on the potential of personalization in each product search session.
Submitted 29 August, 2019;
originally announced August 2019.
-
Accelerator-level Parallelism
Authors:
Mark D. Hill,
Vijay Janapa Reddi
Abstract:
Future applications demand more performance, but technology advances have been faltering. A promising approach to further improve computer system performance under energy constraints is to employ hardware accelerators. Already today, mobile systems concurrently employ multiple accelerators in what we call accelerator-level parallelism (ALP). To spread the benefits of ALP more broadly, we charge computer scientists to develop the science needed to best achieve the performance and cost goals of ALP hardware and software.
Submitted 24 November, 2021; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Don't Persist All : Efficient Persistent Data Structures
Authors:
Pratyush Mahapatra,
Mark D. Hill,
Michael M. Swift
Abstract:
Data structures used in software development have inbuilt redundancy to improve software reliability and to speed up performance. Examples include a Doubly Linked List which allows a faster deletion due to the presence of the previous pointer. With the introduction of Persistent Memory, storing the redundant data fields into persistent memory adds a significant write overhead, and reduces performance. In this work, we focus on three data structures - Doubly Linked List, B+Tree and Hashmap, and showcase alternate partly persistent implementations where we only store a limited set of data fields to persistent memory. After a crash/restart, we use the persistent data fields to recreate the data structures along with the redundant data fields. We compare our implementation with the base implementation and show that we achieve speedups around 5-20% for some data structures, and up to 165% for a flush-dominated data structure.
Submitted 29 May, 2019;
originally announced May 2019.
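The idea in the abstract above, persisting only some fields and recreating the redundant ones after a restart, can be illustrated for a doubly linked list. The sketch below is not the authors' implementation: ordinary Python objects stand in for persistent memory, and the assumption that only `value` and `next` are persisted while `prev` is rebuilt on recovery is made purely for illustration. The names `Node` and `rebuild_prev` are hypothetical.

```python
from typing import Optional


class Node:
    """List node; `value` and `next` are assumed persisted, `prev` is not."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Node"] = None   # assumed to survive a crash
        self.prev: Optional["Node"] = None   # assumed lost on a crash


def rebuild_prev(head: Optional[Node]) -> Optional[Node]:
    """Recreate every `prev` pointer by walking the persisted `next` chain.

    Returns the tail node (or None for an empty list).
    """
    previous = None
    node = head
    while node is not None:
        node.prev = previous
        previous = node
        node = node.next
    return previous


if __name__ == "__main__":
    a, b, c = Node(1), Node(2), Node(3)
    a.next, b.next = b, c          # only forward links exist after "restart"
    tail = rebuild_prev(a)
    print(tail.value, tail.prev.value)
```

The recovery pass is linear in the list length, which is the trade-off for skipping the extra persistent write on every update.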
-
Three Other Models of Computer System Performance
Authors:
Mark D. Hill
Abstract:
This note argues for more use of simple models beyond Amdahl's Law: Bottleneck Analysis, Little's Law, and an M/M/1 Queue.
Submitted 9 January, 2019;
originally announced January 2019.
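The three models named in the abstract above have standard textbook forms, sketched below as a hedged illustration rather than as the note's own formulation: bottleneck analysis bounds throughput by the slowest stage, Little's Law relates mean occupancy to arrival rate and mean residence time, and the M/M/1 queue gives a mean residence time of $1/(\mu - \lambda)$ for arrival rate $\lambda$ and service rate $\mu$. All function names are illustrative.

```python
from typing import Sequence


def bottleneck_throughput(stage_capacities: Sequence[float]) -> float:
    """Upper bound on pipeline throughput: the capacity of the slowest stage."""
    return min(stage_capacities)


def littles_law_occupancy(arrival_rate: float, mean_residence_time: float) -> float:
    """Little's Law: mean number in system = arrival rate * mean time in system."""
    return arrival_rate * mean_residence_time


def mm1_mean_residence_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system (queueing plus service): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("M/M/1 is stable only when arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    lam, mu = 8.0, 10.0                       # requests per second (made-up values)
    w = mm1_mean_residence_time(lam, mu)      # seconds in system
    print(w, littles_law_occupancy(lam, w))   # mean residence time and occupancy
    print(bottleneck_throughput([120.0, 45.0, 80.0]))
```

Combining the last two models recovers the usual M/M/1 occupancy $\rho/(1-\rho)$ with $\rho = \lambda/\mu$, which is a convenient cross-check.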
-
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Authors:
Daniel N Hill,
Houssam Nassif,
Yi Liu,
Anand Iyer,
S V N Vishwanathan
Abstract:
Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com.
Submitted 22 October, 2018;
originally announced October 2018.
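The abstract above mentions using hill-climbing to select content over a combinatorial layout space. The sketch below shows a generic coordinate-wise hill climb over discrete layout slots, given only as an illustration of that search idea and not as the authors' algorithm. The `score` callable is a hypothetical stand-in for whatever learned model rates a layout, and `hill_climb` and its parameters are assumed names.

```python
from typing import Callable, List, Sequence


def hill_climb(
    num_options: Sequence[int],
    score: Callable[[List[int]], float],
    start: List[int],
) -> List[int]:
    """Improve one slot at a time until no single-slot change raises the score.

    num_options[i] is the number of choices for slot i (e.g. image, wording).
    """
    layout = list(start)
    improved = True
    while improved:
        improved = False
        for slot, count in enumerate(num_options):
            best_choice = layout[slot]
            best_score = score(layout)
            for choice in range(count):
                candidate = layout[:slot] + [choice] + layout[slot + 1:]
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best_choice, best_score = choice, candidate_score
            if best_choice != layout[slot]:
                layout[slot] = best_choice
                improved = True
    return layout


if __name__ == "__main__":
    # Toy score with a single peak at layout [2, 1]; purely illustrative.
    toy_score = lambda l: -((l[0] - 2) ** 2) - ((l[1] - 1) ** 2)
    print(hill_climb([3, 3], toy_score, [0, 0]))
```

Each accepted move strictly increases the score over a finite space, so the loop terminates, although in general it may stop at a local rather than global optimum when slots interact strongly.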
-
Adaptive, Personalized Diversity for Visual Discovery
Authors:
Choon Hui Teo,
Houssam Nassif,
Daniel Hill,
Sriram Srinavasan,
Mitchell Goodman,
Vijai Mohan,
SVN Vishwanathan
Abstract:
Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios while offering the user an engaging shopping experience. Here we explore extensions in the direction of adaptive personalization and item diversification within Stream, a new form of visual browsing and discovery by Amazon. Our system presents the user with a diverse set of interesting items while adapting to user interactions. Our solution consists of three components: (1) a Bayesian regression model for scoring the relevance of items while leveraging uncertainty, (2) a submodular diversification framework that re-ranks the top scoring items based on category, and (3) personalized category preferences learned from the user's behavior. When tested on live traffic, our algorithms show a strong lift in click-through-rate and session duration.
Submitted 2 October, 2018;
originally announced October 2018.
-
On the Spanning and Routing Ratio of Directed Theta-Four
Authors:
Prosenjit Bose,
Jean-Lou De Carufel,
Darryl Hill,
Michiel Smid
Abstract:
We present a routing algorithm for the directed $\Theta_4$-graph, here denoted as the $\overrightarrow{\Theta_4}$-graph, that computes a path between any two vertices $s$ and $t$ having length at most $17$ times the Euclidean distance between $s$ and $t$. To compute this path, at each step, the algorithm only uses knowledge of the location of the current vertex, its (at most four) outgoing edges, the destination vertex, and one additional bit of information in order to determine the next edge to follow. This provides the first known online, local, competitive routing algorithm with constant routing ratio for the $\Theta_4$-graph, as well as improving the best known upper bound on the spanning ratio of these graphs from $237$ to $17$. We also show that without this additional bit of information, the routing ratio increases to $\sqrt{290} \approx 17.03$.
Submitted 12 July, 2021; v1 submitted 3 August, 2018;
originally announced August 2018.
-
A Unified Framework for Wide Area Measurement System Planning
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns in the construction process. The framework jointly optimizes the system construction cost, measurement reliability, and volume of synchrophasor data traffic resulting in a multi-objective optimization problem, which provides multiple Pareto optimal solutions to suit different requirements by the utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs.
Submitted 21 November, 2017;
originally announced November 2017.
-
Delay Aware Intelligent Transient Stability Assessment System
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focus on investigating the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments to achieve a much shorter response time by utilizing incomplete system variable measurements. Compared with existing work, our system is able to make accurate assessments with significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments can be developed in one third less time than state-of-the-art methodologies. Moreover, the simulations indicate that noise in the measurements has a trivial impact on the assessment performance, demonstrating the robustness of the proposed system.
Submitted 21 November, 2017;
originally announced November 2017.
-
Deep Learning Based Cryptographic Primitive Classification
Authors:
Gregory D. Hill,
Xavier J. A. Bellekens
Abstract:
Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well; however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a DCNN, is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of contemporary data compendiums, hence feeding the model cognition, a methodology for the procedural generation of synthetic cryptographic binaries is defined, utilising core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which are automatically fed into its core model. Converging at 91% accuracy, CryptoKnight is successfully able to classify the sample algorithms with minimal loss.
Submitted 25 September, 2017;
originally announced September 2017.
-
Advanced Cyberinfrastructure for Science, Engineering, and Public Policy
Authors:
Vasant G. Honavar,
Katherine Yelick,
Klara Nahrstedt,
Holly Rushmeier,
Jennifer Rexford,
Mark D. Hill,
Elizabeth Bradley,
Elizabeth Mynatt
Abstract:
Progress in many domains increasingly benefits from our ability to view the systems through a computational lens, i.e., using computational abstractions of the domains; and our ability to acquire, share, integrate, and analyze disparate types of data. These advances would not be possible without the advanced data and computational cyberinfrastructure and tools for data capture, integration, analysis, modeling, and simulation. However, despite, and perhaps because of, advances in "big data" technologies for data acquisition, management and analytics, the other largely manual and labor-intensive aspects of the decision making process, e.g., formulating questions, designing studies, organizing, curating, connecting, correlating and integrating cross-domain data, drawing inferences and interpreting results, have become the rate-limiting steps to progress. Advancing the capability and capacity for evidence-based improvements in science, engineering, and public policy requires support for (1) computational abstractions of the relevant domains coupled with computational methods and tools for their analysis, synthesis, simulation, visualization, sharing, and integration; (2) cognitive tools that leverage and extend the reach of human intellect, and partner with humans on all aspects of the activity; (3) nimble and trustworthy data cyber-infrastructures that connect and manage a variety of instruments, multiple interrelated data types and associated metadata, data representations, processes, protocols and workflows; and enforce applicable security and data access and use policies; and (4) organizational and social structures and processes for collaborative and coordinated activity across disciplinary and institutional boundaries.
Submitted 30 June, 2017;
originally announced July 2017.
-
Challenges to Keeping the Computer Industry Centered in the US
Authors:
Thomas M. Conte,
Erik P. Debenedictis,
R. Stanley Williams,
Mark D. Hill
Abstract:
It is undeniable that the worldwide computer industry's center is the US, specifically in Silicon Valley. Much of the reason for the success of Silicon Valley had to do with Moore's Law: the observation by Intel co-founder Gordon Moore that the number of transistors on a microchip doubled at a rate of approximately every two years. According to the International Technology Roadmap for Semiconductors, Moore's Law will end in 2021. How can we rethink computing technology to restart the historic explosive performance growth? Since 2012, the IEEE Rebooting Computing Initiative (IEEE RCI) has been working with industry and the US government to find new computing approaches to answer this question. In parallel, the CCC has held a number of workshops addressing similar questions. This whitepaper summarizes some of the IEEE RCI and CCC findings. The challenge for the US is to lead this new era of computing. Our international competitors are not sitting still: China, for example, has invested significantly in a variety of approaches such as neuromorphic computing, chip fabrication facilities, computer architecture, and high-performance simulation and data analytics computing. We must act now; otherwise, the center of the computer industry will move from Silicon Valley and likely move offshore entirely.
Submitted 30 June, 2017;
originally announced June 2017.
-
Democratizing Design for Future Computing Platforms
Authors:
Luis Ceze,
Mark D. Hill,
Karthikeyan Sankaralingam,
Thomas F. Wenisch
Abstract:
Information and communications technology can continue to change our world. These advances will partially depend upon designs that synergistically combine software with specialized hardware. Today open-source software incubates rapid software-only innovation. The government can unleash software-hardware innovation with programs to develop open hardware components, tools, and design flows that simplify and reduce the cost of hardware design. Such programs will speed development for startup companies, established industry leaders, education, scientific research, and for government intelligence and defense platforms.
Submitted 26 June, 2017;
originally announced June 2017.
-
Arch2030: A Vision of Computer Architecture Research over the Next 15 Years
Authors:
Luis Ceze,
Mark D. Hill,
Thomas F. Wenisch
Abstract:
Application trends, device technologies and the architecture of systems drive progress in information technologies. However, the former engines of such progress - Moore's Law and Dennard Scaling - are rapidly reaching the point of diminishing returns. The time has come for the computing community to boldly confront a new challenge: how to secure a foundational future for information technology's continued progress. The computer architecture community engaged in several visioning exercises over the years. Five years ago, we released a white paper, 21st Century Computer Architecture, which influenced funding programs in both academia and industry. More recently, the IEEE Rebooting Computing Initiative explored the future of computing systems in the architecture, device, and circuit domains. This report stems from an effort to continue this dialogue, reach out to the applications and devices/circuits communities, and understand their trends and vision. We aim to identify opportunities where architecture research can bridge the gap between the application and device domains.
Submitted 9 December, 2016;
originally announced December 2016.
-
21st Century Computer Architecture
Authors:
Mark D. Hill,
Sarita Adve,
Luis Ceze,
Mary Jane Irwin,
David Kaeli,
Margaret Martonosi,
Josep Torrellas,
Thomas F. Wenisch,
David Wood,
Katherine Yelick
Abstract:
Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) and made less demanding applications easier to develop by enabling higher-level programming abstractions (e.g., scripting languages and reusable components). Improvements in computer system cost-effectiveness enabled value creation that could never have been imagined by the field's founders (e.g., distributed web search sufficiently inexpensive so as to be covered by advertising links).
The wide benefits of computer performance growth are clear. Recently, Danowitz et al. apportioned computer performance growth roughly equally between technology and architecture, with architecture credited with ~80x improvement since 1985. As semiconductor technology approaches its "end-of-the-road" (see below), computer architecture will need to play an increasing role in enabling future ICT innovation. But instead of asking, "How can I make my chip run faster?," architects must now ask, "How can I enable the 21st century infrastructure, from sensors to clouds, adding value from performance to privacy, but without the benefit of near-perfect technology scaling?". The challenges are many, but with appropriate investment, opportunities abound. Underlying these opportunities is a common theme that future architecture innovations will require the engagement of and investments from innovators in other ICT layers.
Submitted 21 September, 2016;
originally announced September 2016.
-
When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads
Authors:
Jason Lowe-Power,
Mark D. Hill,
David A. Wood
Abstract:
Response time requirements for big data processing systems are shrinking. To meet this strict response time requirement, many big data systems store all or most of their data in main memory to reduce the access latency. Main memory capacities have grown, and systems with 2 TB of main memory capacity are available today. However, the rate at which processors can access this data--the memory bandwidth--has not grown at the same rate. In fact, some of these big-memory systems can access less than 10% of their main memory capacity in one second (billions of processor cycles).
3D die-stacking is one promising solution to this bandwidth problem, and industry is investing significantly in 3D die-stacking. We use a simple back-of-the-envelope-style model to characterize if and when the 3D die-stacked architecture is more cost-effective than current architectures for in-memory big data workloads. We find that die-stacking has much higher performance than current systems (up to 256x lower response times), and it does not require expensive memory over provisioning to meet real-time (10 ms) response time service-level agreements. However, the power requirements of the die-stacked systems are significantly higher (up to 50x) than current systems, and its memory capacity is lower in many cases. Even in this limited case study, we find 3D die-stacking is not a panacea. Today, die-stacking is the most cost-effective solution for strict SLAs and by reducing the power of the compute chip and increasing memory densities die-stacking can be cost-effective under other constraints in the future.
Submitted 26 August, 2016;
originally announced August 2016.
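The bandwidth gap described in this abstract can be illustrated with a small calculation. The following is a minimal sketch, not the model used in the paper: the 2 TB capacity and the 10 ms response-time target come from the abstract, while the 100 GB/s bandwidth figure and all function names are assumptions chosen only for illustration.

```python
# Back-of-the-envelope sketch of the memory bandwidth gap.
# The bandwidth value is a hypothetical assumption, not a number from the paper.

TB = 10**12
GB = 10**9


def fraction_accessible_per_second(capacity_bytes, bandwidth_bytes_per_s):
    """Fraction of total memory capacity that can be streamed in one second."""
    return min(1.0, bandwidth_bytes_per_s / capacity_bytes)


def full_scan_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Time needed to read every byte of memory once."""
    return capacity_bytes / bandwidth_bytes_per_s


def bytes_reachable_within(deadline_s, bandwidth_bytes_per_s):
    """Upper bound on the data a query can touch before a response-time deadline."""
    return deadline_s * bandwidth_bytes_per_s


capacity = 2 * TB      # capacity figure quoted in the abstract
bandwidth = 100 * GB   # hypothetical aggregate memory bandwidth (assumption)

fraction = fraction_accessible_per_second(capacity, bandwidth)
scan_time = full_scan_seconds(capacity, bandwidth)
reachable = bytes_reachable_within(0.010, bandwidth)  # 10 ms deadline

print(f"Accessible per second: {fraction:.1%}")
print(f"Full scan time: {scan_time:.0f} s")
print(f"Data reachable in 10 ms: {reachable / GB:.1f} GB")
```

Under these assumed numbers a query with a 10 ms deadline can touch only a small slice of the stored data, which is the kind of constraint that makes memory bandwidth, rather than capacity, the limiting resource.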
-
Accelerating Science: A Computing Research Agenda
Authors:
Vasant G. Honavar,
Mark D. Hill,
Katherine Yelick
Abstract:
The emergence of "big data" offers unprecedented opportunities for not only accelerating scientific advances but also enabling new modes of discovery. Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens, i.e., using algorithmic or information processing abstractions of the underlying processes; and our ability to acquire, share, integrate and analyze disparate types of data. However, there is a huge gap between our ability to acquire, store, and process data and our ability to make effective use of the data to advance discovery. Despite successful automation of routine aspects of data management and analytics, most elements of the scientific process currently require considerable human expertise and effort. Accelerating science to keep pace with the rate of data acquisition and data processing calls for the development of algorithmic or information processing abstractions, coupled with formal methods and tools for modeling and simulation of natural processes as well as major innovations in cognitive tools for scientists, i.e., computational tools that leverage and extend the reach of human intellect, and partner with humans on a broad range of tasks in scientific discovery (e.g., identifying, prioritizing, and formulating questions, designing, prioritizing and executing experiments designed to answer a chosen question, drawing inferences and evaluating the results, and formulating new questions, in a closed-loop fashion). This calls for a concerted research agenda aimed at: development, analysis, integration, sharing, and simulation of algorithmic or information processing abstractions of natural processes, coupled with formal methods and tools for their analysis and simulation; and innovations in cognitive tools that augment and extend human intellect and partner with humans in all aspects of science.
Submitted 6 April, 2016;
originally announced April 2016.
-
Weaver: A High-Performance, Transactional Graph Database Based on Refinable Timestamps
Authors:
Ayush Dubey,
Greg D. Hill,
Robert Escriva,
Emin Gün Sirer
Abstract:
Graph databases have become an increasingly common infrastructure component. Yet existing systems either operate on offline snapshots, provide weak consistency guarantees, or use expensive concurrency control techniques that limit performance. In this paper, we introduce a new distributed graph database, called Weaver, which enables efficient, transactional graph analyses as well as strictly serializable ACID transactions on dynamic graphs. The key insight that allows Weaver to combine strict serializability with horizontal scalability and high performance is a novel request ordering mechanism called refinable timestamps. This technique couples coarse-grained vector timestamps with a fine-grained timeline oracle to pay the overhead of strong consistency only when needed. Experiments show that Weaver enables a Bitcoin blockchain explorer that is 8x faster than Blockchain.info, and achieves 12x higher throughput than the Titan graph database on social network workloads and 4x lower latency than GraphLab on offline graph traversal workloads.
Submitted 19 June, 2016; v1 submitted 28 September, 2015;
originally announced September 2015.
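The idea of paying for strong ordering only when needed can be illustrated with plain vector timestamps. The sketch below is not Weaver's implementation: it shows only the generic vector-clock comparison, and the names `Order`, `compare`, and `needs_fine_grained_ordering` are hypothetical. The assumption is that two requests whose coarse-grained timestamps are already ordered need no further work, while concurrent ones would be handed to a finer-grained mechanism such as the timeline oracle mentioned in the abstract (not modeled here).

```python
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    EQUAL = "equal"
    CONCURRENT = "concurrent"


def compare(a, b):
    """Compare two vector timestamps of equal length component-wise."""
    if len(a) != len(b):
        raise ValueError("vector timestamps must have the same length")
    a_le_b = all(x <= y for x, y in zip(a, b))
    a_ge_b = all(x >= y for x, y in zip(a, b))
    if a_le_b and a_ge_b:
        return Order.EQUAL
    if a_le_b:
        return Order.BEFORE
    if a_ge_b:
        return Order.AFTER
    return Order.CONCURRENT


def needs_fine_grained_ordering(a, b):
    """True only when the coarse timestamps cannot order the two requests."""
    return compare(a, b) is Order.CONCURRENT


ordered = compare((1, 2, 0), (2, 2, 1))      # every component <=, so BEFORE
conflict = compare((2, 1, 0), (1, 2, 0))     # mixed components, so CONCURRENT
print(ordered, conflict, needs_fine_grained_ordering((2, 1, 0), (1, 2, 0)))
```

In this sketch the expensive path is taken only for the concurrent pair, which mirrors the stated goal of incurring the overhead of strong consistency only when it is required.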
-
On the Stretch Factor of Convex Polyhedra whose Vertices are (Almost) on a Sphere
Authors:
Prosenjit Bose,
Paz Carmi,
Mirela Damian,
Jean-Lou De Carufel,
Darryl Hill,
Anil Maheshwari,
Yuyang Liu,
Michiel Smid
Abstract:
Let $P$ be a convex polyhedron in $\mathbb{R}^3$. The skeleton of $P$ is the graph whose vertices and edges are the vertices and edges of $P$, respectively. We prove that, if these vertices are on the unit-sphere, the skeleton is a $(0.999 \cdot π)$-spanner. If the vertices are very close to this sphere, then the skeleton is not necessarily a spanner. For the case when the boundary of $P$ is between two concentric spheres of radii $1$ and $R>1$, and the angles in all faces are at least $θ$, we prove that the skeleton is a $t$-spanner, where $t$ depends only on $R$ and $θ$. One of the ingredients in the proof is a tight upper bound on the geometric dilation of a convex cycle that is contained in an annulus.
Submitted 2 September, 2016; v1 submitted 24 July, 2015;
originally announced July 2015.
-
Improved Spanning Ratio for Low Degree Plane Spanners
Authors:
Prosenjit Bose,
Darryl Hill,
Michiel Smid
Abstract:
We describe an algorithm that builds a plane spanner with a maximum degree of 8 and a spanning ratio of approximately 4.414 with respect to the complete graph. This is the best currently known spanning ratio for a plane spanner with a maximum degree of less than 14.
Submitted 2 July, 2015; v1 submitted 30 June, 2015;
originally announced June 2015.
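For readers unfamiliar with the term, the spanning ratio (or stretch factor) of a geometric graph is the largest ratio, over all pairs of vertices, between the shortest-path distance in the graph and the Euclidean distance. The following is a minimal sketch of that definition only, not of the spanner construction described in the abstract; the function name `spanning_ratio` and the example point set are assumptions, and the points are assumed to be distinct and the graph connected.

```python
import math
from itertools import combinations


def spanning_ratio(points, edges):
    """Largest ratio of graph distance to Euclidean distance over all vertex pairs.

    points: list of (x, y) coordinates, assumed pairwise distinct.
    edges:  list of (u, v) index pairs; edge weights are Euclidean lengths.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v in edges:
        w = math.dist(points[u], points[v])
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    # Floyd-Warshall all-pairs shortest paths (fine for small examples).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return max(
        dist[i][j] / math.dist(points[i], points[j])
        for i, j in combinations(range(n), 2)
    )


# Unit square with only its four sides as edges: opposite corners are at
# Euclidean distance sqrt(2) but graph distance 2.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sides = [(0, 1), (1, 2), (2, 3), (3, 0)]
ratio = spanning_ratio(square, sides)
print(f"spanning ratio of the square cycle: {ratio:.4f}")
```

A graph is a $t$-spanner when this ratio is at most $t$, so the square cycle in the example is a $\sqrt{2}$-spanner of its four corners.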
-
Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors
Authors:
Jonathan Passerat-Palmbach,
Claude Mazel,
Antoine Mahul,
David Hill
Abstract:
Parallel stochastic simulations tend to exploit more and more computing power, and they are now also developed for General Purpose Graphics Processing Units (GP-GPUs). Consequently, they need reliable random sources to feed their applications. We propose a survey of the current Pseudo-Random Number Generators (PRNGs) available on GPU. We give a particular focus to the recent Mersenne Twister for Graphics Processors (MTGP) that has just been released. Our work provides empirically checked statuses designed to initialize a particular configuration of this generator, in order to prevent any potential bias introduced by the parallelization of the PRNG.
Submitted 30 January, 2015;
originally announced January 2015.
-
Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU
Authors:
Jonathan Passerat-Palmbach,
Jonathan Caux,
Pridi Siregar,
Claude Mazel,
David Hill
Abstract:
Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large number of replications, it is good practice to reduce the overall simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually assumes access to a parallel computer such as a symmetric multiprocessing machine (with many cores), a computing cluster or a computing grid. In this paper, we propose Warp-Level Parallelism (WLP), a GP-GPU-enabled solution to compute MRIP on GP-GPUs (General-Purpose Graphics Processing Units). These devices offer a great amount of parallel computational power at low cost, but are tuned to efficiently process the same operation on several data, through different threads. This paradigm is called Single Instruction, Multiple Threads (SIMT). Our approach relies on small thread groups, called warps, to perform independent computations such as replications. We have benchmarked WLP with three different models: it allows MRIP to be computed up to six times faster than with the SIMT computing paradigm.
Submitted 7 January, 2015;
originally announced January 2015.
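The MRIP pattern itself is independent of GPUs and can be sketched on a CPU. The example below is a minimal illustration under stated assumptions, not the Warp-Level Parallelism technique from the paper: the toy model (a Monte Carlo estimate of pi), the function names, and the use of a thread pool are all hypothetical choices. It also uses a normal approximation for the confidence interval, and simply gives each replication a distinct integer seed, which by itself does not guarantee statistically independent streams.

```python
import math
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def replication(seed, n_samples=10_000):
    """One replication of a toy stochastic model: Monte Carlo estimate of pi."""
    rng = random.Random(seed)  # private generator, no state shared across replications
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


def confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of the replications."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sem, mean + z * sem


seeds = range(30)  # one seed per replication (illustrative choice)
# Replications are independent, so they can be dispatched to any parallel
# resource; a thread pool is used here only to show the structure.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(replication, seeds))

low, high = confidence_interval(results)
print(f"mean = {statistics.fmean(results):.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

Because each replication owns its generator and shares no state, the same structure carries over to processes, cluster jobs, or GPU thread groups; only the dispatch mechanism changes.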
-
How to Correctly Deal With Pseudorandom Numbers in Manycore Environments - Application to GPU programming with Shoverand
Authors:
Jonathan Passerat-Palmbach,
David Hill
Abstract:
Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more on General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such devices bring new parallelization possibilities, but they also introduce new programming difficulties. Since RNGs are at the base of any stochastic simulation, they also need to be ported to GP-GPU. There is still a lack of well-designed implementations of quality-proven RNGs on GP-GPU platforms. In this paper, we introduce ShoveRand, a framework defining common rules to generate random numbers uniformly on GP-GPU. Our framework is designed to cope with any GPU-enabled development platform and to expose a straightforward interface to users. We also provide an existing RNG implementation with this framework to demonstrate its efficiency in both development and ease of use.
Submitted 29 December, 2014;
originally announced December 2014.
-
Lifted Inference for Relational Continuous Models
Authors:
Jaesik Choi,
Eyal Amir,
David J. Hill
Abstract:
Relational Continuous Models (RCMs) represent joint probability densities over attributes of objects, when the attributes have continuous domains. With relational representations, they can model joint probability distributions over large numbers of variables compactly in a natural way. This paper presents a new exact lifted inference algorithm for RCMs that scales up to large models of real-world applications. The algorithm applies to Relational Pairwise Models, which are (relational) products of potentials of arity 2. Our algorithm is unique in two ways. First, it substantially improves the efficiency of lifted inference with variables of continuous domains. When a relational model has Gaussian potentials, it takes only linear time, compared to the cubic time of previous methods. Second, it is the first exact inference algorithm which handles RCMs in a lifted way. The algorithm is illustrated over an example from econometrics. Experimental results show that our algorithm outperforms both a ground-level inference algorithm and an algorithm built with previously known lifted methods.
Submitted 15 March, 2012;
originally announced March 2012.
-
A competitive game whose maximal Nash-equilibrium payoff requires quantum resources for its achievement
Authors:
Charles D. Hill,
Adrian P. Flitney,
Nicolas C. Menicucci
Abstract:
While it is known that shared quantum entanglement can offer improved solutions to a number of purely cooperative tasks for groups of remote agents, controversy remains regarding the legitimacy of quantum games in a competitive setting--in particular, whether they offer any advantage beyond what is achievable using classical resources. We construct a competitive game between four players based on the minority game where the maximal Nash-equilibrium payoff when played with the appropriate quantum resource is greater than that obtainable by classical means, assuming a local hidden variable model. The game is constructed in a manner analogous to a Bell inequality. This result is important in confirming the legitimacy of quantum games.
Submitted 13 September, 2009; v1 submitted 31 August, 2009;
originally announced August 2009.