Search | arXiv e-print repository

Masked Matrix Multiplication for Emergent Sparsity

Authors: Brian Wheatman, Meghana Madhyastha, Randal Burns

Abstract: Artificial intelligence workloads, especially transformer models, exhibit emergent sparsity in which computations perform selective sparse access to dense data. The workloads are inefficient on hardware designed for dense computations and do not map well onto sparse data representations. We build a vectorized and parallel matrix-multiplication system A × B = C that eliminates unnecessary computations and avoids branches based on a runtime evaluation of sparsity. We use a combination of dynamic code lookup to adapt to the specific sparsity encoded in the B matrix and preprocessing of sparsity maps of the A and B matrices to compute conditional branches once for the whole computation. For a wide range of sparsity, from 60% to 95% zeros, our implementation executes fewer instructions and increases performance when compared with Intel MKL's dense or sparse matrix multiply routines. Benefits can be as large as a 2 times speedup and 4 times fewer instructions.

Submitted 21 February, 2024; originally announced February 2024.
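
The idea of precomputing sparsity maps so that zero entries are skipped without per-element branching in the inner loop can be illustrated with a minimal sketch. This is not the authors' vectorized system; the function name `masked_matmul` and the list-of-lists matrix layout are assumptions made only for illustration.

```python
def masked_matmul(A, B):
    """Multiply A (n x k) by B (k x m), touching only nonzero entries.

    Illustrative sketch only: a scalar, pure-Python stand-in for the
    general idea of preprocessing sparsity maps once per computation.
    Assumes A and B are non-empty lists of equal-length rows.
    """
    n, m = len(A), len(B[0])
    # Sparsity maps, computed once: nonzero column indices of each row.
    a_nz = [[p for p, v in enumerate(row) if v != 0] for row in A]
    b_nz = [[j for j, v in enumerate(row) if v != 0] for row in B]

    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in a_nz[i]:          # skip zero entries of A
            a = A[i][p]
            for j in b_nz[p]:      # skip zero entries of row p of B
                C[i][j] += a * B[p][j]
    return C
```

The inner loops iterate over index lists rather than testing each element, so the zero checks happen once during preprocessing instead of on every multiply.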

arXiv:2402.04403 [pdf, other]

Edge-Parallel Graph Encoder Embedding

Authors: Ariel Lubonja, Cencheng Shen, Carey Priebe, Randal Burns

Abstract: New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted language. We refactor GEE into a parallel program in the Ligra graph engine that maps functions over the edges of the graph and uses lock-free atomic instructions to prevent data races. On a graph with 1.8B edges, this results in a 500 times speedup over the original implementation and a 17 times speedup over a just-in-time compiled version.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 4 pages, 4 figures
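
A single linear pass over edges can be sketched serially as follows. This is a hedged illustration, not the paper's Ligra implementation: the function name, the `(u, v, weight)` edge format, the requirement that every vertex has a known class label, and the per-class normalization by class size are all assumptions.

```python
def graph_encoder_embedding(edges, labels, num_classes):
    """One pass over edges, accumulating a num_classes-dimensional embedding.

    Assumptions (illustrative only): edges are undirected and listed once
    as (u, v, weight); labels[i] is the class of vertex i in
    range(num_classes); each contribution is scaled by 1 / class size.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1

    Z = [[0.0] * num_classes for _ in labels]
    for u, v, w in edges:
        # Each endpoint accumulates weight in the column of the other
        # endpoint's class. In a parallel version these two updates are
        # the ones that would need atomic adds to avoid data races.
        Z[u][labels[v]] += w / counts[labels[v]]
        Z[v][labels[u]] += w / counts[labels[u]]
    return Z
```

Each edge performs two independent accumulations, which is why an edge-parallel map with atomic updates is a natural fit.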

arXiv:2309.12576 [pdf, other]

Understanding Patterns of Deep Learning Model Evolution in Network Architecture Search

Authors: Robert Underwood, Meghana Madhyastha, Randal Burns, Bogdan Nicolae

Abstract: Network Architecture Search, and specifically Regularized Evolution, is a common way to refine the structure of a deep learning model. However, little is known about how models empirically evolve over time, which has implications for designing caching policies, refining the search algorithm for particular applications, and other important use cases. In this work, we algorithmically analyze and quantitatively characterize the patterns of model evolution for a set of models from the Candle project and the Nasbench-201 search space. We show how the evolution of the model structure is influenced by the regularized evolution algorithm. We describe how evolutionary patterns appear in distributed settings and opportunities for caching and improved scheduling. Lastly, we describe the conditions that affect when particular model architectures rise and fall in popularity based on their frequency of acting as a donor in a sliding window.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 11 pages, 4 figures

ACM Class: I.2.6; C.4

arXiv:2305.05055 [pdf, other]

doi 10.1145/3627535.3638492

CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers

Authors: Brian Wheatman, Randal Burns, Aydın Buluç, Helen Xu

Abstract: This paper introduces the batch-parallel Compressed Packed Memory Array (CPMA), a compressed, dynamic, ordered set data structure based on the Packed Memory Array (PMA). Traditionally, batch-parallel sets are built on pointer-based data structures such as trees because pointer-based structures enable fast parallel unions via pointer manipulation. When compared with cache-optimized trees, PMAs were slower to update but faster to scan. The batch-parallel CPMA overcomes this tradeoff between updates and scans by optimizing for cache-friendliness. On average, the CPMA achieves 3x faster batch-insert throughput and 4x faster range-query throughput compared with compressed PaC-trees, a state-of-the-art batch-parallel set library based on cache-optimized trees. We further evaluate the CPMA compared with compressed PaC-trees and Aspen, a state-of-the-art system, on a real-world application of dynamic-graph processing. The CPMA is on average 1.2x faster on a suite of graph algorithms and 2x faster on batch inserts when compared with compressed PaC-trees. Furthermore, the CPMA is on average 1.3x faster on graph algorithms and 2x faster on batch inserts compared with Aspen.

Submitted 18 February, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences.

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2011.05383 [pdf, other]

PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment

Authors: Meghana Madhyastha, Kunal Lillaney, James Browne, Joshua Vogelstein, Randal Burns

Abstract: We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory. This arises whenever models are larger than memory, but also systematically when models are deployed on low-resource devices, such as in the Internet of Things, or run as Web micro-services where resources are allocated on demand. Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from external memory algorithms. The layout interleaves correlated nodes across multiple trees, uses leaf cardinality to collocate the nodes on the most popular paths and is optimized for the I/O blocksize. The result is that each I/O yields a higher fraction of useful data, leading to a 2-6 times reduction in classification latency for interactive workloads.

Submitted 10 November, 2020; originally announced November 2020.

ACM Class: I.5.5

arXiv:2002.02017 [pdf, other]

Observations on Porting In-memory KV stores to Persistent Memory

Authors: Brian Choi, Parv Saxena, Ryan Huang, Randal Burns

Abstract: Systems that require high throughput and fault tolerance, such as key-value stores and databases, are looking to persistent memory to combine the performance of in-memory systems with the data-consistent fault-tolerance of nonvolatile stores. Persistent memory devices provide fast byte-addressable access to non-volatile memory. We analyze the design space when integrating persistent memory into in-memory key value stores and quantify performance tradeoffs between throughput, latency, and recovery time. Previous works have explored many design choices, but did not quantify the tradeoffs. We implement persistent memory support in Redis and Memcached, adapting the data structures of each to work in two modes: (1) with all data in persistent memory and (2) a hybrid mode that uses persistent memory for key/value data and non-volatile memory for indexing and metadata. Our experience reveals three actionable design principles that hold in Redis and Memcached, despite their very different implementations. We conclude that the hybrid design increases throughput and decreases latency at a minor cost in recovery time and code complexity.

Submitted 5 February, 2020; originally announced February 2020.

arXiv:1908.11780 [pdf, other]

Towards Marrying Files to Objects

Authors: Kunal Lillaney, Vasily Tarasov, David Pease, Randal Burns

Abstract: To deal with the constant growth of unstructured data, vendors have deployed scalable, resilient, and cost-effective object-based storage systems built on RESTful web services. However, many applications rely on richer file-system APIs and semantics, and cannot benefit from object stores. This leads to storage sprawl, as object stores are deployed alongside file systems and data is accessed and managed across both systems in an ad-hoc fashion. We believe there is a critical need for a transparent merger of objects and files, consolidating data into a single platform. Such a merger would extend the capabilities of both object and file stores while preserving existing semantics and interfaces. In this position paper, we examine the viability of unifying object stores and file systems, and the various design tradeoffs that exist. Then, using our own implementation of an object-based, POSIX-complete file system, we experimentally demonstrate several critical design considerations.

Submitted 21 August, 2019; originally announced August 2019.

arXiv:1907.03335 [pdf, other]

Graphyti: A Semi-External Memory Graph Library for FlashGraph

Authors: Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

Abstract: Graph datasets exceed the in-memory capacity of most standalone machines. Traditionally, graph frameworks have overcome memory limitations through scale-out, distributed computing. Emerging frameworks avoid the network bottleneck of distributed data with Semi-External Memory (SEM) that uses a single multicore node and operates on graphs larger than memory. In SEM, $\mathcal{O}(m)$ data resides on disk and $\mathcal{O}(n)$ data in memory, for a graph with $n$ vertices and $m$ edges. For developers, this adds complexity because they must explicitly encode I/O within applications. We present principles that are critical for application developers to adopt in order to achieve state-of-the-art performance, while minimizing I/O and memory for algorithms in SEM. We present them in Graphyti, an extensible parallel SEM graph library built on FlashGraph and available in Python via pip. In SEM, Graphyti achieves 80% of the performance of in-memory execution and retains the performance of FlashGraph, which outperforms distributed engines, such as PowerGraph and Galois.

Submitted 7 July, 2019; originally announced July 2019.

arXiv:1907.02844 [pdf, other]

Geodesic Learning via Unsupervised Decision Forests

Authors: Meghana Madhyastha, Percy Li, James Browne, Veronika Strnadova-Neeley, Carey E. Priebe, Randal Burns, Joshua T. Vogelstein

Abstract: Geodesic distance is the shortest path between two points in a Riemannian manifold. Manifold learning algorithms, such as Isomap, seek to learn a manifold that preserves geodesic distances. However, such methods operate on the ambient dimensionality, and are therefore fragile to noise dimensions. We developed an unsupervised random forest method (URerF) to approximately learn geodesic distances in linear and nonlinear manifolds with noise. URerF operates on low-dimensional sparse linear combinations of features, rather than the full observed dimensionality. To choose the optimal split in a computationally efficient fashion, we developed a fast Bayesian Information Criterion statistic for Gaussian mixture models. We introduce geodesic precision-recall curves which quantify performance relative to the true latent manifold. Empirical results on simulated and real data demonstrate that URerF is robust to high-dimensional noise, whereas other methods, such as Isomap, UMAP, and FLANN, quickly deteriorate in such settings. In particular, URerF is able to estimate geodesic distances on a real connectome dataset better than other approaches.

Submitted 5 July, 2019; originally announced July 2019.

arXiv:1904.04174 [pdf, other]

doi 10.1145/3318170.3318183

Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN

Authors: Rod Burns, John Lawson, Duncan McBain, Daniel Soutar

Abstract: Over the past few years machine learning has seen a renewed explosion of interest, following a number of studies showing the effectiveness of neural networks in a range of tasks which had previously been considered incredibly hard. Neural networks' effectiveness in the fields of image recognition and natural language processing stems primarily from the vast amounts of data available to companies and researchers, coupled with the huge amounts of compute power available in modern accelerators such as GPUs, FPGAs and ASICs. There are a number of approaches available to developers for utilizing GPGPU technologies such as SYCL, OpenCL and CUDA; however, many applications require the same low-level mathematical routines. Libraries dedicated to accelerating these common routines allow developers to easily make full use of the available hardware without requiring low-level knowledge of the hardware themselves; however, such libraries are often provided by hardware manufacturers for specific hardware, such as cuDNN for Nvidia hardware or MIOpen for AMD hardware. SYCL-DNN is a new open-source library dedicated to providing accelerated routines for neural network operations which are hardware and vendor agnostic. Built on top of the SYCL open standard and written entirely in standard C++, SYCL-DNN allows a user to easily accelerate neural network code for a wide range of hardware using a modern C++ interface.
The library is tested on AMD's OpenCL for GPU, Intel's OpenCL for CPU and GPU, ARM's OpenCL for Mali GPUs as well as ComputeAorta's OpenCL for R-Car CV engine and host CPU. In this talk we will present performance figures for SYCL-DNN on this range of hardware, and discuss how high performance was achieved on such a varied set of accelerators with such different hardware features.

Submitted 8 April, 2019; originally announced April 2019.

Comments: 4 pages, 3 figures. In International Workshop on OpenCL (IWOCL '19), May 13-15, 2019, Boston

arXiv:1902.09527 [pdf, other]

clusterNOR: A NUMA-Optimized Clustering Framework

Authors: Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

Abstract: Clustering algorithms are iterative and have complex data access patterns that result in many small random memory accesses. The performance of parallel implementations suffers from synchronous barriers for each iteration and skewed workloads. We rethink the parallelization of clustering for modern non-uniform memory architectures (NUMA) to maximize independent, asynchronous computation. We eliminate many barriers, reduce remote memory accesses, and maximize cache reuse. We implement the 'Clustering NUMA Optimized Routines' (clusterNOR) extensible parallel framework that provides algorithmic building blocks. The system is generic; we demonstrate nine modern clustering algorithms that have simple implementations. clusterNOR includes (i) in-memory, (ii) semi-external memory, and (iii) distributed memory execution, enabling computation for varying memory and hardware budgets. For algorithms that rely on Euclidean distance, clusterNOR defines an updated Elkan's triangle inequality pruning algorithm that uses asymptotically less memory so that it works on billion-point data sets. clusterNOR extends and expands the scope of the 'knor' library for k-means clustering by generalizing underlying principles, providing a uniform programming interface and expanding the scope to hierarchical and linear algebraic classes of algorithms. The compound effect of our optimizations is an order of magnitude improvement in speed over other state-of-the-art solutions, such as Spark's MLlib and Apple's Turi.

Submitted 17 January, 2021; v1 submitted 24 February, 2019; originally announced February 2019.

Comments: arXiv admin note: Journal version of arXiv:1606.08905

arXiv:1901.00885 [pdf]

An Interactive Robotic Framework to Facilitate Sensory Experiences for Children with ASD

Authors: Hifza Javed, Rachael Burns, Myounghoon Jeon, Ayanna M. Howard, Chung Hyuk Park

Abstract: The diagnosis of Autism Spectrum Disorder (ASD) in children is commonly accompanied by a diagnosis of sensory processing disorders as well. Abnormalities are usually reported in multiple sensory processing domains, showing a higher prevalence of unusual responses, particularly to tactile, auditory and visual stimuli. This paper discusses a novel robot-based framework designed to target sensory difficulties faced by children with ASD in a controlled setting. The setup consists of a number of sensory stations, together with robotic agents that navigate the stations and interact with the stimuli as they are presented. These stimuli are designed to resemble real world scenarios that form a common part of one's everyday experiences. Given the strong interest of children with ASD in technology in general and robots in particular, we attempt to utilize our robotic platform to demonstrate socially acceptable responses to the stimuli in an interactive, pedagogical setting that encourages the child's social, motor and vocal skills, while providing a diverse sensory experience. A user study was conducted to evaluate the efficacy of the proposed framework, with a total of 18 participants (5 with ASD and 13 typically developing) between the ages of 4 and 12 years. We describe our methods of data collection, coding of video data and the analysis of the results obtained from the study. We also discuss the limitations of the current work and detail our plans for the future work to improve the validity of the obtained results.

Submitted 3 January, 2019; originally announced January 2019.

Comments: 18 pages, 12 figures

arXiv:1806.07300 [pdf, other]

Forest Packing: Fast, Parallel Decision Forests

Authors: James Browne, Tyler M. Tomita, Disa Mhembere, Randal Burns, Joshua T. Vogelstein

Abstract: Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool because they provide insight into model operation that is critical to interpreting learned results. While decision forests are trivially parallelizable, the traversals of tree data structures incur many random memory accesses and are very slow. We present memory packing techniques that reorganize learned forests to minimize cache misses during classification. The resulting layout is hierarchical. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in cache lines. We extend this layout with out-of-order execution and cache-line prefetching to increase memory throughput. Together, these optimizations increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.

Submitted 19 June, 2018; originally announced June 2018.

arXiv:1606.08905 [pdf, other]

knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library

Authors: Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

Abstract: k-means is one of the most influential and utilized machine learning algorithms. Its computation limits the performance and scalability of many statistical analysis and machine learning tasks. We rethink and optimize k-means in terms of modern NUMA architectures to develop a novel parallelization scheme that delays and minimizes synchronization barriers. The k-means NUMA Optimized Routine (knor) library has (i) in-memory (knori), (ii) distributed memory (knord), and (iii) semi-external memory (knors) modules that radically improve the performance of k-means for varying memory and hardware budgets. knori boosts performance for single machine datasets by an order of magnitude or more. knors improves the scalability of k-means on a memory budget using SSDs. knors scales to billions of points on a single machine, using a fraction of the resources that distributed in-memory systems require. knord retains knori's performance characteristics, while scaling in-memory through distributed computation in the cloud. knor modifies Elkan's triangle inequality pruning algorithm such that we utilize it on billion-point datasets without the significant memory overhead of the original algorithm. We demonstrate knor outperforms distributed commercial products like H2O, Turi (formerly Dato, GraphLab) and Spark's MLlib by more than an order of magnitude for datasets of $10^7$ to $10^9$ points.

Submitted 24 June, 2017; v1 submitted 28 June, 2016; originally announced June 2016.
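
The triangle-inequality pruning mentioned in the abstract rests on a simple bound: if the distance between a point's current best center c and another center c' is at least twice the distance from the point to c, then c' cannot be strictly closer. The sketch below shows only this basic test in an assignment step. It is not knor's memory-reduced variant, and the function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping provably
    unnecessary distance computations.

    Returns (assignments, skipped), where skipped counts the pruned
    point-to-center distance evaluations. Illustrative sketch only.
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment step.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]

    assignments, skipped = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        for j in range(1, k):
            # d(c_best, c_j) >= 2 * d(x, c_best) implies
            # d(x, c_j) >= d(x, c_best), so center j can be skipped.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, skipped
```

Full Elkan-style algorithms also carry per-point bounds across iterations; storing those bounds is the memory cost that the abstract says knor reduces.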

arXiv:1604.06414 [pdf, other]

FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs

Authors: Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns

Abstract: R is one of the most popular programming languages for statistics and machine learning, but the R framework is relatively slow and unable to scale to large datasets. The general approach for speeding up an implementation in R is to implement the algorithms in C or FORTRAN and provide an R wrapper. FlashR takes a different approach: it executes R code in parallel and scales the code beyond memory capacity by utilizing solid-state drives (SSDs) automatically. It provides a small number of generalized operations (GenOps) upon which we reimplement a large number of matrix functions in the R base package. As such, FlashR parallelizes and scales existing R code with little or no modification. To reduce data movement between CPU and SSDs, FlashR evaluates matrix operations lazily, fuses operations at runtime, and uses cache-aware, two-level matrix partitioning. We evaluate FlashR on a variety of machine learning and statistics algorithms on inputs of up to four billion data points. FlashR out-of-core tracks closely the performance of FlashR in-memory. The R code for machine learning algorithms executed in FlashR outperforms the in-memory execution of H2O and Spark MLlib by a factor of 2-10 and outperforms Revolution R Open by more than an order of magnitude.

Submitted 18 May, 2017; v1 submitted 21 April, 2016; originally announced April 2016.

arXiv:1602.02864 [pdf, other]

doi 10.1109/TPDS.2016.2618791

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Authors: Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, Randal Burns

Abstract: Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse matrix dense matrix multiplication (SpMM) in a semi-external memory (SEM) fashion; i.e., we keep the sparse matrix on commodity SSDs and dense matrices in memory. Our SEM-SpMM incorporates many in-memory optimizations for large power-law graphs. It outperforms the in-memory implementations of Trilinos and Intel MKL and scales to billion-node graphs, far beyond the limitations of memory. Furthermore, on a single large parallel machine, our SEM-SpMM operates as fast as the distributed implementations of Trilinos using five times as much processing power. We also run our implementation in memory (IM-SpMM) to quantify the overhead of keeping data on SSDs. SEM-SpMM achieves almost 100% performance of IM-SpMM on graphs when the dense matrix has more than four columns; it achieves at least 65% performance of IM-SpMM on all inputs. We apply our SpMM to three important data analysis tasks--PageRank, eigensolving, and non-negative matrix factorization--and show that our SEM implementations significantly advance the state of the art.

Submitted 14 October, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

Comments: published in IEEE Transactions on Parallel and Distributed Systems

arXiv:1602.01421 [pdf, other]

An SSD-based eigensolver for spectral analysis on billion-node graphs

Authors: Da Zheng, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay

Abstract: Many eigensolvers such as ARPACK and Anasazi have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM. They run in memory of a single machine for smaller eigenvalue problems and require the distributed memory for larger problems. In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices in a single machine. FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory. We store the entire vector subspace on SSDs and reduce I/O to improve performance through caching the most recent dense matrix. Our result shows that FlashEigen is able to achieve 40%-60% performance of its in-memory implementation and has performance comparable to the Anasazi eigensolvers on a machine with 48 CPU cores. Furthermore, it is capable of scaling to a graph with 3.4 billion vertices and 129 billion edges. It takes about four hours to compute eight eigenvalues of the billion-node graph using 120 GB memory.

Submitted 26 February, 2016; v1 submitted 3 February, 2016; originally announced February 2016.

arXiv:1506.07566 [pdf, other]

Optimize Unsynchronized Garbage Collection in an SSD Array

Authors: Da Zheng, Randal Burns, Alexander S. Szalay

Abstract: Solid state disks (SSDs) have advanced to outperform traditional hard drives significantly in both random reads and writes. However, heavy random writes trigger frequent garbage collection and decrease the performance of SSDs. In an SSD array, garbage collection of individual SSDs is not synchronized, leading to underutilization of some of the SSDs. We propose a software solution to tackle the unsynchronized garbage collection in an SSD array installed in a host bus adaptor (HBA), where individual SSDs are exposed to an operating system. We maintain a long I/O queue for each SSD and flush dirty pages intelligently to fill the long I/O queues so that we hide the performance imbalance among SSDs even when there are few parallel application writes. We further define a policy of selecting dirty pages to flush and a policy of taking out stale flush requests to reduce the amount of data written to SSDs. We evaluate our solution in a real system. Experiments show that our solution fully utilizes all SSDs in an array under random write-heavy workloads. It improves I/O throughput by up to 62% under random workloads of mixed reads and writes when SSDs are under active garbage collection. It causes little extra data writeback and increases the cache hit rate.

Submitted 24 June, 2015; originally announced June 2015.

arXiv:1506.03410 [pdf, other]

Sparse Projection Oblique Randomer Forests

Authors: Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Abstract: Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with >100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while mitigating computational efficiency and scalability and maintaining interpretability. SPORF can easily be incorporated into other ensemble methods such as boosting to obtain potentially similar gains.

Submitted 3 October, 2019; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: 31 pages; submitted to Journal of Machine Learning Research for review

MSC Class: 68T10 ACM Class: I.5.2

Journal ref: Journal of Machine Learning Research 21(104), 1-39, 2020

arXiv:1506.02079 [pdf, other]

Gradient-Domain Fusion for Color Correction in Large EM Image Stacks

Authors: Michael Kazhdan, Kunal Lillaney, William Roncal, Davi Bock, Joshua Vogelstein, Randal Burns

Abstract: We propose a new gradient-domain technique for processing registered EM image stacks to remove inter-image discontinuities while preserving intra-image detail. To this end, we process the image stack by first performing anisotropic smoothing along the slice axis and then solving a Poisson equation within each slice to re-introduce the detail. The final image stack is continuous across the slice axis and maintains sharp details within each slice. Adapting existing out-of-core techniques for solving the linear system, we describe a parallel algorithm with time complexity that is linear in the size of the data and space complexity that is sub-linear, allowing us to process datasets as large as five teravoxels with a 600 MB memory footprint.

Submitted 5 June, 2015; originally announced June 2015.

arXiv:1412.8576 [pdf, other]

Active Community Detection in Massive Graphs

Authors: Heng Wang, Da Zheng, Randal Burns, Carey Priebe

Abstract: A canonical problem in graph mining is the detection of dense communities. This problem is exacerbated for a graph with a large order and size -- the number of vertices and edges -- as many community detection algorithms scale poorly. In this work we propose a novel framework for detecting active communities that consist of the most active vertices in massive graphs. The framework is applicable to graphs having billions of vertices and hundreds of billions of edges. Our framework utilizes a parallelizable trimming algorithm based on a locality statistic to filter out inactive vertices, and then clusters the remaining active vertices via spectral decomposition on their similarity matrix. We demonstrate the validity of our method with synthetic Stochastic Block Model graphs, using Adjusted Rand Index as the performance metric. We further demonstrate its practicality and efficiency on a most recent real-world Hyperlink Web graph consisting of over 3.5 billion vertices and 128 billion edges.

Submitted 13 February, 2015; v1 submitted 30 December, 2014; originally announced December 2014.

Comments: published in SDM-Networks 2015

arXiv:1411.6880 [pdf, other]

An Automated Images-to-Graphs Framework for High Resolution Connectomics

Authors: William Gray Roncal, Dean M. Kleissas, Joshua T. Vogelstein, Priya Manavalan, Kunal Lillaney, Michael Pekala, Randal Burns, R. Jacob Vogelstein, Carey E. Priebe, Mark A. Chevillet, Gregory D. Hager

Abstract: Reconstructing a map of neuronal connectivity is a critical challenge in contemporary neuroscience. Recent advances in high-throughput serial section electron microscopy (EM) have produced massive 3D image volumes of nanoscale brain tissue for the first time. The resolution of EM allows for individual neurons and their synaptic connections to be directly observed. Recovering neuronal networks by manually tracing each neuronal process at this scale is unmanageable, and therefore researchers are developing automated image processing modules. Thus far, state-of-the-art algorithms focus only on the solution to a particular task (e.g., neuron segmentation or synapse identification). In this manuscript we present the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction). To evaluate overall performance and select the best parameters and methods, we also develop a metric to assess the quality of the output graphs. We evaluate a set of algorithms and parameters, searching possible operating points to identify the best available brain graph for our assessment metric. Finally, we deploy a reference end-to-end version of the pipeline on a large, publicly available data set. This provides a baseline result and framework for community analysis and future algorithm development and testing.
All code and data derivatives have been made publicly available toward eventually unlocking new biofidelic computational primitives and understanding of neuropathologies.

Submitted 30 April, 2015; v1 submitted 25 November, 2014; originally announced November 2014.

Comments: 13 pages, first two authors contributed equally V2: Added additional experiments and clarifications; added information on infrastructure and pipeline environment

arXiv:1408.0500 [pdf, other]

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Authors: Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay

Abstract: Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine called FlashGraph stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph only accesses edge lists requested by applications from SSDs; to increase I/O throughput and reduce CPU overhead for I/O, it conservatively merges I/O requests. These designs maximize performance for applications with different I/O characteristics. FlashGraph exposes a general and flexible vertex-centric programming interface that can express a wide variety of graph algorithms and their optimizations. We demonstrate that FlashGraph in semi-external memory performs many algorithms with performance up to 80% of its in-memory implementation and significantly outperforms PowerGraph, a popular distributed in-memory graph engine.

Submitted 25 January, 2015; v1 submitted 3 August, 2014; originally announced August 2014.

Comments: published in FAST'15

arXiv:1405.1965 [pdf]

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes using High-Resolution Neural EM Data

Authors: Ayushi Sinha, William Gray Roncal, Narayanan Kasthuri, Jeff W. Lichtman, Randal Burns, Michael Kazhdan

Abstract: Accurately estimating the wiring diagram of a brain, known as a connectome, at an ultrastructure level is an open research problem. Specifically, precisely tracking neural processes is difficult, especially across many image slices. Here, we propose a novel method to automatically identify and annotate small subcellular structures present in axons, known as axoplasmic reticula, through a 3D volume of high-resolution neural electron microscopy data. Our method produces high precision annotations, which can help improve automatic segmentation by using our results as seeds for segmentation, and as cues to aid segment merging.

Submitted 16 April, 2014; originally announced May 2014.

Comments: 2 pages, 1 figure; The 3rd Annual Hopkins Imaging Conference, The Johns Hopkins University, Baltimore, MD

arXiv:1404.4800 [pdf, other]

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes

Authors: Ayushi Sinha, William Gray Roncal, Narayanan Kasthuri, Ming Chuang, Priya Manavalan, Dean M. Kleissas, Joshua T. Vogelstein, R. Jacob Vogelstein, Randal Burns, Jeff W. Lichtman, Michael Kazhdan

Abstract: In this paper, we present a new pipeline which automatically identifies and annotates axoplasmic reticula, which are small subcellular structures present only in axons. We run our algorithm on the Kasthuri11 dataset, which was color corrected using gradient-domain techniques to adjust contrast. We use a bilateral filter to smooth out the noise in this data while preserving edges, which highlights axoplasmic reticula. These axoplasmic reticula are then annotated using a morphological region growing algorithm. Additionally, we perform Laplacian sharpening on the bilaterally filtered data to enhance edges, and repeat the morphological region growing algorithm to annotate more axoplasmic reticula. We track our annotations through the slices to improve precision, and to create long objects to aid in segment merging. This method annotates axoplasmic reticula with high precision. Our algorithm can easily be adapted to annotate axoplasmic reticula in different sets of brain data by changing a few thresholds. The contribution of this work is the introduction of a straightforward and robust pipeline which annotates axoplasmic reticula with high precision, contributing towards advancements in automatic feature annotations in neural EM data.

Submitted 16 April, 2014; originally announced April 2014.

Comments: 2 pages, 1 figure

arXiv:1403.3724 [pdf, other]

VESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer vision at Large Scale

Authors: William Gray Roncal, Michael Pekala, Verena Kaynig-Fittkau, Dean M. Kleissas, Joshua T. Vogelstein, Hanspeter Pfister, Randal Burns, R. Jacob Vogelstein, Mark A. Chevillet, Gregory D. Hager

Abstract: An open challenge problem at the forefront of modern neuroscience is to obtain a comprehensive mapping of the neural pathways that underlie human brain function; an enhanced understanding of the wiring diagram of the brain promises to lead to new breakthroughs in diagnosing and treating neurological disorders. Inferring brain structure from image data, such as that obtained via electron microscopy (EM), entails solving the problem of identifying biological structures in large data volumes. Synapses, which are a key communication structure in the brain, are particularly difficult to detect due to their small size and limited contrast. Prior work in automated synapse detection has relied upon time-intensive biological preparations (post-staining, isotropic slice thicknesses) in order to simplify the problem. This paper presents VESICLE, the first known approach designed for mammalian synapse detection in anisotropic, non-post-stained data. Our methods explicitly leverage biological context, and the results exceed existing synapse detection methods in terms of accuracy and scalability. We provide two different approaches - one a deep learning classifier (VESICLE-CNN) and one a lightweight Random Forest approach (VESICLE-RF) to offer alternatives in the performance-scalability space. Addressing this synapse detection challenge enables the analysis of high-throughput imaging data soon expected to reach petabytes of data, and provides tools for more rapid estimation of brain-graphs.
Finally, to facilitate community efforts, we developed tools for large-scale object detection, and demonstrated this framework to find $\approx$ 50,000 synapses in 60,000 $\mu m^3$ (220 GB on disk) of electron microscopy data.

Submitted 7 September, 2015; v1 submitted 14 March, 2014; originally announced March 2014.

Comments: v4: added clarifying figures and updates for readability. v3: fixed metadata. 11 pp v2: Added CNN classifier, significant changes to improve performance and generalization

Journal ref: Proceedings of the British Machine Vision Conference (BMVC), pages 81.1-81.13. BMVA Press, September 2015

arXiv:1312.4875 [pdf, other]

doi 10.1109/GlobalSIP.2013.6736878

MIGRAINE: MRI Graph Reliability Analysis and Inference for Connectomics

Authors: William Gray Roncal, Zachary H. Koterba, Disa Mhembere, Dean M. Kleissas, Joshua T. Vogelstein, Randal Burns, Anita R. Bowles, Dimitrios K. Donavos, Sephira Ryman, Rex E. Jung, Lei Wu, Vince Calhoun, R. Jacob Vogelstein

Abstract: Currently, connectomes (e.g., functional or structural brain graphs) can be estimated in humans at $\approx 1~mm^3$ scale using a combination of diffusion weighted magnetic resonance imaging, functional magnetic resonance imaging and structural magnetic resonance imaging scans. This manuscript summarizes a novel, scalable implementation of open-source algorithms to rapidly estimate magnetic resonance connectomes, using both anatomical regions of interest (ROIs) and voxel-size vertices. To assess the reliability of our pipeline, we develop a novel nonparametric non-Euclidean reliability metric. Here we provide an overview of the methods used, demonstrate our implementation, and discuss available user extensions. We conclude with results showing the efficacy and reliability of the pipeline over previous state-of-the-art.

Submitted 17 December, 2013; originally announced December 2013.

Comments: Published as part of 2013 IEEE GlobalSIP conference

arXiv:1310.0041 [pdf, other]

Gradient-Domain Processing for Large EM Image Stacks

Authors: Michael Kazhdan, Randal Burns, Bobby Kasthuri, Jeff Lichtman, Jacob Vogelstein, Joshua Vogelstein

Abstract: We propose a new gradient-domain technique for processing registered EM image stacks to remove the inter-image discontinuities while preserving intra-image detail. To this end, we process the image stack by first performing anisotropic diffusion to smooth the data along the slice axis and then solving a screened-Poisson equation within each slice to re-introduce the detail. The final image stack is both continuous across the slice axis (facilitating the tracking of information between slices) and maintains sharp details within each slice (supporting automatic feature detection). To support this editing, we describe the implementation of the first multigrid solver designed for efficient gradient domain processing of large, out-of-core, voxel grids.

Submitted 30 September, 2013; originally announced October 2013.

arXiv:1306.3543 [pdf, other]

The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

Authors: Randal Burns, William Gray Roncal, Dean Kleissas, Kunal Lillaney, Priya Manavalan, Eric Perlman, Daniel R. Berger, Davi D. Bock, Kwanghun Chung, Logan Grosenick, Narayanan Kasthuri, Nicholas C. Weiler, Karl Deisseroth, Michael Kazhdan, Jeff Lichtman, R. Clay Reid, Stephen J. Smith, Alexander S. Szalay, Joshua T. Vogelstein, R. Jacob Vogelstein

Abstract: We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes---neural connectivity maps of the brain---using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at http://openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems---reads to parallel disk arrays and writes to solid-state storage---to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

Submitted 18 June, 2013; v1 submitted 14 June, 2013; originally announced June 2013.

Comments: 11 pages, 13 figures

arXiv:1107.1821 [pdf, other]

Where Have You Been? Secure Location Provenance for Mobile Devices

Authors: Ragib Hasan, Randal Burns

Abstract: With the advent of mobile computing, location-based services have recently gained popularity. Many applications use the location provenance of users, i.e., the chronological history of the users' location for purposes ranging from access control, authentication, information sharing, and evaluation of policies. However, location provenance is subject to tampering and collusion attacks by malicious users. In this paper, we examine the secure location provenance problem. We introduce a witness-endorsed scheme for generating collusion-resistant location proofs. We also describe two efficient and privacy-preserving schemes for protecting the integrity of the chronological order of location proofs. These schemes, based on hash chains and Bloom filters respectively, allow users to prove the order of any arbitrary subsequence of their location history to auditors. Finally, we present experimental results from our proof-of-concept implementation on the Android platform and show that our schemes are practical in today's mobile devices.

Submitted 9 July, 2011; originally announced July 2011.

Comments: 14 pages

arXiv:1106.6062 [pdf, other]

The Life and Death of Unwanted Bits: Towards Proactive Waste Data Management in Digital Ecosystems

Authors: Ragib Hasan, Randal Burns

Abstract: Our everyday data processing activities create massive amounts of data. Like physical waste and trash, unwanted and unused data also pollutes the digital environment by degrading the performance and capacity of storage systems and requiring costly disposal. In this paper, we propose using the lessons from real life waste management in handling waste data. We show the impact of waste data on the performance and operational costs of our computing systems. To allow better waste data management, we define a waste hierarchy for digital objects and provide insights into how to identify and categorize waste data. Finally, we introduce novel ways of reusing, reducing, and recycling data and software to minimize the impact of data wastage.

Submitted 1 July, 2011; v1 submitted 29 June, 2011; originally announced June 2011.

Comments: Fixed references

arXiv:0909.1760 [pdf]

LifeRaft: Data-Driven, Batch Processing for the Exploration of Scientific Databases

Authors: Xiaodan Wang, Randal Burns, Tanu Malik

Abstract: Workloads that comb through vast amounts of data are gaining importance in the sciences. These workloads consist of "needle in a haystack" queries that are long running and data intensive so that query throughput limits performance. To maximize throughput for data-intensive queries, we put forth LifeRaft: a query processing system that batches queries with overlapping data requirements. Rather than scheduling queries in arrival order, LifeRaft executes queries concurrently against an ordering of the data that maximizes data sharing among queries. This decreases I/O and increases cache utility. However, such batch processing can increase query response time by starving interactive workloads. LifeRaft addresses starvation using techniques inspired by head scheduling in disk drives. Depending upon the workload saturation and queuing times, the system adaptively and incrementally trades off processing queries in arrival order and data-driven batch processing. Evaluating LifeRaft in the SkyQuery federation of astronomy databases reveals a two-fold improvement in query throughput.

Submitted 9 September, 2009; originally announced September 2009.

Comments: CIDR 2009

arXiv:0901.3923 [pdf, ps, other]

Model-Based Event Detection in Wireless Sensor Networks

Authors: Jayant Gupchup, Andreas Terzis, Randal Burns, Alex Szalay

Abstract: In this paper we present an application of techniques from statistical signal processing to the problem of event detection in wireless sensor networks used for environmental monitoring. The proposed approach uses the well-established Principal Component Analysis (PCA) technique to build a compact model of the observed phenomena that is able to capture daily and seasonal trends in the collected measurements. We then use the divergence between actual measurements and model predictions to detect the existence of discrete events within the collected data streams. Our preliminary results show that this event detection mechanism is sensitive enough to detect the onset of rain events using the temperature modality of a wireless sensor network.

Submitted 25 January, 2009; originally announced January 2009.

Journal ref: Workshop for Data Sharing and Interoperability on the World Wide Web (DSI 2007). April 2007, In Proceedings

arXiv:cs/0701170 [pdf]

Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service

Authors: Katalin Szlavecz, Andreas Terzis, Stuart Ozer, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Randal Burns, Jim Gray, Alex Szalay

Abstract: Wireless sensor networks can revolutionize soil ecology by providing measurements at temporal and spatial granularities previously impossible. This paper presents a soil monitoring system we developed and deployed at an urban forest in Baltimore as a first step towards realizing this vision. Motes in this network measure and save soil moisture and temperature in situ every minute. Raw measurements are periodically retrieved by a sensor gateway and stored in a central database where calibrated versions are derived and stored. The measurement database is published through Web Services interfaces. In addition, analysis tools let scientists analyze current and historical data and help manage the sensor network. The article describes the system design, what we learned from the deployment, and initial results obtained from the sensors. The system measures soil factors with unprecedented temporal precision. However, the deployment required device-level programming, sensor calibration across space and time, and cross-referencing measurements with external sources. The database, web server, and data analysis design required considerable innovation and expertise. So, the ratio of computer-scientists to ecologists was 3:1. Before sensor networks can fulfill their potential as instruments that can be easily deployed by scientists, these technical problems must be addressed so that the ratio is one nerd per ten ecologists.

Submitted 26 January, 2007; originally announced January 2007.

Report number: MSR TR 2006 90
