
Showing 1–30 of 30 results for author: Bergeron, B

  1. pPython Performance Study

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Window…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.14908

  2. pPython for Parallel Python Programming

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Kurt Keville, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. The core data structure in pPython is a distributed numerical array whose distribution onto multiple processors is specified with a map c…

    Submitted 31 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:astro-ph/0606464

  3. 3D Real-Time Supercomputer Monitoring

    Authors: Bill Bergeron, Matthew Hubbell, Dylan Sequeira, Winter Williams, William Arcand, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: Supercomputers are complex systems producing vast quantities of performance data from multiple sources and of varying types. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimization of application performance is critical for cost effective usage of a supercomputer and requires efficient me…

    Submitted 9 September, 2021; originally announced September 2021.

  4. Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

    Authors: Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large scale simulations of short running jobs on MIT SuperCloud systems that allows the resources to be fully utilized for both long running batch jobs while simultaneously providing…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: IEEE HPEC 2021

  5. Accuracy and Performance Comparison of Video Action Recognition Approaches

    Authors: Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Michael Houle, Matthew Hubbell, Michael Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally

    Abstract: Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remains clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-t…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at IEEE HPEC 2020

  6. Benchmarking network fabrics for data distributed training of deep neural networks

    Authors: Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Charles Yee, Albert Reuther, Jeremy Kepner

    Abstract: Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simp…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at IEEE HPEC 2020

  7. Best of Both Worlds: High Performance Interactive and Batch Launching

    Authors: Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Andrew Kirby, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

    Abstract: Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implement spot jobs on MIT SuperCloud systems allowing the resources to be fully utilized for both long run…

    Submitted 5 August, 2020; originally announced August 2020.

  8. Large Scale Parallelization Using File-Based Communications

    Authors: Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

    Abstract: In this paper, we present a novel file-based communication architecture using the local filesystem for large scale parallelization. This new approach eliminates the issues with filesystem overload and resource contention when using the central filesystem for large parallel jobs. The new approach incurs additional overhead due to inter-node message file transfers when both the sending and r…

    Submitted 3 September, 2019; originally announced September 2019.

  9. Securing HPC using Federated Authentication

    Authors: Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther, Jeremy Kepner

    Abstract: Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security. Integrating with the user's more frequently used account at their primary organization both provides a better experience to the end user and makes account compromise or changes in affiliation more likely to be noticed and acted upon. Additionally, with m…

    Submitted 20 August, 2019; originally announced August 2019.

  10. Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers

    Authors: Julia Mullen, Albert Reuther, William Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

    Abstract: For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively develop and share code projects across the globe. As the HPC commu…

    Submitted 5 March, 2019; originally announced March 2019.

    Comments: 15 pages, 3 figures, First Workshop on Interactive High Performance Computing (WIHPC) 2018 held in conjunction with ISC High Performance 2018 in Frankfurt, Germany

    ACM Class: D.2.6

  11. arXiv:1808.08353

    cs.DC cs.CR

    Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud

    Authors: Vijay Gadepally, Jeremy Kepner, Lauren Milechin, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Siddharth Samsi, Albert Reuther

    Abstract: Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environme…

    Submitted 25 August, 2018; originally announced August 2018.

    Comments: Accepted to IEEE HPEC 2018

  12. arXiv:1808.04345

    cs.DC

    Interactive Launch of 16,000 Microsoft Windows Instances on a Supercomputer

    Authors: Michael Jones, Jeremy Kepner, Bradley Orchard, Albert Reuther, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Julia Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Peter Michaleas

    Abstract: Simulation, machine learning, and data analysis require a wide range of software which can be dependent upon specific operating systems, such as Microsoft Windows. Running this software interactively on massively parallel supercomputers can present many challenges. Traditional methods of scaling Microsoft Windows applications to run on thousands of processors have typically relied on heavyweight v…

    Submitted 13 August, 2018; originally announced August 2018.

  13. Measuring the Impact of Spectre and Meltdown

    Authors: Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther, Jeremy Kepner

    Abstract: The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we measure the performance impact on several workloads relevant to…

    Submitted 23 July, 2018; originally announced July 2018.

  14. Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

    Authors: Albert Reuther, Jeremy Kepner, Chansup Byun, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Lauren Milechin, Julia Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Peter Michaleas

    Abstract: Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to…

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: 6 pages, 7 figures, IEEE High Performance Extreme Computing Conference 2018

    ACM Class: C.4; D.4.1

  15. arXiv:1803.01281

    cs.DC cs.DM cs.DS cs.PF math.CO

    Design, Generation, and Validation of Extreme Scale Power-Law Graphs

    Authors: Jeremy Kepner, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Tim Davis, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Anna Klein, Peter Michaleas, Roger Pearce, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Geoff Sanders, Charles Yee, Albert Reuther

    Abstract: Massive power-law graphs drive many fields: metagenomics, brain mapping, Internet-of-things, cybersecurity, and sparse machine learning. The development of novel algorithms and systems to process these data requires the design, generation, and validation of enormous graphs with exactly known properties. Such graphs accelerate the proper testing of new algorithms and systems and are a prerequisite…

    Submitted 3 March, 2018; originally announced March 2018.

    Comments: 8 pages, 6 figures, IEEE IPDPS 2018 Graph Algorithm Building Blocks (GABB) workshop

  16. arXiv:1708.00544

    cs.DC astro-ph.IM cs.NI cs.OS cs.PF

    Performance Measurements of Supercomputing and Cloud Storage Solutions

    Authors: Michael Jones, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Peter Michaleas, Andrew Prout, Albert Reuther, Siddharth Samsi, Paul Monticiollo

    Abstract: Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to infor…

    Submitted 1 August, 2017; originally announced August 2017.

    Comments: 5 pages, 4 figures, to appear in IEEE HPEC 2017

  17. arXiv:1707.05900

    cs.DC cs.HC cs.SE

    MIT SuperCloud Portal Workspace: Enabling HPC Web Application Deployment

    Authors: Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Michael Houle, Michael Jones, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Albert Reuther, Jeremy Kepner

    Abstract: The MIT SuperCloud Portal Workspace enables the secure exposure of web services running on high performance computing (HPC) systems. The portal allows users to run any web application as an HPC job and access it from their workstation while providing authentication, encryption, and access control at the system level to prevent unintended access. This capability permits users to seamlessly utilize…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: 6 pages, 3 figures, to appear in IEEE HPEC 2017

  18. arXiv:1707.03515

    cs.PF astro-ph.IM cs.DC physics.comp-ph

    Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

    Authors: Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

    Abstract: Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: 6 pages; 9 figures; accepted to IEEE HPEC 2017

  19. Scalable System Scheduling for HPC and Big Data

    Authors: Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

    Abstract: In the rapidly expanding field of parallel processing, job schedulers are the "operating systems" of modern big data architectures and supercomputing systems. Job schedulers allocate computing resources and control the execution of processes on those resources. Historically, job schedulers were the domain of supercomputers, and job schedulers were designed to run massive, long-running computations…

    Submitted 8 May, 2017; originally announced May 2017.

    Comments: 34 pages, 7 figures

  20. arXiv:1609.07545

    cs.DB cs.DC cs.PF q-bio.QM

    Benchmarking SciDB Data Import on HPC Systems

    Authors: Siddharth Samsi, Laura Brattain, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner, Albert Reuther

    Abstract: SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in database, thus reducing the need for extracting data for analysis. It is designed to be massively parallel and can run on commodity ha…

    Submitted 23 September, 2016; originally announced September 2016.

    Comments: 5 pages, 4 figures, IEEE High Performance Extreme Computing (HPEC) 2016, best paper finalist

  21. Scheduler Technologies in Support of High Performance Data Analysis

    Authors: Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

    Abstract: Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have diversified from long-running, synchronously-parallel simulations to include short-duration, independently parallel high performance data analysis (HPDA) jobs. Each of th…

    Submitted 21 July, 2016; originally announced July 2016.

    Comments: 6 pages, 5 figures, IEEE High Performance Extreme Computing Conference 2016

  22. LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

    Authors: Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

    Abstract: The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel pro…

    Submitted 21 July, 2016; originally announced July 2016.

    Comments: 8 pages; 19 figures; IEEE HPEC 2016

  23. Enhancing HPC Security with a User-Based Firewall

    Authors: Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Michael Houle, Michael Jones, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Albert Reuther, Jeremy Kepner

    Abstract: HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Many methods have been employed to perform this authentication.…

    Submitted 11 July, 2016; originally announced July 2016.

  24. arXiv:1606.05794

    cs.DC cs.CY cs.OS eess.SY

    Scalability of VM Provisioning Systems

    Authors: Mike Jones, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Lauren Milechin, Vijay Gadepally, Matt Hubbell, Jeremy Kepner, Pete Michaleas, Julie Mullen, Andy Prout, Tony Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

    Abstract: Virtual machines and virtualized hardware have been around for over half a century. The commoditization of the x86 platform and its rapidly growing hardware capabilities have led to recent exponential growth in the use of virtualization both in the enterprise and high performance computing (HPC). The startup time of a virtualized environment is a key performance metric for high performance computi…

    Submitted 18 June, 2016; originally announced June 2016.

    Comments: 5 pages; 6 figures; accepted to the IEEE High Performance Extreme Computing (HPEC) conference 2016

  25. D4M: Bringing Associative Arrays to Database Engines

    Authors: Vijay Gadepally, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

    Abstract: The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Numerous tools exist that allow users to store, query and index these massive quantities of data. Each storage or database engine comes with the pr…

    Submitted 28 August, 2015; originally announced August 2015.

  26. Lustre, Hadoop, Accumulo

    Authors: Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

    Abstract: Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the lar…

    Submitted 8 July, 2015; originally announced July 2015.

    Comments: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 2015

  27. Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

    Authors: Andrew Prout, Jeremy Kepner, Peter Michaleas, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Vijay Gadepally, Matthew Hubbell, Julie Mullen, Antonio Rosa, Charles Yee, Albert Reuther

    Abstract: The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned b…

    Submitted 29 June, 2015; originally announced June 2015.

    Comments: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.4923

  28. Big Data Strategies for Data Center Infrastructure Management Using a 3D Gaming Platform

    Authors: Matthew Hubbell, Andrew Moran, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Peter Michaleas, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Jeremy Kepner

    Abstract: High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is more critical than ever. The costs and co…

    Submitted 29 June, 2015; originally announced June 2015.

    Comments: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015

  29. arXiv:1407.3859

    cs.DB astro-ph.IM cs.DC

    D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database

    Authors: Jeremy Kepner, Christian Anderson, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Matthew Hubbell, Peter Michaleas, Julie Mullen, David O'Gwynn, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee

    Abstract: Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed…

    Submitted 14 July, 2014; originally announced July 2014.

    Comments: 6 pages; IEEE HPEC 2013

  30. arXiv:1406.4923

    cs.DB astro-ph.IM cs.CE cs.DC cs.MS

    Achieving 100,000,000 database inserts per second using Accumulo and D4M

    Authors: Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee

    Abstract: The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the b…

    Submitted 18 June, 2014; originally announced June 2014.

    Comments: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 2014