Showing 1–10 of 10 results for author: Blanas, S

  1. arXiv:2404.04621  [pdf, other]

    cs.PL cs.DB

    IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store Applications

    Authors: Chujun Geng, Spyros Blanas, Michael D. Bond, Yang Wang

    Abstract: This paper presents the first dynamic predictive analysis for data store applications under weak isolation levels, called IsoPredict. Given an observed serializable execution of a data store application, IsoPredict generates and solves SMT constraints to find an unserializable execution that is a feasible execution of the application. IsoPredict introduces novel techniques that handle divergent ap…

    Submitted 6 April, 2024; originally announced April 2024.

    Journal ref: Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 161. Publication date: June 2024

  2. SQRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling

    Authors: Yamei Tu, Olga Li, Junpeng Wang, Han-Wei Shen, Przemek Powalko, Irina Tomescu-Dubrow, Kazimierz M. Slomczynski, Spyros Blanas, J. Craig Jenkins

    Abstract: Public opinion surveys constitute a powerful tool to study people's attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in di…

    Submitted 25 January, 2022; originally announced January 2022.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics Volume: 29, Issue: 6, 01 June 2023 pgs. 2862-2874

  3. arXiv:2009.11463  [pdf, other]

    cs.DB cs.DS

    Algorithms for a Topology-aware Massively Parallel Computation Model

    Authors: Xiao Hu, Paraschos Koutris, Spyros Blanas

    Abstract: Most of the prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability, and can communicate with every other unit with the same latency and bandwidth. However, this strong assumption of a uniform topology rarely holds in practical settings, where computing units are connected through complex networks. To address this issue…

    Submitted 23 September, 2020; originally announced September 2020.

  4. arXiv:1810.00511  [pdf, ps, other]

    cs.DB

    Chasing Similarity: Distribution-aware Aggregation Scheduling (Extended Version)

    Authors: Feilong Liu, Ario Salmasi, Spyros Blanas, Anastasios Sidiropoulos

    Abstract: Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an optional local pre-aggregation step and then repartitions the intermediate result across the network. While local pre-aggregation works well for low-cardinality aggregations, the network communication cost remains sig…

    Submitted 29 November, 2018; v1 submitted 30 September, 2018; originally announced October 2018.

  5. arXiv:1807.11149  [pdf, ps, other]

    cs.DB

    To Ship or Not to (Function) Ship (Extended version)

    Authors: Feilong Liu, Niranjan Kamat, Spyros Blanas, Arnab Nandi

    Abstract: Sampling is often used to reduce query latency for interactive big data analytics. The established parallel data processing paradigm relies on function shipping, where a coordinator dispatches queries to worker nodes and then collects the results. The commoditization of high-performance networking makes data shipping possible, where the coordinator directly reads data in the workers' memory using…

    Submitted 29 July, 2018; originally announced July 2018.

    Comments: 4 pages, 3 figures

  6. arXiv:1805.05874  [pdf, ps, other]

    cs.DC cs.DB

    Approximate Distributed Joins in Apache Spark

    Authors: Do Le Quoc, Istemi Ekin Akkus, Pramod Bhatotia, Spyros Blanas, Ruichuan Chen, Christof Fetzer, Thorsten Strufe

    Abstract: The join operation is a fundamental building block of parallel data processing. Unfortunately, it is very resource-intensive to compute an equi-join across massive datasets. The approximate computing paradigm allows users to trade accuracy and latency for expensive data processing operations. The equi-join operator is thus a natural candidate for optimization using approximation techniques. Althou…

    Submitted 15 May, 2018; originally announced May 2018.

  7. arXiv:1702.08327  [pdf, ps, other]

    cs.DB

    ArrayBridge: Interweaving declarative array processing with high-performance computing

    Authors: Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, Paul Brown

    Abstract: Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in res…

    Submitted 27 February, 2017; originally announced February 2017.

    Comments: 12 pages, 13 figures

    ACM Class: H.2.8

  8. Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Authors: Feilong Liu, Spyros Blanas

    Abstract: Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time…

    Submitted 21 July, 2015; v1 submitted 10 July, 2015; originally announced July 2015.

    Comments: 15 pages, 8 figures, extended version of the paper to appear in SoCC'15

  9. arXiv:1503.08482  [pdf, ps, other]

    cs.DB

    Towards Exascale Scientific Metadata Management

    Authors: Spyros Blanas, Surendra Byna

    Abstract: Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines hav…

    Submitted 29 March, 2015; originally announced March 2015.

  10. arXiv:1201.0228  [pdf, other]

    cs.DB

    High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    Authors: Per-Åke Larson, Spyros Blanas, Cristian Diaconu, Craig Freedman, Jignesh M. Patel, Mike Zwilling

    Abstract: A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read…

    Submitted 31 December, 2011; originally announced January 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 4, pp. 298-309 (2011)