Showing 1–14 of 14 results for author: Karanasos, K
  1. arXiv:2210.14047

    cs.DB

    OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Logs [Technical Report]

    Authors: Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesús Camacho-Rodríguez, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

    Abstract: Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who or when executed a query). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity a…

    Submitted 3 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    ACM Class: H.2

  2. End-to-end Optimization of Machine Learning Prediction Queries

    Authors: Kwanghyun Park, Karla Saur, Dalitso Banda, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos

    Abstract: Prediction queries are widely used across industries to perform advanced analytics and draw insights from data. They include a data processing part (e.g., for joining, filtering, cleaning, featurizing the datasets) and a machine learning (ML) part invoking one or more trained models to perform predictions. These parts have so far been optimized in isolation, leaving significant opportunities for o…

    Submitted 31 May, 2022; originally announced June 2022.

  3. arXiv:2203.01877

    cs.DB cs.AI cs.LG

    Query Processing on Tensor Computation Runtimes

    Authors: Dong He, Supun Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jesús Camacho-Rodríguez, Konstantinos Karanasos, Matteo Interlandi

    Abstract: The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity through a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientis…

    Submitted 9 February, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

    Journal ref: Proceedings of the VLDB Endowment, 15(11): 2811 - 2825, 2022

  4. KEA: Tuning an Exabyte-Scale Data Infrastructure

    Authors: Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino

    Abstract: Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this infrastructure, achieving state-of-the-art efficienc…

    Submitted 21 June, 2021; originally announced June 2021.

  5. arXiv:2010.04804

    cs.LG

    A Tensor Compiler for Unified Machine Learning Prediction Serving

    Authors: Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi

    Abstract: Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure---the bespoke solutions typical in large web companies are simply untenable. Model scoring, the process of obtaining predictions from a trained model over new data, is a primary contributor to infrastructure complexity and cost as models are trained once but used many times. In this paper w…

    Submitted 19 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

  6. arXiv:1912.09536

    cs.LG cs.DC stat.ML

    Data Science through the looking glass and what we found there

    Authors: Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

    Abstract: The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to c…

    Submitted 19 December, 2019; originally announced December 2019.

  7. arXiv:1911.00231

    cs.DB cs.LG

    Extending Relational Query Processing with ML Inference

    Authors: Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino

    Abstract: The broadening adoption of machine learning in the enterprise is increasing the pressure for strict governance and cost-effective performance, in particular for the common and consequential steps of model storage and inference. The RDBMS provides a natural starting point, given its mature infrastructure for fast data access and processing, along with support for enterprise features (e.g., encrypti…

    Submitted 1 November, 2019; originally announced November 2019.

  8. arXiv:1909.00084

    cs.DB cs.DC cs.LG

    Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

    Authors: Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu

    Abstract: Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, complex…

    Submitted 27 December, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

  9. arXiv:1908.09048

    cs.LG cs.DC stat.ML

    Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms

    Authors: Liqun Shao, Yiwen Zhu, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Siqi Liu, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos

    Abstract: Microsoft's internal big data analytics platform is comprised of hundreds of thousands of machines, serving over half a million jobs daily, from thousands of users. The majority of these jobs are recurring and are crucial for the company's operation. Although administrators spend significant effort tuning system performance, some jobs inevitably experience slowdowns, i.e., their execution time deg…

    Submitted 23 August, 2019; originally announced August 2019.

  10. arXiv:1906.06590

    cs.DB

    Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems

    Authors: Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos

    Abstract: Modern big data systems run on cloud environments where resources are shared amongst several users and applications. As a result, declarative user queries in these environments need to be optimized and executed over resources that constantly change and are provisioned on demand for each job. This requires us to rethink traditional query optimizers designed for systems that run on dedicated resourc…

    Submitted 15 June, 2019; originally announced June 2019.

  11. arXiv:1906.05162

    cs.DB

    Kaskade: Graph Views for Efficient Graph Analytics

    Authors: Joana M. F. da Trindade, Konstantinos Karanasos, Carlo Curino, Samuel Madden, Julian Shun

    Abstract: Graphs are an increasingly popular way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. These real-world graphs are often substantially more structured than a generic vertex-and-edge model would sug…

    Submitted 12 June, 2019; originally announced June 2019.

  12. arXiv:1112.2610

    cs.DB

    The ViP2P Platform: XML Views in P2P

    Authors: Konstantinos Karanasos, Asterios Katsifodimos, Ioana Manolescu, Spyros Zoupanos

    Abstract: The growing volumes of XML data sources on the Web or produced by enterprises, organizations etc. raise many performance challenges for data management applications. In this work, we are concerned with the distributed, peer-to-peer management of large corpora of XML documents, based on distributed hash table (or DHT, in short) overlay networks. We present ViP2P (standing for Views in Peer-to-Peer)…

    Submitted 12 December, 2011; originally announced December 2011.

    Comments: RR-7812 (2011)

  13. arXiv:1110.6648

    cs.DB

    View Selection in Semantic Web Databases

    Authors: François Goasdoué, Konstantinos Karanasos, Julien Leblay, Ioana Manolescu

    Abstract: We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view s…

    Submitted 30 October, 2011; originally announced October 2011.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 2, pp. 97-108 (2011)

  14. arXiv:1008.2186

    cs.DB cs.AI

    RDFViewS: A Storage Tuning Wizard for RDF Applications

    Authors: François Goasdoué, Konstantinos Karanasos, Julien Leblay, Ioana Manolescu

    Abstract: In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and stora…

    Submitted 12 August, 2010; originally announced August 2010.

    Journal ref: ACM International Conference on Information and Knowledge Management, Toronto, Canada (2010)