Showing 1–14 of 14 results for author: Polyzotis, N

  1. arXiv:2112.06439  [pdf, other]

    cs.LG cs.DB

    What can Data-Centric AI Learn from Data and ML Engineering?

    Authors: Neoklis Polyzotis, Matei Zaharia

    Abstract: Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high quality data. These range from traditional business data processing applications (e.g., "how much should we charge each of our customers this month?") to production ML systems such as recommendation engines. Th…

    Submitted 13 December, 2021; originally announced December 2021.

  2. arXiv:2103.16007  [pdf, other]

    cs.DB cs.LG

    Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

    Authors: Doris Xin, Hui Miao, Aditya Parameswaran, Neoklis Polyzotis

    Abstract: Machine learning (ML) is now commonplace, powering data-driven applications in various organizations. Unlike the traditional perception of ML in research, ML production pipelines are complex, with many interlocking analytical components beyond training, whose sub-parts are often run multiple times on overlapping subsets of data. However, there is a lack of quantitative evidence regarding the lifes…

    Submitted 29 March, 2021; originally announced March 2021.

    Journal ref: Proceedings of the 2021 International Conference on Management of Data

  3. arXiv:2010.02013  [pdf]

    cs.SE cs.LG

    Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX)

    Authors: Konstantinos Katsiapis, Abhijit Karmarkar, Ahmet Altay, Aleksandr Zaks, Neoklis Polyzotis, Anusha Ramesh, Ben Mathes, Gautam Vasudevan, Irene Giannoumis, Jarek Wilkiewicz, Jiri Simsa, Justin Hong, Mitch Trott, Noé Lutz, Pavel A. Dournov, Robert Crowe, Sarah Sirajuddin, Tris Brian Warkentin, Zhitao Li

    Abstract: Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering was an eventuality. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more a…

    Submitted 7 October, 2020; v1 submitted 28 September, 2020; originally announced October 2020.

    Comments: 16 pages

  4. arXiv:1910.01177  [pdf, other]

    stat.ML cs.LG

    Improving Differentially Private Models with Active Learning

    Authors: Zhengli Zhao, Nicolas Papernot, Sameer Singh, Neoklis Polyzotis, Augustus Odena

    Abstract: Broad adoption of machine learning techniques has increased privacy concerns for models trained on sensitive data such as medical records. Existing techniques for training differentially private (DP) models give rigorous privacy guarantees, but applying these techniques to neural networks can severely degrade model performance. This performance reduction is an obstacle to deploying private models…

    Submitted 2 October, 2019; originally announced October 2019.

  5. arXiv:1807.06068  [pdf, other]

    cs.DB cs.LG

    Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

    Authors: Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

    Abstract: As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all the way to the data. We focus on the particular problem of slicing data to identify subsets of the validation data where the model performs poorly. This is an i…

    Submitted 6 January, 2019; v1 submitted 16 July, 2018; originally announced July 2018.

  6. arXiv:1712.01208  [pdf, other]

    cs.DB cs.DS cs.NE

    The Case for Learned Index Structures

    Authors: Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis

    Abstract: Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be…

    Submitted 30 April, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

  7. arXiv:1304.1838  [pdf, other]

    cs.DB cs.DC cs.PF

    Towards a Workload for Evolutionary Analytics

    Authors: Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis

    Abstract: Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify sev…

    Submitted 27 June, 2013; v1 submitted 5 April, 2013; originally announced April 2013.

    Comments: 10 pages

    Journal ref: DanaC: Workshop on Data analytics in the Cloud, June 2013, New York, NY

  8. arXiv:1304.1411  [pdf, ps, other]

    cs.DB

    RITA: An Index-Tuning Advisor for Replicated Databases

    Authors: Quoc Trung Tran, Ivo Jimenez, Rui Wang, Neoklis Polyzotis, Anastasia Ailamaki

    Abstract: Given a replicated database, a divergent design tunes the indexes in each replica differently in order to specialize it for a specific subset of the workload. This specialization brings significant performance gains compared to the common practice of having the same indexes in all replicas, but requires the development of new tuning tools for database administrators. In this paper we introduce RIT…

    Submitted 19 July, 2013; v1 submitted 4 April, 2013; originally announced April 2013.

    Comments: 15 pages, 11 figures

  9. arXiv:1303.6609  [pdf, other]

    cs.DB cs.DC cs.DS

    Exploiting Opportunistic Physical Design in Large-scale Data Analytics

    Authors: Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey

    Abstract: Large-scale systems, such as MapReduce and Hadoop, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these materializations yield a large set of materialized views that typically capture common computation among successive queries from the same analyst, or even across queries o…

    Submitted 10 December, 2013; v1 submitted 26 March, 2013; originally announced March 2013.

    Comments: 15 pages

  10. arXiv:1303.3517  [pdf, other]

    cs.DC cs.DB cs.LG

    Iterative MapReduce for Large Scale Machine Learning

    Authors: Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

    Abstract: Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one of the foundational disciplines for data analysis, summarization and inference - on Big Data has become routine at most organizations that operate large clouds,…

    Submitted 13 March, 2013; originally announced March 2013.

  11. arXiv:1203.0160  [pdf, other]

    cs.DB cs.LG cs.PF

    Scaling Datalog for Machine Learning on Big Data

    Authors: Yingyi Bu, Vinayak Borkar, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan

    Abstract: In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to…

    Submitted 2 March, 2012; v1 submitted 1 March, 2012; originally announced March 2012.

  12. arXiv:1104.3214  [pdf]

    cs.DB

    CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

    Authors: Debabrata Dash, Neoklis Polyzotis, Anastasia Ailamaki

    Abstract: Index tuning, i.e., selecting the indexes appropriate for a workload, is a crucial problem in database system tuning. In this paper, we solve index tuning for large problem instances that are common in practice, e.g., thousands of queries in the workload, thousands of candidate indexes and several hard and soft constraints. Our work is the first to reveal that the index tuning problem has a well s…

    Submitted 16 April, 2011; originally announced April 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 6, pp. 362-372 (2011)

  13. arXiv:1103.3102  [pdf]

    cs.DB cs.DS

    Human-Assisted Graph Search: It's Okay to Ask Questions

    Authors: Aditya Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis, Jennifer Widom

    Abstract: We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?". This general problem has applications in many domains that can utilize human intelligence, including cur…

    Submitted 16 March, 2011; originally announced March 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 5, pp. 267-278 (2011)

  14. arXiv:1004.1249  [pdf, other]

    cs.DB

    Semi-Automatic Index Tuning: Keeping DBAs in the Loop

    Authors: Karl Schnaitter, Neoklis Polyzotis

    Abstract: To obtain good system performance, a DBA must choose a set of indices that is appropriate for the workload. The system can aid in this challenging task by providing recommendations for the index configuration. We propose a new index recommendation technique, termed semi-automatic tuning, that keeps the DBA "in the loop" by generating recommendations that use feedback about the DBA's preferences. T…

    Submitted 30 October, 2011; v1 submitted 8 April, 2010; originally announced April 2010.