Showing 1–8 of 8 results for author: Naughton, J F

  1. arXiv:2311.04824  [pdf, other]

    cs.DB cs.DC cs.PL

    Bilevel Relations and Their Applications to Data Insights

    Authors: Xi Wu, Xiangyao Yu, Shaleen Deep, Ahmed Mahmood, Uyeong Jang, Stratis Viglas, Somesh Jha, John Cieslewicz, Jeffrey F. Naughton

    Abstract: Many data-insight analytic tasks in anomaly detection, metric attribution, and experimentation analysis can be modeled as searching in a large space of tables and finding important ones, where the notion of importance is defined in some ad hoc manner. While various frameworks have been proposed (e.g., DIFF, VLDB 2019), a systematic and general treatment is lacking. This paper describes bilevel rela…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Some overlap on examples and experiments with arXiv:2302.00120. The latter draft will be revised to focus on implementation

  2. arXiv:2302.00120  [pdf, other]

    cs.DB cs.DC cs.PL

    Holistic Cube Analysis: A Query Framework for Data Insights

    Authors: Xi Wu, Shaleen Deep, Joe Benassi, Fengan Li, Yaqi Zhang, Uyeong Jang, James Foster, Stella Kim, Yujing Sun, Long Nguyen, Stratis Viglas, Somesh Jha, John Cieslewicz, Jeffrey F. Naughton

    Abstract: Many data insight questions can be viewed as searching in a large space of tables and finding important ones, where the notion of importance is defined in some ad hoc, user-defined manner. This paper presents Holistic Cube Analysis (HoCA), a framework that augments the capabilities of relational queries for such problems. HoCA first augments the relational data model and introduces a new data type A…

    Submitted 1 July, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: Establishing initial concepts of HoCA

  3. arXiv:1702.06943  [pdf, other]

    cs.LG cs.DB stat.ML

    Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent

    Authors: Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu

    Abstract: Data compression is a popular technique for improving the efficiency of data processing workloads such as SQL queries and, more recently, machine learning (ML) with classical batch gradient methods. But the efficacy of such ideas for mini-batch stochastic gradient descent (MGD), arguably the workhorse algorithm of modern ML, is an open question. MGD's unique data access pattern renders prior art, i…

    Submitted 20 January, 2019; v1 submitted 22 February, 2017; originally announced February 2017.

    Comments: Accepted to SIGMOD 2019

  4. arXiv:1606.04722  [pdf, other]

    cs.LG cs.CR cs.DB stat.ML

    Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

    Authors: Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey F. Naughton

    Abstract: While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues for this disconnect between research and practice: (1) low model accuracy due to added noise to guarantee privacy, and (2) high develo…

    Submitted 23 March, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

  5. arXiv:1601.05748  [pdf, ps, other]

    cs.DB

    Sampling-Based Query Re-Optimization

    Authors: Wentao Wu, Jeffrey F. Naughton, Harneet Singh

    Abstract: Despite decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. Specif…

    Submitted 21 January, 2016; originally announced January 2016.

    Comments: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016)

  6. arXiv:1512.06388  [pdf, other]

    cs.CR cs.DB cs.LG

    Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences

    Authors: Xi Wu, Matthew Fredrikson, Wentao Wu, Somesh Jha, Jeffrey F. Naughton

    Abstract: Private regression has received attention from both database and security communities. Recent work by Fredrikson et al. (USENIX Security 2014) analyzed the functional mechanism (Zhang et al. VLDB 2012) for training linear regression models over medical data. Unfortunately, they found that model accuracy is already unacceptable with differential privacy when $\varepsilon = 5$. We address this issue…

    Submitted 20 December, 2015; originally announced December 2015.

  7. arXiv:1408.6589  [pdf, ps, other]

    cs.DB

    Uncertainty Aware Query Execution Time Prediction

    Authors: Wentao Wu, Xi Wu, Hakan Hacıgümüş, Jeffrey F. Naughton

    Abstract: Predicting query execution time is a fundamental issue underlying many database management tasks. Existing predictors rely on information such as cardinality estimates and system performance constants that are difficult to know exactly. As a result, accurate prediction still remains elusive for many queries. However, existing predictors provide a single, point estimate of the true execution time,…

    Submitted 27 August, 2014; originally announced August 2014.

    Comments: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the VLDB Endowment (PVLDB), Vol. 7(14), 2014

  8. arXiv:1312.4283  [pdf, ps, other]

    cs.DB

    On Load Shedding in Complex Event Processing

    Authors: Yeye He, Siddharth Barman, Jeffrey F. Naughton

    Abstract: Complex Event Processing (CEP) is a stream processing model that focuses on detecting event patterns in continuous event streams. While the CEP model has gained popularity in the research communities and commercial technologies, the problem of gracefully degrading performance under heavy load in the presence of resource constraints, or load shedding, has been largely overlooked. CEP is similar to…

    Submitted 16 December, 2013; originally announced December 2013.

    Comments: The conference version of this work to appear in the International Conference on Database Theory (ICDT), 2014