Showing 1–2 of 2 results for author: Smith, C R

  1. arXiv:2203.01363  [pdf, other]

    cs.LG stat.AP

    Faking feature importance: A cautionary tale on the use of differentially-private synthetic data

    Authors: Oscar Giles, Kasra Hosseini, Grigorios Mingas, Oliver Strickson, Louise Bowler, Camila Rangel Smith, Harrison Wilde, Jen Ning Lim, Bilal Mateen, Kasun Amarasinghe, Rayid Ghani, Alison Heppenstall, Nik Lomax, Nick Malleson, Martin O'Reilly, Sebastian Vollmer

    Abstract: Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering a…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 27 pages, 8 figures

  2. arXiv:2004.12929  [pdf, other]

    cs.DB

    Data Engineering for Data Analytics: A Classification of the Issues, and Case Studies

    Authors: Alfredo Nazabal, Christopher K. I. Williams, Giovanni Colavizza, Camila Rangel Smith, Angus Williams

    Abstract: Consider the situation where a data analyst wishes to carry out an analysis on a given dataset. It is widely recognized that most of the analyst's time will be taken up with data engineering tasks such as acquiring, understanding, cleaning and preparing the data. In this paper we provide a description and classification of such tasks into high-level groups, namely data organization, data q…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: 24 pages, 1 figure, submitted to IEEE Transactions on Knowledge and Data Engineering