
Showing 1–8 of 8 results for author: Eck, B

  1. arXiv:2312.09380

    stat.CO

    Two-sample KS test with approxQuantile in Apache Spark

    Authors: Bradley Eck, Duygu Kabakci-Zorlu, Amadou Ba

    Abstract: The classical two-sample test of Kolmogorov-Smirnov (KS) is widely used to test whether empirical samples come from the same distribution. Even though most statistical packages provide an implementation, carrying out the test in big data settings can be challenging because it requires a full sort of the data. The popular Apache Spark system for big data processing provides a 1-sample KS test, but…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 6 pages, 3 figures, python code in appendix. To appear at IEEE Big Data 2023

  2. arXiv:2304.08352

    cs.CL cs.AI

    What Makes a Good Dataset for Symbol Description Reading?

    Authors: Karol Lynch, Joern Ploennigs, Bradley Eck

    Abstract: The usage of mathematical formulas as concise representations of a document's key ideas is common practice. Correctly interpreting these formulas, by identifying mathematical symbols and extracting their descriptions, is an important task in document understanding. This paper makes the following contributions to the mathematical identifier description reading (MIDR) task: (i) introduces the Math…

    Submitted 17 April, 2023; originally announced April 2023.

  3. arXiv:2211.06239

    cs.LG stat.AP

    A monitoring framework for deployed machine learning models with supply chain examples

    Authors: Bradley Eck, Duygu Kabakci-Zorlu, Yan Chen, France Savard, Xiaowei Bao

    Abstract: Actively monitoring machine learning models during production operations helps ensure prediction quality and detection and remediation of unexpected or undesired conditions. Monitoring models already deployed in big data environments brings the additional challenges of adding monitoring in parallel to the existing modelling workflow and controlling resource requirements. In this paper, we describe…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 8 pages, 9 figures, IEEE Big Data 2022

  4. arXiv:2103.07248

    cs.LG cs.NE

    Knowledge- and Data-driven Services for Energy Systems using Graph Neural Networks

    Authors: Francesco Fusco, Bradley Eck, Robert Gormally, Mark Purcell, Seshu Tirupathi

    Abstract: The transition away from carbon-based energy sources poses several challenges for the operation of electricity distribution systems. Increasing shares of distributed energy resources (e.g. renewable energy generators, electric vehicles) and internet-connected sensing and control devices (e.g. smart heating and cooling) require new tools to support accurate, data-driven decision making. Modelling th…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in proceedings of IEEE Conference of Big Data 2020

  5. arXiv:2003.12141

    cs.DC cs.AI cs.CY

    Scalable Deployment of AI Time-series Models for IoT

    Authors: Bradley Eck, Francesco Fusco, Robert Gormally, Mark Purcell, Seshu Tirupathi

    Abstract: IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates, in Python and R, following a typical machine-learning workflow are supported. A knowledge-based approach to managing model and time-series data allows the use of general semantic concepts for expressing feature engineering tasks.…

    Submitted 24 March, 2020; originally announced March 2020.

    Journal ref: Workshop AI for Internet of Things, IJCAI 2019

  6. AI Modelling and Time-series Forecasting Systems for Trading Energy Flexibility in Distribution Grids

    Authors: Bradley Eck, Francesco Fusco, Robert Gormally, Mark Purcell, Seshu Tirupathi

    Abstract: We demonstrate progress on the deployment of two sets of technologies to support distribution grid operators integrating high shares of renewable energy sources, based on a market for trading local energy flexibilities. An artificial-intelligence (AI) grid modelling tool, based on probabilistic graphs, predicts congestions and estimates the amount and location of energy flexibility required to avo…

    Submitted 18 September, 2019; originally announced September 2019.

  7. arXiv:1811.08566

    stat.CO stat.OT

    Castor: Contextual IoT Time Series Data and Model Management at Scale

    Authors: Bei Chen, Bradley Eck, Francesco Fusco, Robert Gormally, Mark Purcell, Mathieu Sinn, Seshu Tirupathi

    Abstract: We demonstrate Castor, a cloud-based system for contextual IoT time series data and model management at scale. Castor is designed to assist Data Scientists in (a) exploring and retrieving all relevant time series and contextual information that is required for their predictive modelling tasks; (b) seamlessly storing and deploying their predictive models in a cloud production environment; (c) monit…

    Submitted 8 February, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: 6 pages, 6 figures, ICDM 2018

  8. arXiv:1810.09354

    cs.CV

    Generation of Virtual Dual Energy Images from Standard Single-Shot Radiographs using Multi-scale and Conditional Adversarial Network

    Authors: Bo Zhou, Xunyu Lin, Brendan Eck, Jun Hou, David L. Wilson

    Abstract: Dual-energy (DE) chest radiographs provide greater diagnostic information than standard radiographs by separating the image into bone and soft tissue, revealing suspicious lesions which may otherwise be obstructed from view. However, acquisition of DE images requires two physical scans, necessitating specialized hardware and processing, and images are prone to motion artifact. Generation of virtua…

    Submitted 14 April, 2021; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: 16 pages, 7 figures, accepted by Asian Conference on Computer Vision (2018 ACCV), code available at https://github.com/bbbbbbzhou/Virtual-Dual-Energy