Showing 1–10 of 10 results for author: Zola, J

  1. arXiv:2306.15763  [pdf, other]

    cs.SE cs.LG

    Predicting the Impact of Batch Refactoring Code Smells on Application Resource Consumption

    Authors: Asif Imran, Tevfik Kosar, Jaroslaw Zola, Muhammed Fatih Bulut

    Abstract: Automated batch refactoring has become a de facto mechanism to restructure software that may have significant design flaws negatively impacting code quality and maintainability. Although automated batch refactoring techniques are known to significantly improve overall software quality and maintainability, their impact on resource utilization is not well studied. This paper aims to bridge the g…

    Submitted 27 June, 2023; originally announced June 2023.

  2. Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark

    Authors: Frank Schoeneman, Jaroslaw Zola

    Abstract: Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input graphs, necessitating parallel approaches. In this work, we provide a detailed analysis of parallel APSP performance on distributed memory clusters with Apache Sp…

    Submitted 7 August, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  3. arXiv:1808.10776  [pdf, other]

    cs.DC cs.LG

    Scalable Manifold Learning for Big Data with Apache Spark

    Authors: Frank Schoeneman, Jaroslaw Zola

    Abstract: Non-linear spectral dimensionality reduction methods, such as Isomap, remain an important technique for learning manifolds. However, due to computational complexity, exact manifold learning using Isomap is currently impossible for large-scale data. In this paper, we propose a distributed memory framework implementing end-to-end exact Isomap under the Apache Spark model. We show how each critical step of…

    Submitted 31 August, 2018; originally announced August 2018.

  4. arXiv:1806.06477  [pdf, other]

    cs.CR

    Privacy Preserving Analytics on Distributed Medical Data

    Authors: Marina Blanton, Ah Reum Kang, Subhadeep Karan, Jaroslaw Zola

    Abstract: Objective: To enable privacy-preserving learning of high quality generative and discriminative machine learning models from distributed electronic health records. Methods and Results: We describe a general and scalable strategy to build machine learning models in a provably privacy-preserving way. Compared to the standard approaches using, e.g., differential privacy, our method does not require al…

    Submitted 17 June, 2018; originally announced June 2018.

  5. arXiv:1804.04640  [pdf, other]

    stat.ML cs.LG

    Fast Counting in Machine Learning Applications

    Authors: Subhadeep Karan, Matthew Eichhorn, Blake Hurlburt, Grant Iraci, Jaroslaw Zola

    Abstract: We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate performance and scalability of the resulting approach on random queries, and through extensive experimentation using Bayesian networks learning and…

    Submitted 7 January, 2019; v1 submitted 12 April, 2018; originally announced April 2018.

  6. arXiv:1802.06823  [pdf, other]

    stat.ML cs.LG

    Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes

    Authors: Frank Schoeneman, Varun Chandola, Nils Napp, Olga Wodo, Jaroslaw Zola

    Abstract: Scientific and engineering processes deliver massive high-dimensional data sets that are generated as non-linear transformations of an initial state and few process parameters. Mapping such data to a low-dimensional manifold facilitates better understanding of the underlying processes, and enables their optimization. In this paper, we first show that off-the-shelf non-linear spectral dimensionalit…

    Submitted 6 August, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

  7. arXiv:1711.07370  [pdf, other]

    cs.CE cs.DC q-bio.GN

    Applications and Challenges of Real-time Mobile DNA Analysis

    Authors: Steven Y. Ko, Lauren Sassoubre, Jaroslaw Zola

    Abstract: DNA sequencing is the process of identifying the exact order of nucleotides within a given DNA molecule. The new portable and relatively inexpensive DNA sequencers, such as Oxford Nanopore MinION, have the potential to move DNA sequencing outside of the laboratory, leading to faster and more accessible DNA-based diagnostics. However, portable DNA sequencing and analysis are challenging for mobile…

    Submitted 17 November, 2017; originally announced November 2017.

  8. Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

    Authors: Subhadeep Karan, Jaroslaw Zola

    Abstract: In machine learning, the parent set identification problem is to find a set of random variables that best explain a selected variable given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In…

    Submitted 24 October, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

  9. Error Metrics for Learning Reliable Manifolds from Streaming Data

    Authors: Frank Schoeneman, Suchismit Mahapatra, Varun Chandola, Nils Napp, Jaroslaw Zola

    Abstract: Spectral dimensionality reduction is frequently used to identify low-dimensional structure in high-dimensional data. However, learning manifolds, especially from streaming data, is computationally and memory expensive. In this paper, we argue that a stable manifold can be learned using only a fraction of the stream, and the remaining stream can be mapped to the manifold in a significantly less…

    Submitted 11 January, 2017; v1 submitted 12 November, 2016; originally announced November 2016.

  10. Exact Structure Learning of Bayesian Networks by Optimal Path Extension

    Authors: Subhadeep Karan, Jaroslaw Zola

    Abstract: Bayesian networks are probabilistic graphical models often used in big data analytics. The problem of exact structure learning is to find a network structure that is optimal under certain scoring criteria. The problem is known to be NP-hard and the existing methods are both computationally and memory intensive. In this paper, we introduce a new approach for exact structure learning. Our strategy i…

    Submitted 21 March, 2017; v1 submitted 8 August, 2016; originally announced August 2016.

    Comments: Published in IEEE BigData 2016; this version contains a correction to Figure 1c.