Showing 1–2 of 2 results for author: Dandamudi, A

  1. arXiv:2310.07898  [pdf, other]

    cs.SE cs.DB

    FlorDB: Multiversion Hindsight Logging for Continuous Training

    Authors: Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen

    Abstract: Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and…

    Submitted 2 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  2. arXiv:2006.07357  [pdf, other]

    cs.DC cs.DB cs.SE

    Hindsight Logging for Model Training

    Authors: Rolando Garcia, Eric Liu, Vikram Sreekanti, Bobby Yan, Anusha Dandamudi, Joseph E. Gonzalez, Joseph M. Hellerstein, Koushik Sen

    Abstract: In modern Machine Learning, model training is an iterative, experimental process that can consume enormous computation resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible. Optimistic logging can be accompanied by program checkpoints; this allows developers to a…

    Submitted 2 December, 2020; v1 submitted 12 June, 2020; originally announced June 2020.