-
Tracing Distributed Algorithms Using Replay Clocks
Authors:
Ishaan Lagwankar
Abstract:
In this thesis, we introduce replay clocks (RepCl), a novel clock infrastructure that allows us to do offline analyses of distributed computations. The replay clock structure provides a methodology to replay a computation as it happened, with the ability to represent concurrent events effectively. It builds on the structures introduced by vector clocks (VC) and the Hybrid Logical Clock (HLC), comb…
▽ More
In this thesis, we introduce replay clocks (RepCl), a novel clock infrastructure that allows us to do offline analyses of distributed computations. The replay clock structure provides a methodology to replay a computation as it happened, with the ability to represent concurrent events effectively. It builds on the structures introduced by vector clocks (VC) and the Hybrid Logical Clock (HLC), combining their infrastructures to provide efficient replay. With such a clock, a user can replay a computation whilst considering multiple paths of executions, and check for constraint violations and properties that potential pathways could take in the presence of concurrent events. Specifically, if event e must occur before f then the replay clock must ensure that e is replayed before f. On the other hand, if e and f could occur in any order, replay should not force an order between them. We demonstrate that RepCl can be implemented with less than four integers for 64 processes for various system parameters if clocks are synchronized within 1ms. Furthermore, the overhead of RepCl (for computing timestamps and message size) is proportional to the size of the clock. Using simulations in a custom distributed system and NS-3, a state-of-the-art network simulator, we identify the expected overhead of RepCl. We also identify how a user can then identify feasibility region for RepCl, where unabridged replay is possible. Using the RepCl, we provide a tracer for distributed computations, that allows any computation using the RepCl to be replayed efficiently. The visualization allows users to analyze specific properties and constraints in an online fashion, with the ability to consider concurrent paths independently. The visualization provides per-process views and an overarching view of the whole computation based on the time recorded by the RepCl for each event.
△ Less
Submitted 18 June, 2024;
originally announced July 2024.
-
Replay Clocks
Authors:
Ishaan Lagwankar,
Sandeep S Kulkarni
Abstract:
In this work, we focus on the problem of replay clocks (RepCL). The need for replay clocks arises from the observation that analyzing distributed computation for all desired properties of interest may not be feasible in an online environment. These properties can be analyzed by replaying the computation. However, to be beneficial, such replay must account for all the uncertainty that is possible i…
▽ More
In this work, we focus on the problem of replay clocks (RepCL). The need for replay clocks arises from the observation that analyzing distributed computation for all desired properties of interest may not be feasible in an online environment. These properties can be analyzed by replaying the computation. However, to be beneficial, such replay must account for all the uncertainty that is possible in a distributed computation. Specifically, if event 'e' must occur before 'f' then the replay clock must ensure that 'e' is replayed before 'f'. On the other hand, if 'e' and 'f' could occur in any order then replay should not force an order between them.
After identifying the limitations of existing clocks to provide the replay primitive, we present RepCL and identify an efficient representation for the same. We demonstrate that RepCL can be implemented with less than four integers for 64 processes for various system parameters if clocks are synchronized within 1 ms. Furthermore, the overhead of RepCL (for computing/comparing timestamps and message size) is proportional to the size of the clock. Using simulations, we identify the expected overhead of RepCL based on the given system settings. We also identify how a user can the identify feasibility region for RepCL. Specifically, given the desired overhead of RepCL, it identifies the region where unabridged replay is possible.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Causality Diagrams using Hybrid Vector Clocks
Authors:
Ishaan Lagwankar,
Kanishka Wijewardena
Abstract:
Causality in distributed systems is a concept that has long been explored and numerous approaches have been made to use causality as a way to trace distributed system execution. Traditional approaches usually used system profiling and newer approaches profiled clocks of systems to detect failures and construct timelines that caused those failures. Since the advent of logical clocks, these profiles…
▽ More
Causality in distributed systems is a concept that has long been explored and numerous approaches have been made to use causality as a way to trace distributed system execution. Traditional approaches usually used system profiling and newer approaches profiled clocks of systems to detect failures and construct timelines that caused those failures. Since the advent of logical clocks, these profiles have become more and more accurate with ways to characterize concurrency and distributions, with accurate diagrams for message passing. Vector clocks addressed the shortcomings of using traditional logical clocks, by storing information about other processes in the system as well. Hybrid vector clocks are a novel approach to this concept where clocks need not store all the process information. Rather, we store information of processes within an acceptable skew of the focused process. This gives us an efficient way of profiling with substantially reduced costs to the system. Building on this idea, we propose the idea of building causal traces using information generated from the hybrid vector clock. The hybrid vector clock would provide us with a strong sense of concurrency and distribution, and we theorize that all the information generated from the clock is sufficient to develop a causal trace for debugging. We post-process and parse the clocks generated from an execution trace to develop a swimlane on a web interface, that traces the points of failure of a distributed system. We also provide an API to reuse this concept for any generic distributed system framework.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Testing new physics models with global comparisons to collider measurements: the Contur toolkit
Authors:
A. Buckley,
J. M. Butterworth,
L. Corpe,
M. Habedank,
D. Huang,
D. Yallup,
M. Altakach,
G. Bassman,
I. Lagwankar,
J. Rocamonde,
H. Saunders,
B. Waugh,
G. Zilgalvis
Abstract:
Measurements at particle collider experiments, even if primarily aimed at understanding Standard Model processes, can have a high degree of model independence, and implicitly contain information about potential contributions from physics beyond the Standard Model. The Contur package allows users to benefit from the hundreds of measurements preserved in the Rivet library to test new models against…
▽ More
Measurements at particle collider experiments, even if primarily aimed at understanding Standard Model processes, can have a high degree of model independence, and implicitly contain information about potential contributions from physics beyond the Standard Model. The Contur package allows users to benefit from the hundreds of measurements preserved in the Rivet library to test new models against the bank of LHC measurements to date. This method has proven to be very effective in several recent publications from the Contur team, but ultimately, for this approach to be successful, the authors believe that the Contur tool needs to be accessible to the wider high energy physics community. As such, this manual accompanies the first user-facing version: Contur v2. It describes the design choices that have been made, as well as detailing pitfalls and common issues to avoid. The authors hope that with the help of this documentation, external groups will be able to run their own Contur studies, for example when proposing a new model, or pitching a new search.
△ Less
Submitted 19 August, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.