Showing 1–2 of 2 results for author: Chulkov, A

Search v0.5.6 released 2020-02-24

arXiv:2403.03751 [pdf, other]

cs.SE

Trigram-Based Persistent IDE Indices with Quick Startup

Authors: Zakhar Iakovlev, Alexey Chulkov, Nikita Golikov, Vyacheslav Lukianov, Nikita Zinoviev, Dmitry Ivanov, Vitaly Aksenov

Abstract: One common way to speed up the find operation within a set of text files involves a trigram index. This structure is merely a map from a trigram (sequence consisting of three characters) to a set of files which contain it. When searching for a pattern, potential file locations are identified by intersecting the sets related to the trigrams in the pattern. Then, the search proceeds only in these files. However, in a code repository, the trigram index evolves across different versions. Upon checking out a new version, this index is typically built from scratch, which is a time-consuming task, while we want our index to have almost zero-time startup. Thus, we explore the persistent version of a trigram index for full-text and key word patterns search. Our approach just uses the current version of the trigram index and applies only the changes between versions during checkout, significantly enhancing performance. Furthermore, we extend our data structure to accommodate CamelHump search for class and function names.
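
The basic (non-persistent) trigram index described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an in-memory dictionary of file contents, and the function names `trigrams`, `build_index`, and `search` are hypothetical names chosen for illustration. It shows only the map from trigrams to file sets and the intersection step, not the persistent, version-aware structure or the CamelHump extension the paper explores.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names whose content contains it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def search(index, files, pattern):
    """Find files containing pattern: intersect trigram sets, then verify."""
    grams = trigrams(pattern)
    if not grams:
        # Patterns shorter than three characters yield no trigrams,
        # so every file remains a candidate.
        candidates = set(files)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # The intersection only narrows the candidates; a real substring
    # check is still needed to rule out false positives.
    return sorted(name for name in candidates if pattern in files[name])
```

For example, with `files = {"a.py": "def parse_index(): pass", "b.py": "class IndexReader: pass"}` and `index = build_index(files)`, calling `search(index, files, "Index")` inspects only the files that share all trigrams of `"Index"` before running the substring check.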

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2306.03272 [pdf, other]

cs.DC

Better Write Amplification for Streaming Data Processing

Authors: Andrei Chulkov, Maxim Akhmedov

Abstract: Many current applications have to perform data processing in a streaming fashion. Doing so at a large scale requires a parallel system that must be equipped to handle straggling workers and different kinds of failures. YT is the main driver behind distributed systems at Yandex, home to its distributed file system, lock service, key-value storage, and internal MapReduce platform. We implement a new component of this system designed for performing streaming MapReduce operations, utilizing different core YT solutions to achieve fault-tolerance and exactly-once semantics while maintaining efficiency and low write amplification factors.

Submitted 5 June, 2023; originally announced June 2023.

Comments: YT is now openly available as YTSaurus: see github.com/ytsaurus/ytsaurus
