-
Geometric Covering using Random Fields
Authors:
Felipe Goncalves,
Daniel Keren,
Amit Shahar,
Gal Yehuda
Abstract:
A set of vectors $S \subseteq \mathbb{R}^d$ is $(k_1,\varepsilon)$-clusterable if there are $k_1$ balls of radius $\varepsilon$ that cover $S$. A set of vectors $S \subseteq \mathbb{R}^d$ is $(k_2,\delta)$-far from being clusterable if there are at least $k_2$ vectors in $S$, with all pairwise distances at least $\delta$. We propose a probabilistic algorithm to distinguish between these two cases. Our algorithm reaches a decision by only looking at the extreme values of a scalar-valued hash function, defined by a random field, on $S$; hence, it is especially suitable in distributed and online settings. An important feature of our method is that the algorithm is oblivious to the number of vectors: in the online setting, for example, the algorithm stores only a constant number of scalars, which is independent of the stream length.
We introduce random field hash functions, which are a key ingredient in our paradigm. Random field hash functions generalize locality-sensitive hashing (LSH). In addition to the LSH requirement that "nearby vectors are hashed to similar values", our hash function also guarantees that the "hash values are (nearly) independent random variables for distant vectors". We formulate necessary conditions for the kernels which define the random fields applied to our problem, as well as a measure of kernel optimality, for which we provide a bound. Then, we propose a method to construct kernels which approximate the optimal one.
Submitted 23 November, 2023;
originally announced November 2023.
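As a rough illustration of the idea in the "Geometric Covering using Random Fields" abstract above, the sketch below keeps only the running minimum and maximum of a scalar hash over a stream of vectors. Everything concrete here is an assumption made for illustration, not the authors' construction: the hash is a random-Fourier-feature approximation of a Gaussian random field with a squared-exponential kernel, and the names `make_random_field_hash`, `ExtremeTracker`, `num_features`, and `length_scale` are hypothetical. The abstract does not state the decision rule applied to the extreme values, so none is shown.

```python
import math
import random


def make_random_field_hash(dim, num_features=256, length_scale=1.0, seed=0):
    """Return a scalar hash h(x) drawn from an approximate Gaussian random field.

    Assumption: squared-exponential kernel, approximated by random Fourier
    features. The kernels studied in the paper may differ.
    """
    rng = random.Random(seed)
    # Frequencies follow the spectral density of the squared-exponential kernel.
    freqs = [[rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def h(x):
        total = 0.0
        for w, b, a in zip(freqs, phases, weights):
            dot = sum(wi * xi for wi, xi in zip(w, x))
            total += a * math.cos(dot + b)
        return scale * total

    return h


class ExtremeTracker:
    """Stores two scalars (min and max hash value), whatever the stream length."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.lo = math.inf
        self.hi = -math.inf

    def update(self, x):
        value = self.hash_fn(x)
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def merge(self, other):
        # Distributed setting: extremes from separate nodes combine by min/max.
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)


if __name__ == "__main__":
    h = make_random_field_hash(dim=2, seed=42)
    tracker = ExtremeTracker(h)
    for point in ([0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [-4.0, 3.0]):
        tracker.update(point)
    print(tracker.lo, tracker.hi)
```

Because the hash is a smooth function of its input, nearby vectors receive similar values, and because minima and maxima merge associatively, each node in a distributed deployment would only need to send two scalars per hash function.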
-
Adaptive Communication Bounds for Distributed Online Learning
Authors:
Michael Kamp,
Mario Boley,
Michael Mock,
Daniel Keren,
Assaf Schuster,
Izchak Sharfman
Abstract:
We consider distributed online learning protocols that control the exchange of information between local learners in a round-based learning scenario. The learning performance of such a protocol is intuitively optimal if approximately the same loss is incurred as in a hypothetical serial setting. If a protocol accomplishes this, it is inherently impossible to achieve a strong communication bound at the same time. In the worst case, every input is essential for the learning performance, even for the serial setting, and thus needs to be exchanged between the local learners. However, it is reasonable to demand a bound that scales well with the hardness of the serialized prediction problem, as measured by the loss received by a serial online learning algorithm. We provide formal criteria based on this intuition and show that they hold for a simplified version of a previously published protocol.
Submitted 28 November, 2019;
originally announced November 2019.
-
Efficient Detection of Complex Event Patterns Using Lazy Chain Automata
Authors:
Ilya Kolchinsky,
Assaf Schuster,
Danny Keren
Abstract:
Complex Event Processing (CEP) is an emerging field with important applications in many areas. CEP systems collect events arriving from input data streams and use them to infer more complex events according to predefined patterns. The Non-deterministic Finite Automaton (NFA) is one of the most popular mechanisms on which such systems are based. During the event detection process, NFAs incrementally extend previously observed partial matches until a full match for the query is found. As a result, each arriving event needs to be processed to determine whether a new partial match is to be initiated or an existing one extended. This method may be highly inefficient when many of the events do not result in output matches. We present a lazy evaluation mechanism that defers processing of frequent event types and stores them internally upon arrival. Events are then matched in ascending order of frequency, thus minimizing potentially redundant computations. We introduce a Lazy Chain NFA, which utilizes the above principle, and does not depend on the underlying pattern structure. An algorithm for constructing a Lazy Chain NFA for common pattern types is presented, including conjunction, negation and iteration. Finally, we experimentally evaluate our mechanism on real-world stock trading data. The results demonstrate a performance gain of two orders of magnitude over traditional NFA-based approaches, with significantly reduced memory resource requirements.
Submitted 2 July, 2018; v1 submitted 15 December, 2016;
originally announced December 2016.
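The lazy evaluation principle described in the "Lazy Chain Automata" abstract above (buffer events on arrival, then match event types in ascending order of frequency) can be sketched as follows. This is a simplified batch illustration, not the Lazy Chain NFA itself: it assumes a conjunction pattern over distinct event types with a time window, and the function name `lazy_conjunction_matches` and the `(type, timestamp)` event shape are hypothetical.

```python
from collections import defaultdict


def lazy_conjunction_matches(events, pattern_types, window):
    """Find one event per type in pattern_types, all within `window` time units.

    events: iterable of (event_type, timestamp) pairs.
    pattern_types: list of distinct event types (assumed conjunction pattern).
    Returns a list of timestamp tuples, ordered like pattern_types.
    """
    # Store events on arrival instead of extending partial matches eagerly.
    buffers = defaultdict(list)
    for event_type, timestamp in events:
        if event_type in pattern_types:
            buffers[event_type].append(timestamp)

    # Evaluate the rarest type first so few partial matches are ever created.
    order = sorted(pattern_types, key=lambda t: len(buffers[t]))

    partial = [{}]
    for event_type in order:
        extended = []
        for match in partial:
            for timestamp in buffers[event_type]:
                candidate = dict(match)
                candidate[event_type] = timestamp
                stamps = candidate.values()
                if max(stamps) - min(stamps) <= window:
                    extended.append(candidate)
        partial = extended
        if not partial:
            # A rare type had no compatible event: frequent types are never scanned.
            return []

    return [tuple(match[t] for t in pattern_types) for match in partial]


if __name__ == "__main__":
    stream = [("A", 1), ("B", 2), ("B", 3), ("B", 10), ("C", 4)]
    print(lazy_conjunction_matches(stream, ["A", "B", "C"], window=3))
```

Starting from the least frequent type means that when a rare event is missing, the buffered frequent events are never touched, which is the kind of redundant work the abstract says an eager NFA performs on every arrival.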