
Intriguing Properties of Modern GANs

Abstract: Modern GANs achieve remarkable performance in terms of generating realistic and diverse samples. This has led many to believe that "GANs capture the training data manifold". In this work we show that this interpretation is wrong. We empirically show that the manifold learned by modern GANs does not fit the training distribution: specifically, the manifold does not pass through the training examples and passes closer to out-of-distribution images than to in-distribution images. We also investigate the distribution over images implied by the prior over the latent codes and study whether modern GANs learn a density that approximates the training distribution. Surprisingly, we find that the learned density is very far from the data distribution and that GANs tend to assign higher density to out-of-distribution images. Finally, we demonstrate that the set of images used to train modern GANs is often not part of the typical set described by the GANs' distribution.
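
The last claim, that training images can fall outside the typical set even where density is high, echoes a general property of high-dimensional distributions: the point of maximum density need not be typical. The sketch below illustrates that property with a standard Gaussian in d = 1000 dimensions as a stand-in; it is not a GAN and does not reproduce the paper's experiments, and the helper names are hypothetical.

```python
import math
import random

def gaussian_nll(x):
    """Negative log-density of vector x under a standard Gaussian N(0, I_d)."""
    d = len(x)
    return 0.5 * d * math.log(2 * math.pi) + 0.5 * sum(v * v for v in x)

def sample_nll_range(d, n_samples=200, seed=0):
    """Min and max negative log-density over n_samples draws from N(0, I_d)."""
    rng = random.Random(seed)
    nlls = [gaussian_nll([rng.gauss(0.0, 1.0) for _ in range(d)])
            for _ in range(n_samples)]
    return min(nlls), max(nlls)

d = 1000
lo, hi = sample_nll_range(d)
mode_nll = gaussian_nll([0.0] * d)  # the origin is where the density peaks
print(f"mode NLL: {mode_nll:.1f}, sampled NLL range: [{lo:.1f}, {hi:.1f}]")
```

Samples concentrate near squared norm d, so their negative log-density sits roughly d/2 nats above that of the mode: the single most likely point is never drawn in practice.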

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.05535

On Optimizing Deterministic Concurrent Scheduling for Smart Contracts and Blockchains

Authors: Yaron Hay, Roy Friedman

Abstract: Executing smart contracts is a compute- and storage-intensive task, which currently dominates modern blockchains' performance. Given that computers are becoming increasingly multicore, concurrency is an attractive approach to improving programs' execution runtime. A unique challenge of blockchains is that all replicas (miners or validators) must execute all smart contracts in the same logical order to maintain the semantics of State Machine Replication (SMR). While non-conflicting transactions can be executed in any actual order, replicas need to enforce a unique logical order among all pairs of conflicting transactions. In this work, we formally study the maximal level of parallelism obtainable when focusing on the conflict graphs between transactions packaged in the same block, rather than relying on the total order. To that end, we describe a generic framework for Active State Machine Replication (ASMR) that is strictly serializable. The generic framework allows us to shift our focus to developing efficient execution engines for transactions without introducing non-deterministic results. Then, we suggest the concept of graph scheduling and the minimal latency scheduling problem, which we prove to be NP-hard. We show that the restricted version of the problem for homogeneous transactions is equivalent to the classic Graph Vertex Coloring Problem, yet the heterogeneous case is more complex. We discuss practical implications of these results.
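
For homogeneous (unit-cost) transactions, scheduling a block reduces to coloring its conflict graph: each color class is a set of mutually non-conflicting transactions that can run in parallel, and the number of colors is the number of rounds. Because the exact problem is NP-hard, the minimal sketch below uses a simple greedy coloring heuristic, not an optimal scheduler and not necessarily the paper's engine. It assumes conflicts are known up front (e.g., from read/write sets); the function name is hypothetical.

```python
from collections import defaultdict

def greedy_schedule(num_txs, conflicts):
    """Assign each transaction a round so that conflicting ones never share a round.

    num_txs: transactions are numbered 0..num_txs-1 in block order.
    conflicts: iterable of (a, b) pairs that touch a common key.
    Returns a list of rounds; transactions within a round can run in parallel.
    """
    neighbors = defaultdict(set)
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    color = {}
    # Fixed iteration order: every replica computes the same schedule.
    for tx in range(num_txs):
        used = {color[n] for n in neighbors[tx] if n in color}
        c = 0
        while c in used:  # smallest color not used by an already-colored neighbor
            c += 1
        color[tx] = c

    rounds = defaultdict(list)
    for tx, c in color.items():
        rounds[c].append(tx)
    return [rounds[c] for c in sorted(rounds)]

# tx0-tx1 and tx1-tx2 conflict; tx3 is independent.
print(greedy_schedule(4, [(0, 1), (1, 2)]))  # [[0, 2, 3], [1]]
```

Conflicting transactions end up logically ordered by round number rather than by block position, which remains deterministic because all replicas run the same coloring.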

Submitted 8 February, 2024; originally announced February 2024.

Comments: 68 pages, 31 figures, LaTeX with Auxiliary Files, short single line

arXiv:2309.03045

An Evaluation of Software Sketches

Authors: Roy Friedman

Abstract: This work presents a detailed evaluation of Rust (software) implementations of several popular sketching solutions, as well as recently proposed optimizations. We compare these solutions in terms of computational speed, memory consumption, and several approximation error metrics. Overall, we find that a simple hashing-based solution combined with the Nitro sampling technique [22] gives the best trade-off between memory, error, and speed. Our findings also include some novel insights about how best to combine sampling with Counting Cuckoo filters depending on the application.
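
Nitro-style sampling performs each counter update only with some probability p and adds 1/p when it does, so counters stay unbiased while most updates skip the memory write. The sketch below applies that idea to a Count-Min sketch, used here as a representative hashing-based sketch; whether it matches the solution the evaluation found best is an assumption, and it is written in Python rather than Rust. Python's built-in `hash` stands in for a proper hash family, and the class name is hypothetical.

```python
import random

class SampledCountMin:
    """Count-Min sketch whose counter updates are sampled with probability p.

    With p == 1.0 this is a plain Count-Min sketch (estimates never undercount).
    With p < 1.0 each counter is touched less often but still scaled by 1/p,
    trading some accuracy for fewer memory writes.
    """

    def __init__(self, width, depth, p=1.0, seed=0):
        self.width, self.depth, self.p = width, depth, p
        self.rows = [[0.0] * width for _ in range(depth)]
        self.rng = random.Random(seed)
        # One salt per row so the rows hash keys (roughly) independently.
        self.salts = [self.rng.getrandbits(64) for _ in range(depth)]

    def _index(self, row, key):
        return hash((self.salts[row], key)) % self.width

    def add(self, key):
        for row in range(self.depth):
            # Keep this row's update with probability p; scale kept updates by 1/p.
            if self.p >= 1.0 or self.rng.random() < self.p:
                self.rows[row][self._index(row, key)] += 1.0 / self.p

    def estimate(self, key):
        # Minimum over rows limits the overcount caused by hash collisions.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))

cm = SampledCountMin(width=64, depth=4)  # p = 1.0: every update is applied
for key in ["a"] * 10 + ["b"] * 3:
    cm.add(key)
print(cm.estimate("a"), cm.estimate("b"))
```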

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2305.01628

The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers

Authors: Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim, Eyal Shnarch

Abstract: Applying language models to natural language processing tasks typically relies on the representations in the final model layer, as intermediate hidden layer representations are presumed to be less informative. In this work, we argue that due to the gradual improvement across model layers, additional information can be gleaned from the contrast between higher and lower layers during inference. Specifically, in choosing between the probable next token predictions of a generative model, the predictions of lower layers can be used to highlight which candidates are best avoided. We propose a novel approach that utilizes the contrast between layers to improve text generation outputs, and show that it mitigates degenerative behaviors of the model in open-ended generation, significantly improving the quality of generated texts. Furthermore, our results indicate that contrasting between model layers at inference time can yield substantial benefits to certain aspects of general language model capabilities, more effectively extracting knowledge during inference from a given set of model parameters.
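
The abstract does not give the scoring rule, so the sketch below assumes one borrowed from contrastive decoding: keep only tokens the upper layer finds plausible (probability at least alpha times the top probability), then rank them by the upper-layer log-probability minus the lower-layer log-probability. The logits, alpha, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def layer_contrast_pick(upper_logits, lower_logits, alpha=0.1):
    """Pick a next-token index by contrasting an upper and a lower layer.

    Only tokens whose upper-layer probability is >= alpha * max probability are
    candidates; among them, tokens the lower layer favors are penalized.
    """
    lp_up = log_softmax(upper_logits)
    lp_low = log_softmax(lower_logits)
    cutoff = math.log(alpha) + max(lp_up)
    candidates = [i for i, lp in enumerate(lp_up) if lp >= cutoff]
    return max(candidates, key=lambda i: lp_up[i] - lp_low[i])

print(layer_contrast_pick([2.0, 1.9, -5.0], [3.0, 0.0, 0.0]))  # 1
```

Greedy decoding on the upper layer alone would choose token 0; because the lower layer is already confident about token 0, the contrast shifts the choice to token 1, while the plausibility cutoff keeps the unlikely token 2 out of contention.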

Submitted 2 May, 2023; originally announced May 2023.

Comments: 9 pages, 8 figures; To be published in ACL 2023

arXiv:2206.13367

Multilevel Bidirectional Cache Filter

Authors: Ohad Eytan, Roy Friedman

Abstract: Modern caches are often required to handle a massive amount of data, which exceeds the amount of available memory; thus, hybrid caches, specifically DRAM/SSD combinations, are becoming more and more prevalent. In such environments, in addition to the classical hit-ratio target, saving writes to the second-level cache is a dominant factor in avoiding write amplification and wear-out, two notorious phenomena of SSDs. This paper presents BiDiFilter, a novel multilevel caching scheme that controls demotions and promotions between cache levels using a frequency sketch filter. Further, it splits the higher cache level into two areas to keep the most recent and the most frequent items close to the user. We conduct an extensive evaluation over real-world traces, comparing against previous multilevel policies. We show that using our mechanism yields a 10x saving of writes in almost all cases and often improves latencies by up to 20%.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2205.12240

VIRATrustData: A Trust-Annotated Corpus of Human-Chatbot Conversations About COVID-19 Vaccines

Authors: Roni Friedman, João Sedoc, Shai Gretz, Assaf Toledo, Rose Weeks, Naor Bar-Zeev, Yoav Katz, Noam Slonim

Abstract: Public trust in medical information is crucial for the successful application of public health policies such as vaccine uptake. This is especially true when the information is offered remotely, by chatbots, which have become increasingly popular in recent years. Here, we explore the challenging task of human-bot turn-level trust classification. We rely on a recently released dataset of observationally-collected (rather than crowdsourced) dialogs with the VIRA chatbot, a COVID-19 Vaccine Information Resource Assistant. These dialogs are centered around questions and concerns about COVID-19 vaccines, where trust is particularly acute. We annotated 3k VIRA system-user conversational turns for Low Institutional Trust or Low Agent Trust vs. Neutral or High Trust. We release the labeled dataset, VIRATrustData, the first of its kind to the best of our knowledge. We demonstrate how this task is non-trivial and compare several models that predict the different levels of trust.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2205.11966

Benchmark Data and Evaluation Framework for Intent Discovery Around COVID-19 Vaccine Hesitancy

Authors: Shai Gretz, Assaf Toledo, Roni Friedman, Dan Lahav, Rose Weeks, Naor Bar-Zeev, João Sedoc, Pooja Sangha, Yoav Katz, Noam Slonim

Abstract: The COVID-19 pandemic has made a huge global impact and cost millions of lives. As COVID-19 vaccines were rolled out, they were quickly met with widespread hesitancy. To address the concerns of hesitant people, we launched VIRA, a public dialogue system aimed at addressing questions and concerns surrounding the COVID-19 vaccines. Here, we release VIRADialogs, a dataset of over 8k dialogues conducted by actual users with VIRA, providing a unique real-world conversational dataset. In light of rapid changes in users' intents, due to updates in guidelines or in response to new information, we highlight the important task of intent discovery in this use-case. We introduce a novel automatic evaluation framework for intent discovery, leveraging the existing intent classifier of VIRA. We use this framework to report baseline intent discovery results over VIRADialogs, that highlight the difficulty of this task.

Submitted 11 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

arXiv:2203.04803

Limited Associativity Caching in the Data Plane

Authors: Roy Friedman, Or Goaz, Dor Hovav

Abstract: In-network caching promises to improve the performance of networked and edge applications as it shortens the paths data need to travel. This is achieved by storing so-called hot items in the network switches en route between the clients who access the data and the storage servers that maintain it. Since the data flows through those switches in any case, it is natural to cache hot items there. Most software-managed caches treat the cache as a fully associative region. Alas, a fully associative design seems to be at odds with programmable switches' goal of handling packets in a short, bounded amount of time, as well as with their restricted programming model. In this work, we present PKache, a generic limited associativity cache implementation in the programmable switches' domain-specific P4 language, and demonstrate its utility by realizing multiple popular cache management schemes.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2201.01958

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

Authors: Rana Shahout, Roy Friedman, Ran Ben Basat

Abstract: Stream monitoring is fundamental in many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's utility. For example, if a video connection has high tail latency, the perceived quality will suffer, even if the average and median latencies are low. In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID. Existing quantile sketches are designed for a single number stream (e.g., containing just the latency). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand. We first present a simple sampling algorithm that serves as a benchmark. Then, we design an algorithm that augments a quantile sketch within each entry of a heavy hitter algorithm, resulting in similar space complexity but with a deterministic error guarantee. Finally, we present SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of the latencies of an item before it is allocated a sketch, thereby allowing us to use fewer samples and sketches.
Our solutions are rigorously analyzed, and we demonstrate the superiority of our approach using extensive simulations.
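
The abstract mentions a simple sampling algorithm used as a benchmark. The sketch below is a hypothetical version of such a baseline, assuming Bernoulli sampling of (ID, latency) tuples and nearest-rank quantiles over each item's retained samples; it does not reproduce the paper's heavy-hitter sketches or SQUAD, and its memory grows with the stream length times p.

```python
import math
import random
from collections import defaultdict

class SampledPerItemQuantiles:
    """Per-item quantile tracker that keeps each (item_id, latency) with probability p."""

    def __init__(self, p, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)

    def update(self, item_id, latency):
        if self.rng.random() < self.p:
            self.samples[item_id].append(latency)

    def quantile(self, item_id, q):
        data = sorted(self.samples.get(item_id, []))
        if not data:
            return None  # item never sampled, so it is unlikely to be a heavy hitter
        # Nearest-rank quantile over the retained samples of this item.
        idx = min(len(data) - 1, max(0, math.ceil(q * len(data)) - 1))
        return data[idx]

tracker = SampledPerItemQuantiles(p=1.0)  # p = 1.0 keeps every tuple
for lat in range(1, 101):
    tracker.update("video-42", lat)
print(tracker.quantile("video-42", 0.5))  # 50
```

Heavy hitters accumulate many samples and get accurate quantiles, while rare IDs may have none, which is the trade-off the sketch-based designs in the abstract aim to improve.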

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2110.10577

Overview of the 2021 Key Point Analysis Shared Task

Authors: Roni Friedman, Lena Dankin, Yufang Hou, Ranit Aharonov, Yoav Katz, Noam Slonim

Abstract: We describe the 2021 Key Point Analysis (KPA-2021) shared task on key point analysis that we organized as a part of the 8th Workshop on Argument Mining (ArgMining 2021) at EMNLP 2021. We outline various approaches and discuss the results of the shared task. We expect the task and the findings reported in this paper to be relevant for researchers working on text summarization and argument mining.

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2109.03021

Limited Associativity Makes Concurrent Software Caches a Breeze

Authors: Dolev Adas, Gil Einziger, Roy Friedman

Abstract: Software caches optimize the performance of diverse storage systems, databases, and other software systems. Existing works on software caches automatically resort to fully associative cache designs. Our work shows that limited associativity caches are a promising direction for concurrent software caches. Specifically, we demonstrate that limited associativity enables simple yet efficient realizations of multiple cache management schemes that can be trivially parallelized. We show that the obtained hit ratio is usually similar to that of fully associative caches with the same management policy, but the throughput is improved by up to 5x compared to production-grade caching libraries, especially in multi-threaded executions.
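
Limited associativity means a key may live only in the small set its hash selects, so eviction decisions and synchronization stay local to that set. The minimal sketch below assumes LRU within each set and one lock per set; both are choices made for this illustration, not necessarily the paper's design, and Python threads only show the locking structure, not the throughput gains.

```python
import threading
from collections import OrderedDict

class SetAssociativeLRU:
    """Limited-associativity cache: each key maps to exactly one small LRU set.

    Every set has its own lock, so threads touching different sets never contend.
    """

    def __init__(self, num_sets, ways):
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.locks = [threading.Lock() for _ in range(num_sets)]

    def _set_index(self, key):
        return hash(key) % len(self.sets)

    def get(self, key):
        i = self._set_index(key)
        with self.locks[i]:
            s = self.sets[i]
            if key not in s:
                return None
            s.move_to_end(key)  # mark as most recently used within its set
            return s[key]

    def put(self, key, value):
        i = self._set_index(key)
        with self.locks[i]:
            s = self.sets[i]
            if key in s:
                s.move_to_end(key)
            elif len(s) >= self.ways:
                s.popitem(last=False)  # evict the LRU entry of this set only
            s[key] = value

cache = SetAssociativeLRU(num_sets=4, ways=2)
for k in (0, 4, 8):  # all three integer keys hash to set 0
    cache.put(k, str(k))
print(cache.get(0), cache.get(8))  # None 8
```

The example shows the price of limited associativity: key 0 is evicted even though three other sets are empty. The abstract reports that in practice the hit ratio usually stays close to a fully associative cache with the same policy.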

Submitted 19 July, 2021; originally announced September 2021.

arXiv:2108.12287

Evaluation of individual attributes associated with shared HIV risk behaviors among two network-based studies of people who inject drugs

Authors: Valerie Ryan, TingFang Lee, Ashley L. Buchanan, Natallia V. Katenka, Samuel R. Friedman, Georgios Nikolopoulos

Abstract: Social context plays an important role in perpetuating or reducing HIV risk behaviors. This study analyzed the network and individual attributes that were associated with the likelihood that people who inject drugs (PWID) will engage in HIV risk behaviors with one another. We analyze data collected in the Social Risk Factors and HIV Risk Study (SFHR) and the Transmission Reduction Intervention Project (TRIP). Exponential random graph models were used to determine which attributes were associated with the likelihood of PWID engaging in HIV risk behaviors, such as injection behaviors, with one another. Results across all models and across both data sets indicated that people were more likely to engage in risk behaviors with others who were similar to them in some way (e.g., shared the same sex, race/ethnicity, or living conditions). In both SFHR and TRIP, we explore the effects of missingness at individual and network levels on the likelihood of individuals engaging in HIV risk behaviors among PWID. In this study, we found that known individual-level risk factors, including housing instability and race/ethnicity, are also important factors in determining the structure of the observed network among PWID. Future development of interventions should consider not only individual risk factors, but also the communities and social influences that leave individuals vulnerable to HIV risk.

Submitted 18 August, 2021; originally announced August 2021.

Comments: 19 pages

arXiv:2107.12788

On the data persistency of replicated erasure codes in distributed storage systems

Authors: Roy Friedman, Rafał Kapelko, Karol Marchwicki

Abstract: This paper studies the fundamental problem of data persistency for a general family of redundancy schemes in distributed storage systems, called replicated erasure codes. Namely, we analyze two strategies for distributing replicated erasure codes: random and symmetric. For both strategies, we derive closed-form analytical and asymptotic formulas for the expected data persistency despite node failures.

Submitted 27 July, 2021; originally announced July 2021.

arXiv:2106.06758

Every Bite Is an Experience: Key Point Analysis of Business Reviews

Authors: Roy Bar-Haim, Lilach Eden, Yoav Kantor, Roni Friedman, Noam Slonim

Abstract: Previous work on review summarization focused on measuring the sentiment toward the main aspects of the reviewed product or business, or on creating a textual summary. These approaches provide only a partial view of the data: aspect-based sentiment summaries lack sufficient explanation or justification for the aspect rating, while textual summaries do not quantify the significance of each element and are not well suited for representing conflicting views. Recently, Key Point Analysis (KPA) has been proposed as a summarization framework that provides both a textual and a quantitative summary of the main points in the data. We adapt KPA to review data by introducing Collective Key Point Mining for better key point extraction; integrating sentiment analysis into KPA; identifying good key point candidates for review summaries; and leveraging the massive amount of available reviews and their metadata. We show empirically that these novel extensions of KPA substantially improve its performance. We demonstrate that promising results can be achieved without any domain-specific annotation, while human supervision can lead to further improvement.

Submitted 12 June, 2021; originally announced June 2021.

Comments: ACL-IJCNLP 2021

arXiv:2105.08770

Lightweight Robust Size Aware Cache Management

Authors: Gil Einziger, Ohad Eytan, Roy Friedman, Benjamin Manes

Abstract: Modern key-value stores, object stores, Internet proxy caches, and Content Delivery Networks (CDN) often manage objects of diverse sizes, e.g., blobs, video files of different lengths, images with varying resolutions, and small documents. In such workloads, size-aware cache policies outperform size-oblivious algorithms. Unfortunately, existing size-aware algorithms tend to be overly complicated and computationally expensive. Our work follows a more approachable pattern; we extend the prevalent (size-oblivious) TinyLFU cache admission policy to handle variable-sized items. Implementing our approach inside two popular caching libraries only requires minor changes. We show that our algorithms yield competitive or better hit ratios and byte hit ratios compared to state-of-the-art size-aware algorithms such as AdaptSize, LHD, LRB, and GDSF. Further, a runtime comparison indicates that our implementation is up to 3x faster than the best alternative, i.e., it imposes much lower CPU overhead.
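
Classic TinyLFU admits a candidate only if its estimated frequency beats that of the single victim it would replace. With variable-sized items one candidate may displace several victims, so the admission test has to look at all of them. The sketch below uses one illustrative rule, assumed here rather than taken from the paper: admit if the candidate's frequency exceeds the summed frequency of every victim needed to make room. A plain dict stands in for the frequency sketch, and the function name is hypothetical.

```python
def size_aware_admit(cand_key, cand_size, free_space, eviction_order, freq):
    """Decide whether to admit a variable-sized item, TinyLFU style.

    eviction_order: list of (key, size) from the eviction policy, next victim first.
    freq: mapping key -> estimated access frequency (stand-in for a Count-Min sketch).
    Returns (admit, victims_to_evict).
    """
    if cand_size <= free_space:
        return True, []  # fits without evicting anything
    victims, reclaimed = [], free_space
    for key, size in eviction_order:
        victims.append(key)
        reclaimed += size
        if reclaimed >= cand_size:
            break
    if reclaimed < cand_size:
        return False, []  # larger than everything the cache could free
    victims_freq = sum(freq.get(k, 0) for k in victims)
    return freq.get(cand_key, 0) > victims_freq, victims

order = [("a", 5), ("b", 5), ("c", 10)]
freq = {"x": 7, "a": 3, "b": 3, "c": 1}
print(size_aware_admit("x", 8, 0, order, freq))  # (True, ['a', 'b'])
```

Comparing against the aggregate of all victims prevents one moderately popular large object from flushing several small objects that are jointly more valuable.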

Submitted 23 May, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

arXiv:2104.09895

Posterior Sampling for Image Restoration using Explicit Patch Priors

Authors: Roy Friedman, Yair Weiss

Abstract: Almost all existing methods for image restoration are based on optimizing the mean squared error (MSE), even though it is known that the best estimate in terms of MSE may yield a highly atypical image, because there are many plausible restorations for a given noisy image. In this paper, we show how to combine explicit priors on patches of natural images in order to sample from the posterior probability of a full image given a degraded image. We prove that our algorithm generates correct samples from the distribution $p(x|y) \propto \exp(-E(x|y))$, where $E(x|y)$ is the cost function minimized in previous patch-based approaches that compute a single restoration. Unlike previous approaches that computed a single restoration using MAP or MMSE, our method makes explicit the uncertainty in the restored images and guarantees that all patches in the restored images will be typical given the patch prior. Unlike previous approaches that used implicit priors on fixed-size images, our approach can be used with images of any size. Our experimental results show that posterior sampling using patch priors yields images of high perceptual quality and high PSNR on a range of challenging image restoration problems.

Submitted 20 April, 2021; originally announced April 2021.

arXiv:2103.14071

Accelerating Big-Data Sorting Through Programmable Switches

Authors: Yamit Barshatz-Schneor, Roy Friedman

Abstract: Sorting is a fundamental and extensively studied problem. Sorting plays an important role in databases, as many queries can be served much faster if the relations are first sorted. One of the most popular sorting algorithms in databases is merge sort. In modern data centers, data is stored in storage servers, while processing takes place in compute servers. Hence, in order to compute queries on the data, it must travel through the network from the storage servers to the compute servers. This creates a potential for utilizing programmable switches to perform partial sorting in order to accelerate the sorting process at the server side. This is possible because, as mentioned above, data packets pass through the switch in any case on their way to the server. Alas, programmable switches offer a very restricted and non-intuitive programming model, which is why realizing this is non-trivial. We devised a novel partial sorting algorithm that fits the programming model and restrictions of programmable switches and can expedite merge sort at the server. We also utilize built-in parallelism in the switch to divide the data into sequential ranges. Thus, the server needs to sort each range separately and then concatenate them into one sorted stream. This way, the server needs to sort smaller sections, and each of these sections is already partially sorted. Hence, the server does less work, and the access pattern becomes more virtual-memory friendly.
We evaluated the performance improvements obtained when utilizing our partial sorting algorithm over several data stream compositions with various switch configurations. Our study exhibits an improvement of 20%-75% in the sorting run-time when using our approach compared to plain sorting on the original stream.
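
The range-splitting step works because the ranges do not overlap: once values are routed to buckets by range, sorting each bucket and concatenating them in range order yields a fully sorted stream. The sketch below models only that step, in Python rather than P4; the split points are assumed to be known in advance, the switch's partial-sorting algorithm is not reproduced, and the function names are hypothetical.

```python
import bisect

def range_partition(stream, boundaries):
    """Route each value to a bucket by range, as a switch pipeline might.

    boundaries: sorted split points; bucket 0 holds values < boundaries[0],
    bucket i holds values in [boundaries[i-1], boundaries[i]), and the last
    bucket holds values >= boundaries[-1].
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for v in stream:
        buckets[bisect.bisect_right(boundaries, v)].append(v)
    return buckets

def server_sort(buckets):
    # Sort each (smaller) bucket independently; concatenating them in bucket
    # order gives a fully sorted stream because the ranges do not overlap.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out

stream = [25, 3, 17, 10, 8, 21, 19, 0]
buckets = range_partition(stream, [10, 20])
print(buckets)               # [[3, 8, 0], [17, 10, 19], [25, 21]]
print(server_sort(buckets))  # [0, 3, 8, 10, 17, 19, 21, 25]
```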

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2010.14189

Jiffy: A Fast, Memory Efficient, Wait-Free Multi-Producers Single-Consumer Queue

Authors: Dolev Adas, Roy Friedman

Abstract: In applications such as sharded data processing systems, sharded in-memory key-value stores, data flow programming and load sharing applications, multiple concurrent data producers feed requests into the same data consumer. This can be naturally realized through concurrent queues, where each consumer pulls its tasks from its dedicated queue. For scalability, wait-free queues are often preferred over lock-based structures. The vast majority of wait-free queue implementations, and even lock-free ones, support the multi-producer multi-consumer model. Yet, this comes at a premium, since implementing wait-free multi-producer multi-consumer queues requires utilizing complex helper data structures. The latter increases the memory consumption of such queues and limits their performance and scalability. Additionally, many such designs employ (hardware) cache-unfriendly memory access patterns. In this work we study the implementation of wait-free multi-producer single-consumer queues. Specifically, we propose Jiffy, a novel, efficient, and memory-frugal wait-free multi-producer single-consumer queue, and formally prove its correctness. We then compare the performance and memory requirements of Jiffy with other state-of-the-art lock-free and wait-free queues. We show that Jiffy maintains good performance with up to 128 threads, delivers up to 50% better throughput than the next best construction we compared against, and consumes ~90% less memory.

Submitted 2 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

arXiv:2010.05369

Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis

Authors: Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, Noam Slonim

Abstract: When summarizing a collection of views, arguments or opinions on some topic, it is often desirable not only to extract the most salient points, but also to quantify their prevalence. Work on multi-document summarization has traditionally focused on creating textual summaries, which lack this quantitative aspect. Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments. The current work advances key point analysis in two important respects: first, we develop a method for automatic extraction of key points, which enables fully automatic analysis, and is shown to achieve performance comparable to a human expert. Second, we demonstrate that the applicability of key point analysis goes well beyond argumentation data. Using models trained on publicly available argumentation datasets, we achieve promising results in two additional domains: municipal surveys and user reviews. An additional contribution is an in-depth evaluation of argument-to-key point matching models, where we substantially outperform previous results.
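
The salience computation described above (each key point scored by the number of arguments that match it) can be sketched as follows. The matcher here is a toy token-overlap (Jaccard) score standing in for the trained argument-to-key point matching models; the threshold, the one-key-point-per-argument assumption, and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Toy token-overlap matcher standing in for a trained matching model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def key_point_salience(arguments, key_points, match_score, threshold=0.4):
    """Map each argument to its best-matching key point; salience = match count.

    Arguments whose best score falls below threshold stay unmatched.
    """
    salience = {kp: 0 for kp in key_points}
    for arg in arguments:
        best = max(key_points, key=lambda kp: match_score(arg, kp))
        if match_score(arg, best) >= threshold:
            salience[best] += 1
    return salience

kps = ["school uniforms reduce bullying", "uniforms are expensive"]
args = [
    "uniforms reduce bullying at school",
    "uniforms reduce bullying",
    "uniforms are too expensive for families",
    "the weather is nice",
]
print(key_point_salience(args, kps, jaccard))
```

The off-topic argument matches nothing, so the two key points receive salience 2 and 1 respectively, giving both the textual summary (the key points) and its quantitative side (the counts).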

Submitted 11 October, 2020; originally announced October 2020.

Comments: EMNLP 2020
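The quantitative aspect described above reduces to counting how many arguments match each key point. A minimal sketch, assuming a precomputed matrix of match scores (for example, from some argument-to-key point matching model) and a simple rule that assigns each argument to its single best key point above a threshold; the function name, the threshold of 0.5, and the assignment rule are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def key_point_salience(match_scores, key_points, threshold=0.5):
    """Map each argument to its best-matching key point and count the matches.

    match_scores[i][j] is a match score in [0, 1] between argument i and
    key point j (hypothetical input, e.g. from a matching model).
    Arguments whose best score falls below `threshold` stay unmatched.
    Returns (salience Counter, number of unmatched arguments).
    """
    if not key_points:
        raise ValueError("need at least one key point")
    salience = Counter({kp: 0 for kp in key_points})
    unmatched = 0
    for scores in match_scores:
        best = max(range(len(key_points)), key=lambda j: scores[j])
        if scores[best] >= threshold:
            salience[key_points[best]] += 1
        else:
            unmatched += 1
    return salience, unmatched
```

Dividing each count by the number of arguments gives a key point's share, and `1 - unmatched / len(match_scores)` gives the fraction of arguments the key points cover.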

arXiv:2005.01619 [pdf, other]

From Arguments to Key Points: Towards Automatic Argument Summarization

Authors: Roy Bar-Haim, Lilach Eden, Roni Friedman, Yoav Kantor, Dan Lahav, Noam Slonim

Abstract: Generating a concise summary from a large collection of arguments on a given topic is an intriguing yet understudied problem. We propose to represent such summaries as a small set of talking points, termed "key points", each scored according to its salience. We show, by analyzing a large dataset of crowd-contributed arguments, that a small number of key points per topic is typically sufficient for covering the vast majority of the arguments. Furthermore, we find that a domain expert can often predict these key points in advance. We study the task of argument-to-key point mapping, and introduce a novel large-scale dataset for this task. We report empirical results for an extensive set of experiments with this dataset, showing promising performance.

Submitted 9 June, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: ACL 2020

arXiv:1911.11408 [pdf, other]

A Large-scale Dataset for Argument Quality Ranking: Construction and Analysis

Authors: Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, Noam Slonim

Abstract: Identifying the quality of free-text arguments has become an important task in the rapidly expanding field of computational argumentation. In this work, we explore the challenging task of argument quality ranking. To this end, we created a corpus of 30,497 arguments carefully annotated for point-wise quality, released as part of this work. To the best of our knowledge, this is the largest dataset annotated for point-wise argument quality, larger by a factor of five than previously released datasets. Moreover, we address the core issue of inducing a labeled score from crowd annotations by performing a comprehensive evaluation of different approaches to this problem. In addition, we analyze the quality dimensions that characterize this dataset. Finally, we present a neural method for argument quality ranking, which outperforms several baselines on our own dataset, as well as previous methods published for another dataset.

Submitted 26 November, 2019; originally announced November 2019.

Comments: Accepted to AAAI 2020

arXiv:1909.01007 [pdf, other]

Automatic Argument Quality Assessment -- New Datasets and Methods

Authors: Assaf Toledo, Shai Gretz, Edo Cohen-Karlik, Roni Friedman, Elad Venezian, Dan Lahav, Michal Jacovi, Ranit Aharonov, Noam Slonim

Abstract: We explore the task of automatic assessment of argument quality. To that end, we actively collected 6.3k arguments, more than a factor of five compared to previously examined data. Each argument was explicitly and carefully annotated for its quality. In addition, 14k pairs of arguments were annotated independently, identifying the higher-quality argument in each pair. In spite of the inherently subjective nature of the task, both annotation schemes led to surprisingly consistent results. We release the labeled datasets to the community. Furthermore, we suggest neural methods based on a recently released language model, for argument ranking as well as for argument-pair classification. In the former task, our results are comparable to the state of the art; in the latter task, our results significantly outperform earlier methods.

Submitted 3 September, 2019; originally announced September 2019.

Comments: Published at EMNLP 2019

arXiv:1908.02675 [pdf, other]

A Generic Efficient Biased Optimizer for Consensus Protocols

Authors: Yehonatan Buchnik, Roy Friedman

Abstract: Consensus is one of the most fundamental distributed computing problems. In particular, it serves as a building block in many replication-based fault-tolerant systems, including multiple recent blockchain solutions. Depending on its exact variant and other environmental assumptions, solving consensus requires multiple communication rounds. Yet, there are known optimistic protocols that guarantee termination in a single communication round under favorable conditions. In this paper we present a generic optimizer that can turn any consensus protocol into an optimized protocol that terminates in a single communication round whenever all nodes start with the same predetermined value and no Byzantine failures occur (although node crashes are allowed). This holds regardless of the network timing assumptions and additional oracle capabilities assumed by the base consensus protocol being optimized. In the case of benign failures, our optimizer works whenever the number of faulty nodes satisfies $f<n/2$. For Byzantine behavior, our optimizer's resiliency depends on the validity variant sought. In the case of classical validity, it can accommodate $f<n/4$ Byzantine failures. With the more recent external validity function assumption, it works whenever $f<n/3$. Either way, our optimizer relies only on oral messages, thereby imposing very lightweight cryptographic requirements.

Submitted 7 August, 2019; originally announced August 2019.
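The resiliency bounds stated above can be read as a small configuration check. The sketch below encodes only those three conditions ($f<n/2$ for crashes, $f<n/4$ for Byzantine faults with classical validity, $f<n/3$ with external validity); it says nothing about how the optimizer itself works, and the function names are hypothetical.

```python
# Divisor d such that the optimizer applies iff d * f < n (bounds from the abstract).
_DIVISORS = {
    ("crash", None): 2,            # benign failures: f < n/2
    ("byzantine", "classical"): 4, # classical validity: f < n/4
    ("byzantine", "external"): 3,  # external validity function: f < n/3
}


def _divisor(failure_model, validity):
    key = (failure_model, None if failure_model == "crash" else validity)
    if key not in _DIVISORS:
        raise ValueError(f"unknown configuration: {failure_model}/{validity}")
    return _DIVISORS[key]


def optimizer_applies(n, f, failure_model, validity="classical"):
    """True if f faulty nodes out of n fall within the stated bound."""
    return f >= 0 and _divisor(failure_model, validity) * f < n


def max_tolerated_faults(n, failure_model, validity="classical"):
    """Largest integer f with d * f < n, i.e. f = (n - 1) // d."""
    return (n - 1) // _divisor(failure_model, validity)
```

Integer comparisons (`d * f < n`) are used instead of `f < n / d` to avoid floating-point division. For example, with `n = 4`, external validity tolerates one Byzantine node while classical validity tolerates none.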

arXiv:1901.03279 [pdf, other]

FireLedger: A High Throughput Blockchain Consensus Protocol

Authors: Yehonatan Buchnik, Roy Friedman

Abstract: Blockchains are distributed secure ledgers to which transactions are issued continuously and each block of transactions is tightly coupled to its predecessors. Permissioned blockchains place special emphasis on transaction throughput. In this paper we present FireLedger, which leverages the iterative nature of blockchains in order to improve their throughput in optimistic execution scenarios. FireLedger trades latency for throughput: the last f + 1 blocks of each node's blockchain are considered tentative, i.e., they may be rescinded if the proposer of one of those last f + 1 blocks was Byzantine. Yet, when optimistic assumptions are met, a new block is decided in each communication step, in which a proposer sends only its proposal and every other participant sends a single bit. Our performance study demonstrates that in a single Amazon data center, FireLedger running on 10 mid-range Amazon nodes obtains a throughput of up to 160K transactions per second for 512-byte transactions (typical of Bitcoin). In a 10-node geo-distributed Amazon setting with 512-byte transactions, FireLedger obtains a throughput of 30K tps. Moreover, on higher-end Amazon machines, FireLedger obtains 20% to 600% better throughput than state-of-the-art protocols such as HotStuff and BFT-SMaRt, depending on the exact configuration.

Submitted 1 November, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

Comments: The name of the protocol was changed from TOY to FireLedger. Protocol presentation and related work sections were improved and some typos were fixed
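The latency-for-throughput trade-off described above can be pictured as a chain whose suffix of f + 1 blocks is still tentative. A minimal sketch of that bookkeeping only, assuming a chain is a plain list of blocks; the rule that rescinding a block also drops its successors is an assumption based on each block being tightly coupled to its predecessors, and both function names are hypothetical.

```python
def split_chain(chain, f):
    """Split a node's chain into (final, tentative) parts.

    Per the abstract, the last f + 1 blocks are tentative: they may be
    rescinded if one of their proposers was Byzantine.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    cut = max(0, len(chain) - (f + 1))
    return chain[:cut], chain[cut:]


def rescind_from(chain, f, index):
    """Drop tentative block `index` and everything after it.

    Assumption: later blocks depend on earlier ones, so they go too.
    Final blocks can never be rescinded.
    """
    final, _ = split_chain(chain, f)
    if not len(final) <= index < len(chain):
        raise ValueError("only tentative blocks can be rescinded")
    return chain[:index]
```

With `f = 1` and five blocks `b0` to `b4`, only `b3` and `b4` are tentative; a block becomes final once f + 1 further blocks are appended after it.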

arXiv:1804.10740 [pdf, other]

Heavy Hitters over Interval Queries

Authors: Ran Ben Basat, Roy Friedman, Rana Shahout

Abstract: Heavy hitters and frequency measurements are fundamental in many networking applications such as load balancing, QoS, and network security. This paper considers a generalized sliding window model that supports frequency and heavy hitters queries over an interval given at query time. This enables drill-down queries, in which the behavior of the network can be examined at finer and finer granularities. For this model, we asymptotically improve the space bounds of existing work, reduce the update and query time to a constant, and provide deterministic solutions. When evaluated over real Internet packet traces, our fastest algorithm processes packets 90 to 250 times faster, serves queries at least 730 times faster and consumes at least 40% less space than the known method.

Submitted 13 November, 2018; v1 submitted 28 April, 2018; originally announced April 2018.

arXiv:1712.01779 [pdf, other]

Pay for a Sliding Bloom Filter and Get Counting, Distinct Elements, and Entropy for Free

Authors: Eran Assaf, Ran Ben Basat, Gil Einziger, Roy Friedman

Abstract: For many networking applications, recent data is more significant than older data, motivating the need for sliding window solutions. Various capabilities, such as DDoS detection and load balancing, require insights into multiple metrics, including Bloom filters, per-flow counting, count distinct and entropy estimation. In this work, we present a unified construction that solves all the above problems in the sliding window model. Our single solution offers a better space-to-accuracy tradeoff than the state-of-the-art for each of these individual problems! We show this both analytically and by running on multiple real Internet backbone and datacenter packet traces.

Submitted 5 December, 2017; originally announced December 2017.

Comments: To appear in IEEE INFOCOM 2018

arXiv:1710.03155 [pdf, other]

Fast Flow Volume Estimation

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman

Abstract: The increasing popularity of jumbo frames means growing variance in the size of packets transmitted in modern networks. Consequently, network monitoring tools must maintain explicit traffic volume statistics rather than settle for packet counting as before. We present constant-time algorithms for volume estimation in streams and sliding windows, which are faster than previous work. Our solutions are formally analyzed and are extensively evaluated over multiple real-world packet traces as well as synthetic ones. For streams, we demonstrate a run-time improvement of up to 2.4x compared to the state of the art. On sliding windows, we exhibit a memory reduction of over 100x on all traces and an asymptotic runtime improvement to a constant. Finally, we apply our approach to hierarchical heavy hitters and achieve an empirical 2.4x to 7x speedup.

Submitted 15 October, 2017; v1 submitted 9 October, 2017; originally announced October 2017.

Comments: To appear in ACM ICDCN 2018

arXiv:1707.06778 [pdf, other]

doi 10.1145/3098822.3098832

Constant Time Updates in Hierarchical Heavy Hitters

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, Erez Waisbard

Abstract: Monitoring tasks, such as anomaly and DDoS detection, require identifying frequent flow aggregates based on common IP prefixes. These are known as hierarchical heavy hitters (HHH), where the hierarchy is determined based on the type of prefixes of interest in a given application. The per-packet complexity of existing HHH algorithms is proportional to the size of the hierarchy, imposing significant overheads. In this paper, we propose a randomized constant-time algorithm for HHH. We prove probabilistic precision bounds backed by an empirical evaluation. Using four real Internet packet traces, we demonstrate that our algorithm obtains accuracy and recall comparable to previous works, while running up to 62 times faster. Finally, we extended Open vSwitch (OVS) with our algorithm and showed it is able to handle 13.8 million packets per second. In contrast, incorporating previous works into OVS yielded 2.5 times lower throughput.

Submitted 21 July, 2017; originally announced July 2017.

Comments: To appear in ACM SIGCOMM 2017
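The abstract states only that the algorithm is randomized and runs in constant time per packet. One natural way to get O(1) work, assumed here, is to update a single uniformly chosen level of the prefix hierarchy per packet and scale that level's counts by the number of levels H. The sketch assumes a byte-granularity IPv4 hierarchy (/8, /16, /24, /32) and uses exact `Counter`s in place of compact counter structures; it estimates per-prefix frequencies only and omits the step that turns these into the HHH set (discounting heavy descendants). All names are hypothetical.

```python
import random
from collections import Counter

PREFIX_LENGTHS = (8, 16, 24, 32)  # assumed byte-granularity IPv4 hierarchy


def prefix(ip, length):
    """Dotted-quad prefix of `ip`; `length` must be a multiple of 8."""
    octets = ip.split(".")[: length // 8]
    return ".".join(octets) + f"/{length}"


class RandomizedHHHSketch:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.levels = {L: Counter() for L in PREFIX_LENGTHS}

    def update(self, ip):
        # O(1) per packet: update one random level instead of all H levels.
        L = self.rng.choice(PREFIX_LENGTHS)
        self.levels[L][prefix(ip, L)] += 1

    def estimate(self, pfx, length):
        # Each level sees about 1/H of the packets, so scale its count by H.
        return len(PREFIX_LENGTHS) * self.levels[length][pfx]
```

The price of the constant update cost is extra variance: each level's estimate is built from roughly 1/H of the traffic, so accuracy for a given prefix improves only as more packets arrive.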

arXiv:1703.01166 [pdf, other]

Give Me Some Slack: Efficient Network Measurements

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman

Abstract: Many networking applications require timely access to recent network measurements, which can be captured using a sliding window model. Maintaining such measurements is a challenging task due to the fast line speed and scarcity of fast memory in routers. In this work, we study the impact of allowing slack in the window size on the asymptotic requirements of sliding window problems. That is, the algorithm can dynamically adjust the window size between $W$ and $W(1+τ)$ where $τ$ is a small positive parameter. We demonstrate this model's attractiveness by showing that it enables efficient algorithms for problems such as MAX and GENERAL-SUM that require $Ω(W)$ bits even for constant-factor approximations in the exact sliding window model. Additionally, for problems that admit sub-linear approximation algorithms such as BASIC-SUMMING and COUNT-DISTINCT, the slack model enables a further asymptotic improvement.

Submitted 24 April, 2018; v1 submitted 3 March, 2017; originally announced March 2017.
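To see why slack helps, consider summing over the window. If the window may cover anywhere from $W$ to $W(1+τ)$ items, the stream can be cut into blocks of about $τW$ items and only one total per block kept, discarding whole blocks at once. The sketch below is this simple block construction under that assumption (block size `max(1, floor(tau * W))`, exact integer sums); it is not the paper's algorithm, which obtains stronger asymptotic bounds, and the class name is hypothetical.

```python
import math
from collections import deque


class SlackWindowSum:
    """Sum over a window whose size floats between W and W(1 + tau).

    Items are grouped into blocks of max(1, floor(tau * W)) items and only
    one (count, sum) pair per closed block is stored. The oldest block is
    dropped whenever the remaining items still cover at least W.
    """

    def __init__(self, W, tau):
        if W <= 0 or tau <= 0:
            raise ValueError("W and tau must be positive")
        self.W = W
        self.block_size = max(1, math.floor(tau * W))
        self.blocks = deque()     # (item_count, block_sum) per closed block
        self.cur_count = 0        # items in the open (newest) block
        self.cur_sum = 0
        self.total_count = 0      # items covered by closed blocks
        self.total_sum = 0

    def add(self, x):
        self.cur_count += 1
        self.cur_sum += x
        if self.cur_count == self.block_size:   # close the current block
            self.blocks.append((self.cur_count, self.cur_sum))
            self.total_count += self.cur_count
            self.total_sum += self.cur_sum
            self.cur_count = self.cur_sum = 0
        # Drop the oldest block while the rest still covers >= W items.
        while self.blocks and self.total_count + self.cur_count - self.blocks[0][0] >= self.W:
            count, block_sum = self.blocks.popleft()
            self.total_count -= count
            self.total_sum -= block_sum

    def query(self):
        """Return (current window size, exact sum over that window)."""
        return self.total_count + self.cur_count, self.total_sum + self.cur_sum
```

Once at least $W$ items have arrived, the window always holds between $W$ and $W + τW$ items, and memory is about $1/τ$ block totals instead of $W$ stored items.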

arXiv:1701.04021 [pdf, other]

Optimal Elephant Flow Detection

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman, Yaron Kassner

Abstract: Monitoring the traffic volumes of elephant flows, including the total byte count per flow, is a fundamental capability for online network measurements. We present an asymptotically optimal algorithm for solving this problem in terms of both space and time complexity. This improves on previous approaches, which can only count the number of packets in constant time. We evaluate our work on real packet traces, demonstrating a speedup of up to 2.5x compared to the best alternative.

Submitted 15 January, 2017; originally announced January 2017.

Comments: Accepted to IEEE INFOCOM 2017

arXiv:1612.02962 [pdf, other]

Randomized Admission Policy for Efficient Top-k and Frequency Estimation

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman, Yaron Kassner

Abstract: Network management protocols often require timely and meaningful insight into per-flow network traffic. This paper introduces Randomized Admission Policy (RAP), a novel algorithm for the frequency and top-k estimation problems, which are fundamental in network monitoring. We demonstrate space reductions compared to the alternatives by a factor of up to 32 on real packet traces and up to 128 on heavy-tailed workloads. For top-k identification, RAP exhibits memory savings by a factor of between 4 and 64 depending on the skew of the workload. These empirical results are backed by formal analysis, indicating the asymptotic space improvement of our probabilistic admission approach. Additionally, we present d-Way RAP, a hardware-friendly variant of RAP that empirically maintains its space and accuracy benefits.

Submitted 9 December, 2016; originally announced December 2016.

Comments: Conference version accepted to IEEE INFOCOM 2017
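The probabilistic admission idea can be sketched on top of a Space-Saving-style counter table. The rule used here is an assumption, not quoted from the paper: when an unmonitored flow arrives and the table is full, it replaces the minimum entry only with probability 1 / (c_min + 1), inheriting c_min + 1 as in Space-Saving. The linear scan for the minimum is for clarity only, and the class name is hypothetical.

```python
import random


class RAPSketch:
    """Counter table with an (assumed) randomized admission policy.

    A flow already in the table is counted exactly from then on. A new flow
    arriving at a full table evicts the minimum entry only with probability
    1 / (c_min + 1), so rare flows rarely displace established ones.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counters = {}
        self.rng = random.Random(seed)

    def update(self, flow):
        if flow in self.counters:
            self.counters[flow] += 1
        elif len(self.counters) < self.capacity:
            self.counters[flow] = 1
        else:
            victim = min(self.counters, key=self.counters.get)  # O(k) scan
            c_min = self.counters[victim]
            if self.rng.random() < 1.0 / (c_min + 1):
                del self.counters[victim]
                self.counters[flow] = c_min + 1

    def estimate(self, flow):
        return self.counters.get(flow, 0)

    def top_k(self, k):
        return sorted(self.counters, key=self.counters.get, reverse=True)[:k]
```

Compared with always admitting (plain Space-Saving), the coin flip keeps a long tail of one-packet flows from continually churning the table, which is the intuition behind the space savings on heavy-tailed workloads reported above.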

arXiv:1610.02885 [pdf, other]

Hardening Cassandra Against Byzantine Failures

Authors: Roy Friedman, Roni Licher

Abstract: Cassandra is one of the most widely used distributed data stores today. Cassandra supports flexible consistency guarantees over a wide-column data access model and provides almost linear scale-out performance. This enables application developers to tailor the performance and availability of Cassandra to their application's exact needs and required semantics. Yet, Cassandra is designed to withstand benign failures, and cannot cope with most forms of Byzantine attacks. In this work, we present an analysis of Cassandra's vulnerabilities and propose protocols for hardening Cassandra against Byzantine failures. We examine several alternative design choices and compare them both qualitatively and empirically using the Yahoo! Cloud Serving Benchmark (YCSB). We include incremental performance analysis for our algorithmic and cryptographic adjustments, supporting our design choices.

Submitted 10 October, 2016; originally announced October 2016.

arXiv:1606.01364 [pdf, other]

ICE Buckets: Improved Counter Estimation for Network Measurement

Authors: Gil Einziger, Benny Fellman, Roy Friedman, Yaron Kassner

Abstract: Measurement capabilities are essential for a variety of network applications, such as load balancing, routing, fairness and intrusion detection. These capabilities require large counter arrays in order to monitor the traffic of all network flows. While commodity SRAM is capable of operating at line speed, it is too small to accommodate large counter arrays. Previous works suggested estimators, which trade precision for reduced space. However, in order to accurately estimate the largest counter, these methods compromise the accuracy of the smaller counters. In this work, we present a closed-form representation of the optimal estimation function. We then introduce Independent Counter Estimation Buckets (ICE-Buckets), a novel algorithm that improves estimation accuracy for all counters. This is achieved by separating the flows into buckets and configuring the optimal estimation function according to each bucket's counter scale. We prove a tighter upper bound on the relative error and demonstrate an accuracy improvement of up to 57 times on real Internet packet traces.

Submitted 4 June, 2016; originally announced June 2016.
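The precision-for-space trade mentioned above is easiest to see in the classic Morris-style probabilistic counter, shown here as a stand-in: it is not ICE-Buckets' optimal estimation function. The counter stores a small exponent c, increments it with probability $a^{-c}$, and reports $(a^c - 1)/(a - 1)$, which is an unbiased estimate of the true count. Choosing the base `a` per bucket loosely mirrors the idea of configuring the estimator to each bucket's counter scale; the class name and parameters are illustrative.

```python
import random


class ApproxCounter:
    """Morris-style probabilistic counter (classic estimator, not ICE-Buckets').

    Stores only a small exponent c. Each increment bumps c with probability
    base**(-c); the estimate (base**c - 1) / (base - 1) is unbiased.
    A larger base reaches larger counts with fewer bits but a higher
    relative error; a base close to 1 is more precise but needs more bits.
    """

    def __init__(self, base, rng):
        if base <= 1:
            raise ValueError("base must be greater than 1")
        self.base = base
        self.c = 0
        self.rng = rng

    def increment(self):
        if self.rng.random() < self.base ** (-self.c):
            self.c += 1

    def estimate(self):
        return (self.base ** self.c - 1) / (self.base - 1)
```

With `base = 1.1`, counting to about 2000 needs an exponent of only about 56 (6 bits instead of 11), while the variance of a single estimate is $(a-1)\,n(n-1)/2$, i.e. roughly 22% relative standard deviation at that scale.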

arXiv:1604.02450 [pdf, other]

doi 10.4230/LIPIcs.SWAT.2016

Efficient Summing over Sliding Windows

Authors: Ran Ben Basat, Gil Einziger, Roy Friedman, Yaron Kassner

Abstract: This paper considers the problem of maintaining statistical aggregates over the last W elements of a data stream. First, the problem of counting the number of 1's in the last W bits of a binary stream is considered. A lower bound of Ω(1/ε + log W) memory bits for Wε-additive approximations is derived. This is followed by an algorithm whose memory consumption is O(1/ε + log W) bits, indicating that the algorithm is optimal and that the bound is tight. Next, the more general problem of maintaining a sum of the last W integers, each in the range {0,1,...,R}, is addressed. The paper shows that approximating the sum within an additive error of RWε can also be done using Θ(1/ε + log W) bits for ε=Ω(1/W). For ε=o(1/W), we present a succinct algorithm which uses B(1 + o(1)) bits, where B=Θ(W log(1/(Wε))) is the derived lower bound. We show that all lower bounds generalize to randomized algorithms as well. All algorithms process new elements and answer queries in O(1) worst-case time.

Submitted 3 April, 2016; originally announced April 2016.

Comments: A shorter version appears in SWAT 2016

arXiv:1604.00641 [pdf, other]

COARA: Code Offloading on Android with AspectJ

Authors: Roy Friedman, Nir Hauser

Abstract: Smartphones suffer from limited computational capabilities and battery life. A method to mitigate these problems is code offloading: executing application code on a remote server. We introduce COARA, a middleware platform for code offloading on Android that uses aspect-oriented programming (AOP) with AspectJ. AOP allows COARA to intercept code for offloading without a customized compiler or modification of the operating system. COARA requires minimal changes to application source code, and does not require the application developer to be aware of AOP. Since state transfer to the server is often a bottleneck that hinders performance, COARA uses AOP to intercept the transmission of large objects from the client and replaces them with object proxies. The server can begin executing the offloaded application code regardless of whether all required objects have been transferred to the server. We run COARA with Android applications from the Google Play store on a Nexus 4 running unmodified Android 4.3 to show that our platform improves performance and reduces energy consumption. Our approach yields speedups of 24x and 6x over WiFi and 3G, respectively.

Submitted 3 April, 2016; originally announced April 2016.

arXiv:1512.00727 [pdf, other]

TinyLFU: A Highly Efficient Cache Admission Policy

Authors: Gil Einziger, Roy Friedman, Ben Manes

Abstract: This paper proposes to use a frequency-based cache admission policy in order to boost the effectiveness of caches subject to skewed access distributions. Given a newly accessed item and an eviction candidate from the cache, our scheme decides, based on the recent access history, whether it is worth admitting the new item into the cache at the expense of the eviction candidate. This concept is realized through a novel approximate LFU structure called TinyLFU, which maintains an approximate representation of the access frequency of a large sample of recently accessed items. TinyLFU is very compact and lightweight, as it builds upon Bloom filter theory. We study the properties of TinyLFU through simulations of both synthetic workloads and multiple real traces from several sources. These simulations demonstrate the performance boost obtained by enhancing various replacement policies with the TinyLFU eviction policy. Also, a new combined replacement and eviction policy scheme nicknamed W-TinyLFU is presented. W-TinyLFU is demonstrated to obtain equal or better hit ratios than other state-of-the-art replacement policies on these traces. It is the only scheme to obtain such good results on all traces.

Submitted 3 December, 2015; v1 submitted 2 December, 2015; originally announced December 2015.

Comments: A much earlier and shorter version of this work appeared in the Euromicro PDP 2014 conference
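The admission decision described above compares two approximate frequencies. The sketch below uses a small Count-Min sketch (a Bloom-filter-like counting structure) as the frequency estimator and ages it by halving every counter after `sample_size` additions, so the history reflects a recent sample. Width, depth, sample size, the blake2b-based hashing, and the choice to also halve the addition count are assumptions for illustration; refinements the paper may use (such as a doorkeeper filter or small counters) are omitted, and all names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter with periodic halving (aging)."""

    def __init__(self, width=1024, depth=4, sample_size=10_000):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]
        self.sample_size = sample_size  # halve all counters after this many additions
        self.additions = 0

    def _indexes(self, key):
        # One independent-looking hash per row, derived from blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key):
        for row, i in self._indexes(key):
            self.rows[row][i] += 1
        self.additions += 1
        if self.additions >= self.sample_size:
            # Aging step: halve every counter and the addition count.
            self.rows = [[c // 2 for c in r] for r in self.rows]
            self.additions //= 2

    def estimate(self, key):
        # Count-Min never underestimates; take the least-collided row.
        return min(self.rows[row][i] for row, i in self._indexes(key))


def admit(sketch, candidate, victim):
    """Admit the candidate only if it looks more frequent than the eviction victim."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

A cache would call `sketch.add(key)` on every access and `admit(sketch, new_key, victim_key)` whenever its replacement policy proposes a victim; on a `False` result the cache keeps the victim and simply does not store the new item.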

arXiv:1411.6478 [pdf, other]

Fisheye Consistency: Keeping Data in Synch in a Georeplicated World

Authors: Roy Friedman, Michel Raynal, François Taïani

Abstract: Over the last thirty years, numerous consistency conditions for replicated data have been proposed and implemented. Popular examples of such conditions include linearizability (or atomicity), sequential consistency, causal consistency, and eventual consistency. These consistency conditions are usually defined independently of the computing entities (nodes) that manipulate the replicated data; i.e., they do not take into account how computing entities might be linked to one another, or geographically distributed. To address this gap, as a first contribution, this paper introduces the notion of a proximity graph between computing nodes. If two nodes are connected in this graph, their operations must satisfy a strong consistency condition, while the operations invoked by other nodes are allowed to satisfy a weaker condition. The second contribution is the use of such a graph to provide a generic approach to the hybridization of data consistency conditions within the same system. We illustrate this approach on sequential consistency and causal consistency, and present a model in which all data operations are causally consistent, while operations by neighboring processes in the proximity graph are sequentially consistent. The third contribution of the paper is the design and proof of a distributed algorithm based on this proximity graph, which combines sequential consistency and causal consistency (the resulting condition is called fisheye consistency).
In doing so, the paper not only extends the domain of consistency conditions but also provides a generic, provably correct solution of direct relevance to modern georeplicated systems.

Submitted 22 October, 2015; v1 submitted 24 November, 2014; originally announced November 2014.

arXiv:0807.1253 [pdf, ps, other]

doi 10.1098/rspa.2008.0465

Informed Traders

Authors: Dorje C. Brody, Mark H. A. Davis, Robyn L. Friedman, Lane P. Hughston

Abstract: An asymmetric information model is introduced for the situation in which there is a small agent who is more susceptible to the flow of information in the market than the general market participant, and who tries to implement strategies based on the additional information. In this model market participants have access to a stream of noisy information concerning the future return of an asset, whereas the informed trader has access to a further information source which is obscured by an additional noise that may be correlated with the market noise. The informed trader uses the extraneous information source to seek statistical arbitrage opportunities, while at the same time accommodating the additional risk. The amount of information available to the general market participant concerning the asset return is measured by the mutual information of the asset price and the associated cash flow. The worth of the additional information source is then measured in terms of the difference of mutual information between the general market participant and the informed trader. This difference is shown to be nonnegative when the signal-to-noise ratio of the information flow is known in advance. Explicit trading strategies leading to statistical arbitrage opportunities, taking advantage of the additional information, are constructed, illustrating how excess information can be translated into profit.

Submitted 17 November, 2008; v1 submitted 8 July, 2008; originally announced July 2008.

Comments: 20 pages, 5 figures. Version to appear in the Proceedings of the Royal Society A

Journal ref: Proceedings of the Royal Society London A465, 1103-1122 (2009)
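The Informed Traders abstract above says the informed trader's extra information, measured as a difference of mutual information, is nonnegative. As a minimal illustration only, the sketch below uses a simplified Gaussian setting that is an assumption and not the paper's model: the cash flow X is N(0, v), the market observes xi_t = sigma*t*X + B_t with B a standard Brownian motion, and the informed trader also observes eta_t = tau*t*X + W_t, where W is a Brownian motion with correlation rho to B. For a Gaussian signal seen through Gaussian noise with signal vector a and noise covariance Sigma, I = ½ ln(1 + v aᵀΣ⁻¹a). Here the gain works out to ½ ln of a ratio whose numerator exceeds the denominator by t(rho*sigma − tau)²/(1 − rho²) ≥ 0. All function names and parameters are hypothetical.

```python
import math


def mutual_info_market(v: float, sigma: float, t: float) -> float:
    """I(X; xi_t) in nats, for X ~ N(0, v) and xi_t = sigma*t*X + B_t, B_t ~ N(0, t)."""
    # Scalar Gaussian channel: signal variance v*(sigma*t)^2, noise variance t.
    return 0.5 * math.log(1.0 + v * sigma**2 * t)


def mutual_info_informed(v: float, sigma: float, tau: float, rho: float, t: float) -> float:
    """I(X; (xi_t, eta_t)) in nats, with eta_t = tau*t*X + W_t and Corr(B_t, W_t) = rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("need |rho| < 1 so the noise covariance is invertible")
    # a = t*(sigma, tau), noise covariance t*[[1, rho], [rho, 1]];
    # quad = a^T Sigma^{-1} a, written in closed form for the 2x2 case.
    quad = t * (sigma**2 - 2.0 * rho * sigma * tau + tau**2) / (1.0 - rho**2)
    return 0.5 * math.log(1.0 + v * quad)


def information_gain(v: float, sigma: float, tau: float, rho: float, t: float) -> float:
    """Informed trader's extra information over the market; never negative here,
    because quad - sigma^2*t = t*(rho*sigma - tau)^2 / (1 - rho^2) >= 0."""
    return mutual_info_informed(v, sigma, tau, rho, t) - mutual_info_market(v, sigma, t)


print(information_gain(1.0, 1.0, 1.0, 0.5, 1.0))  # positive
print(information_gain(1.0, 1.0, 0.5, 0.5, 1.0))  # 0.0: eta_t - rho*xi_t is pure noise
```

The gain vanishes exactly when tau = rho*sigma, because then eta_t − rho*xi_t = W_t − rho*B_t carries no information about X. In every other case the second source strictly helps under these Gaussian assumptions.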

arXiv:cs/0605133

Efficient Route Tracing from a Single Source

Authors: Benoit Donnet, Philippe Raoult, Timur Friedman

Abstract: Traceroute is a networking tool that allows one to discover the path that packets take from a source machine, through the network, to a destination machine. It is widely used as an engineering tool, and also as a scientific tool, such as for discovery of the network topology at the IP level. In prior work, the authors of this technical report have shown how to improve the efficiency of route tracing from multiple cooperating monitors. However, it is not unusual for a route tracing monitor to operate in isolation. Somewhat different strategies are required for this case, and this report is the first systematic study of those requirements. Standard traceroute is inefficient when used repeatedly towards multiple destinations, as it repeatedly probes the same interfaces close to the source. Others have recognized this inefficiency and have proposed tracing backwards from the destinations and stopping probing upon encountering a previously seen interface. One of this technical report's contributions is to quantify for the first time the efficiency of this approach. Another contribution is to describe the effect of non-responding destinations on this efficiency. Since a large portion of destination machines do not reply to probe packets, backwards probing from the destination is often infeasible. We propose an algorithm to tackle non-responding destinations, and we find that our algorithm can strongly decrease probing redundancy at the cost of a small reduction in node and link discovery.

Submitted 29 May, 2006; originally announced May 2006.
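The backward-probing approach that the route-tracing abstract above attributes to others (trace from the destination toward the source and stop at the first previously seen interface) can be sketched on a hypothetical simulated topology. The sketch assumes every destination responds and that each route's hop list is already known; a real prober would have to find the path length and send TTL-limited probes. It is not the report's own algorithm for non-responding destinations, and the route data and function names are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simulated routes: each destination maps to the interfaces on its
# path, ordered from the hop nearest the source to the destination itself.
Routes = Dict[str, List[str]]


def forward_probes(routes: Routes) -> int:
    """Standard traceroute: probe every hop of every route, starting at TTL 1."""
    return sum(len(path) for path in routes.values())


def backward_probes(routes: Routes) -> Tuple[int, Set[str]]:
    """Probe each route from the destination back toward the source, stopping
    as soon as an interface already seen on an earlier route replies."""
    seen: Set[str] = set()
    probes = 0
    for path in routes.values():
        for hop in reversed(path):
            probes += 1
            if hop in seen:
                break  # assume the rest of the path toward the source is already known
            seen.add(hop)
    return probes, seen


routes: Routes = {
    "d1": ["a", "b", "c", "d1"],
    "d2": ["a", "b", "e", "d2"],
    "d3": ["a", "b", "e", "f", "d3"],
}
print(forward_probes(routes))  # 13
probes, seen = backward_probes(routes)
print(probes, len(seen))  # 10 8
```

On these three routes, backward probing spends 10 probes instead of 13 and finds the same 8 interfaces. Stopping early assumes that the path from a previously seen interface back to the source matches the one already recorded; where routes diverge again closer to the source, those interfaces would be missed.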

Showing 1–39 of 39 results for author: Friedman, R