Search | arXiv e-print repository

Data-dependent and Oracle Bounds on Forgetting in Continual Learning

Abstract: In continual learning, knowledge must be preserved and re-used between tasks, maintaining good transfer to future tasks and minimizing forgetting of previously learned ones. While several practical algorithms have been devised for this setting, there have been few theoretical works aiming to quantify and bound the degree of forgetting in general settings. We provide both data-dependent and oracle upper bounds that apply regardless of model and algorithm choice, as well as bounds for Gibbs posteriors. We derive an algorithm inspired by our bounds and demonstrate empirically that our approach yields improved forward and backward transfer.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.02953 [pdf, other]

Analysis of the Identifying Regulation with Adversarial Surrogates Algorithm

Authors: Ron Teichner, Ron Meir, Michael Margaliot

Abstract: Given a time-series of noisy measured outputs of a dynamical system z[k], k=1...N, the Identifying Regulation with Adversarial Surrogates (IRAS) algorithm aims to find a non-trivial first integral of the system, namely, a scalar function g() such that g(z[i]) = g(z[j]), for all i,j. IRAS has been suggested recently and was used successfully in several learning tasks in models from biology and physics. Here, we give the first rigorous analysis of this algorithm in a specific setting. We assume that the observations admit a linear first integral and that they are contaminated by Gaussian noise. We show that in this case the IRAS iterations are closely related to the self-consistent-field (SCF) iterations for solving a generalized Rayleigh quotient minimization problem. Using this approach, we derive several sufficient conditions guaranteeing local convergence of IRAS to the correct first integral.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2403.14705 [pdf, other]

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication

Authors: Boaz Carmeli, Yonatan Belinkov, Ron Meir

Abstract: Artificial agents that learn to communicate in order to accomplish a given task acquire communication protocols that are typically opaque to a human. A large body of work has attempted to evaluate the emergent communication via various evaluation measures, with \emph{compositionality} featuring as a prominent desired trait. However, current evaluation procedures do not directly expose the compositionality of the emergent communication. We propose a procedure to assess the compositionality of emergent communication by finding the best-match between emerged words and natural language concepts. The best-match algorithm provides both a global score and a translation-map from emergent words to natural language concepts. To the best of our knowledge, it is the first time that such direct and interpretable mapping between emergent words and human concepts is provided.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.13366 [pdf, other]

Statistical curriculum learning: An elimination algorithm achieving an oracle risk

Authors: Omer Cohen, Ron Meir, Nir Weinberger

Abstract: We consider a statistical version of curriculum learning (CL) in a parametric prediction setting. The learner is required to estimate a target parameter vector, and can adaptively collect samples from either the target model, or other source models that are similar to the target model, but less noisy. We consider three types of learners, depending on the level of side-information they receive. The first two, referred to as strong/weak-oracle learners, receive high/low degrees of information about the models, and use these to learn. The third, a fully adaptive learner, estimates the target parameter vector without any prior information. In the single source case, we propose an elimination learning method, whose risk matches that of a strong-oracle learner. In the multiple source case, we advocate that the risk of the weak-oracle learner is a realistic benchmark for the risk of adaptive learners. We develop an adaptive multiple elimination-rounds CL algorithm, and characterize instance-dependent conditions for its risk to match that of the weak-oracle learner. We consider instance-dependent minimax lower bounds, and discuss the challenges associated with defining the class of instances for the bound. We derive two minimax lower bounds, and determine the conditions under which the performance of the weak-oracle learner is minimax optimal.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.02265 [pdf, other]

Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics

Authors: Dror Freirich, Nir Weinberger, Ron Meir

Abstract: Whenever inspected by humans, reconstructed signals should not be distinguished from real ones. Typically, such a high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein-$1$ distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.15116 [pdf, other]

Efficient Online Crowdsourcing with Complex Annotations

Authors: Reshef Meir, Viet-An Nguyen, Xu Chen, Jagdish Ramakrishnan, Udi Weinsberg

Abstract: Crowdsourcing platforms use various truth discovery algorithms to aggregate annotations from multiple labelers. In an online setting, however, the main challenge is to decide whether to ask for more annotations for each item to efficiently trade off cost (i.e., the number of annotations) for quality of the aggregated annotations. In this paper, we propose a novel approach for general complex annotation (such as bounding boxes and taxonomy paths), that works in an online crowdsourcing setting. We prove that the expected average similarity of a labeler is linear in their accuracy \emph{conditional on the reported label}. This enables us to infer reported label accuracy in a broad range of scenarios. We conduct extensive evaluations on real-world crowdsourcing data from Meta and show the effectiveness of our proposed online algorithms in improving the cost-quality trade-off.

Submitted 25 January, 2024; originally announced January 2024.

Comments: full version of a paper accepted to AAAI'24

arXiv:2307.02295 [pdf, other]

Meta-Learning Adversarial Bandit Algorithms

Authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu

Abstract: We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.

Submitted 1 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Merger of arXiv:2205.14128 and arXiv:2205.15921, with some additional improvements; to appear in NeurIPS 2023

arXiv:2306.02400 [pdf, other]

Perceptual Kalman Filters: Online State Estimation under a Perfect Perceptual-Quality Constraint

Authors: Dror Freirich, Tomer Michaeli, Ron Meir

Abstract: Many practical settings call for the reconstruction of temporal signals from corrupted or missing data. Classic examples include decoding, tracking, signal enhancement and denoising. Since the reconstructed signals are ultimately viewed by humans, it is desirable to achieve reconstructions that are pleasing to human perception. Mathematically, perfect perceptual-quality is achieved when the distribution of restored signals is the same as that of natural signals, a requirement which has been heavily researched in static estimation settings (i.e. when a whole signal is processed at once). Here, we study the problem of optimal causal filtering under a perfect perceptual-quality constraint, which is a task of fundamentally different nature. Specifically, we analyze a Gaussian Markov signal observed through a linear noisy transformation. In the absence of perceptual constraints, the Kalman filter is known to be optimal in the MSE sense for this setting. Here, we show that adding the perfect perceptual quality constraint (i.e. the requirement of temporal consistency), introduces a fundamental dilemma whereby the filter may have to "knowingly" ignore new information revealed by the observations in order to conform to its past decisions. This often comes at the cost of a significant increase in the MSE (beyond that encountered in static settings). Our analysis goes beyond the classic innovation process of the Kalman filter, and introduces the novel concept of an unutilized information process. Using this tool, we present a recursive formula for perceptual filters, and demonstrate the qualitative effects of perfect perceptual-quality estimation on a video reconstruction problem.

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.10969 [pdf, ps, other]

Strategic Proxy Voting on the Line

Authors: Gili Bielous, Reshef Meir

Abstract: This paper offers a framework for the study of strategic behavior in proxy voting, where non-active voters delegate their votes to active voters. We further study how proxy voting affects the strategic behavior of non-active voters and proxies (active voters) under complete and partial information. We focus on the median voting rule for single-peaked preferences. Our results show strategyproofness with respect to non-active voters. Furthermore, while strategyproofness does not extend to proxies, we show that the outcome is bounded and, under mild restrictions, strategic behavior leads to socially optimal outcomes. We further show that our results extend to partial information settings, and in particular for regret-averse agents.

Submitted 18 May, 2023; originally announced May 2023.

Comments: A preliminary version of this paper was presented in EUMAS2022

arXiv:2303.06923 [pdf, ps, other]

Strategy-proof Budgeting via a VCG-like Mechanism

Authors: Jonathan Wagner, Reshef Meir

Abstract: We present a strategy-proof public goods budgeting mechanism where agents determine both the total volume of expenses and the specific allocation. It is constructed as a modification of VCG to a less typical environment, namely where we do not assume quasi-linear utilities nor direct revelation. We further show that under plausible assumptions it satisfies strategy-proofness in strictly dominant strategies, and consequently implements the social optimum as a Coalition Proof Nash Equilibrium. A primary (albeit not an exclusive) motivation of our model is Participatory Budgeting, where members of a community collectively decide the spending policy of public tax dollars. While incentive alignment in our mechanism, as in classic VCG, is achieved via individual payments we charge from agents, in a PB context that seems unreasonable. Our second main result thus provides that, under further specifications relevant in that context, these payments will vanish in large populations. In the last section we expand the mechanism's definition to a class of mechanisms in which the designer can prioritize certain outcomes she sees as desirable. In particular we give the example of favoring equitable (egalitarian) allocations.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 27 pages, Manuscript submitted for review to the 24th ACM Conference on Economics & Computation (EC'23)

arXiv:2303.00435 [pdf, other]

Mitigating Skewed Bidding for Conference Paper Assignment

Authors: Inbal Rozencweig, Reshef Meir, Nick Mattei, Ofra Amir

Abstract: The explosion of conference paper submissions in AI and related fields has underscored the need to improve many aspects of the peer review process, especially the matching of papers and reviewers. Recent work argues that the key to improving this matching is to modify aspects of the \emph{bidding phase} itself, to ensure that the set of bids over papers is balanced, and in particular to avoid \emph{orphan papers}, i.e., those papers that receive no bids. In an attempt to understand and mitigate this problem, we have developed a flexible bidding platform to test adaptations to the bidding process. Using this platform, we performed a field experiment during the bidding phase of a medium-size international workshop that compared two bidding methods. We further examined via controlled experiments on Amazon Mechanical Turk various factors that affect bidding, in particular the order in which papers are presented \cite{cabanac2013capitalizing,fiez2020super}; and information on paper demand \cite{meir2021market}. Our results suggest that several simple adaptations, that can be added to any existing platform, may significantly reduce the skew in bids, thereby improving the allocation for both reviewers and conference organizers.

Submitted 1 March, 2023; originally announced March 2023.

Comments: Camera ready version of a paper accepted to AAMAS'23: https://aamas2023.soton.ac.uk/program/accepted-papers/

arXiv:2301.08873 [pdf, other]

Convergence of Multi-Issue Iterative Voting under Uncertainty

Authors: Joshua Kavner, Reshef Meir, Francesca Rossi, Lirong Xia

Abstract: We study the effect of strategic behavior in iterative voting for multiple issues under uncertainty. We introduce a model synthesizing simultaneous multi-issue voting with Meir, Lev, and Rosenschein (2014)'s local dominance theory and determine its convergence properties. After demonstrating that local dominance improvement dynamics may fail to converge, we present two sufficient model refinements that guarantee convergence from any initial vote profile for binary issues: constraining agents to have O-legal preferences and endowing agents with less uncertainty about issues they are modifying than others. Our empirical studies demonstrate that although cycles are common when agents have no uncertainty, introducing uncertainty makes convergence almost guaranteed in practice.

Submitted 20 January, 2023; originally announced January 2023.

Comments: 19 pages, 4 figures

arXiv:2211.02412 [pdf, other]

Emergent Quantized Communication

Authors: Boaz Carmeli, Ron Meir, Yonatan Belinkov

Abstract: The field of emergent communication aims to understand the characteristics of communication as it emerges from artificial agents solving tasks that require information exchange. Communication with discrete messages is considered a desired characteristic, for both scientific and applied reasons. However, training a multi-agent system with discrete communication is not straightforward, requiring either reinforcement learning algorithms or relaxing the discreteness requirement via a continuous approximation such as the Gumbel-softmax. Both these solutions result in poor performance compared to fully continuous communication. In this work, we propose an alternative approach to achieve discrete communication -- quantization of communicated messages. Using message quantization allows us to train the model end-to-end, achieving superior performance in multiple setups. Moreover, quantization is a natural framework that runs the gamut from continuous to discrete communication. Thus, it sets the ground for a broader view of multi-agent communication in the deep learning era.

Submitted 19 January, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

MSC Class: 68T07 ACM Class: I.2.6

arXiv:2207.00614 [pdf, other]

Integral Probability Metrics PAC-Bayes Bounds

Authors: Ron Amit, Baruch Epstein, Shay Moran, Ron Meir

Abstract: We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and improved bounds in favorable cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm- and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.

Submitted 25 December, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2206.04816 [pdf, other]

Empirical Bayes approach to Truth Discovery problems

Authors: Tsviel Ben Shabat, Reshef Meir, David Azriel

Abstract: When aggregating information from conflicting sources, one's goal is to find the truth. Most real-value \emph{truth discovery} (TD) algorithms try to achieve this goal by estimating the competence of each source and then aggregating the conflicting information by weighing each source's answer proportionally to her competence. However, each of those algorithms requires more than a single source for such estimation and usually does not consider different estimation methods other than a weighted mean. Therefore, in this work we formulate, prove, and empirically test the conditions for an Empirical Bayes Estimator (EBE) to dominate the weighted mean aggregation. Our main result demonstrates that EBE, under mild conditions, can be used as a second step of any TD algorithm in order to reduce the expected error.

Submitted 9 June, 2022; originally announced June 2022.

Comments: full version of a paper accepted to UAI'22

arXiv:2205.15921 [pdf, ps, other]

Online Meta-Learning in Adversarial Multi-Armed Bandits

Authors: Ilya Osadchiy, Kfir Y. Levy, Ron Meir

Abstract: We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arm chosen by the adversary. We present an algorithm that can leverage the non-uniformity in this empirical distribution, and derive problem-dependent regret bounds. This solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between the episodes. In the case where the best arm distribution is far from uniform, it improves upon the best bound that can be achieved by any online algorithm executed on each episode individually without meta-learning.

Submitted 12 July, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: v1: The paper is submitted to NeurIPS 2022. An older version was rejected from ICML 2022. v2: Added a reference to concurrent work in Prior Art section

arXiv:2201.07546 [pdf, other]

Welfare vs. Representation in Participatory Budgeting

Authors: Roy Fairstein, Reshef Meir, Dan Vilenchik, Kobi Gal

Abstract: Participatory budgeting (PB) is a democratic process for allocating funds to projects based on the votes of members of the community. Different rules have been used to aggregate participants' votes. Past research by Lackner and Skowron (2020) has studied the trade-off between notions of social welfare and fairness in the multi-winner setting (a special case of participatory budgeting with identical project costs), but there is little understanding of this trade-off in the more general PB setting. This paper provides a theoretical and empirical study of the worst-case guarantees of several common rules to better understand the trade-off between social welfare and representation. We show that many of the guarantees from the multi-winner setting do not generalize to the PB setting, and that the introduction of costs leads to substantially worse guarantees, thereby exacerbating the welfare-representation trade-off. We extend our theoretical analysis to studying how the requirement of proportionality over voting rules affects this trade-off, and in particular the guarantees on social welfare and representation. We study the latter point also empirically, both on real and synthetic datasets. We show that variants of the recently suggested voting rule Rule-X (which satisfies proportionality) do very well in practice both with respect to social welfare and representation.

Submitted 25 May, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

arXiv:2107.05320 [pdf, other]

Metalearning Linear Bandits by Prior Update

Authors: Amit Peleg, Naama Pearl, Ron Meir

Abstract: Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior. In practice, such information is often lacking. This problem is exacerbated in setups with partial information, where a misspecified prior may lead to poor exploration and performance. In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior is sufficiently close to the true prior, the performance of the applied algorithm is close to that of the algorithm that uses the true prior. Furthermore, we address the task of learning the prior through metalearning, where a learner updates her estimate of the prior across multiple task instances in order to improve performance on future tasks. We provide an algorithm and regret bounds, demonstrate its effectiveness in comparison to an algorithm that knows the correct prior, and support our theoretical results empirically. Our theoretical results hold for a broad class of algorithms, including Thompson Sampling and Information Directed Sampling.

Submitted 2 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

Journal ref: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

arXiv:2107.02555 [pdf, other]

A Theory of the Distortion-Perception Tradeoff in Wasserstein Space

Authors: Dror Freirich, Tomer Michaeli, Ron Meir

Abstract: The lower the distortion of an estimator, the more the distribution of its outputs generally deviates from the distribution of the signals it attempts to estimate. This phenomenon, known as the perception-distortion tradeoff, has captured significant attention in image restoration, where it implies that fidelity to ground truth images comes at the expense of perceptual quality (deviation from statistics of natural images). However, despite the increasing popularity of performing comparisons on the perception-distortion plane, there remains an important open question: what is the minimal distortion that can be achieved under a given perception constraint? In this paper, we derive a closed form expression for this distortion-perception (DP) function for the mean squared-error (MSE) distortion and the Wasserstein-2 perception index. We prove that the DP function is always quadratic, regardless of the underlying distribution. This stems from the fact that estimators on the DP curve form a geodesic in Wasserstein space. In the Gaussian setting, we further provide a closed form expression for such estimators. For general distributions, we show how these estimators can be constructed from the estimators at the two extremes of the tradeoff: The global MSE minimizer, and a minimizer of the MSE under a perfect perceptual quality constraint. The latter can be obtained as a stochastic transformation of the former.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.05360 [pdf, other]

Proportional Participatory Budgeting with Substitute Projects

Authors: Roy Fairstein, Reshef Meir, Kobi Gal

Abstract: Participatory budgeting is a democratic process for allocating funds to projects based on the votes of members of the community. However, most input methods of voters' preferences prevent the voters from expressing complex relationships among projects, leading to outcomes that do not reflect their preferences well enough. In this paper, we propose an input method that begins to address this challenge, by allowing participants to express substitutes over projects. Then, we extend a known aggregation mechanism from the literature (Rule X) to handle substitute projects. We prove that our extended rule preserves proportionality under natural conditions, and show empirically that it obtains substantially more welfare than the original mechanism on instances with substitutes.

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2104.05082

The Core of Approval Participatory Budgeting with Uniform Costs (or with up to Four Projects) is Non-Empty

Authors: Reshef Meir

Abstract: In the Approval Participatory Budgeting problem an agent prefers a set of projects $W'$ over $W$ if she approves strictly more projects in $W'$. A set of projects $W$ is in the core if there is no other set of projects $W'$ and set of agents $K$ that both prefer $W'$ over $W$ and can fund $W'$. It is an open problem whether the core can be empty, even when project costs are uniform. The latter case is known as the multiwinner voting core. We show that in any instance with uniform costs or with at most four projects (and any number of agents), the core is nonempty.

Submitted 6 December, 2022; v1 submitted 11 April, 2021; originally announced April 2021.

Comments: There is a serious error in the result

arXiv:2103.00445 [pdf, other]

Ensemble Bootstrapping for Q-Learning

Authors: Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Abstract: Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.

Submitted 20 April, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

arXiv:2102.02610 [pdf, other]

Strategyproof Facility Location Mechanisms on Discrete Trees

Authors: Alina Filimonov, Reshef Meir

Abstract: We address the problem of strategyproof (SP) facility location mechanisms on discrete trees. Our main result is a full characterization of onto and SP mechanisms. In particular, we prove that when a single agent significantly affects the outcome, the trajectory of the facility is almost contained in the trajectory of the agent, and both move in the same direction along the common edges. We show tight relations of our characterization to previous results on discrete lines and on continuous trees. We then derive further implications of the main result for infinite discrete lines.

Submitted 4 February, 2021; originally announced February 2021.

Comments: Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

arXiv:2007.02040 [pdf, other]

Discount Factor as a Regularizer in Reinforcement Learning

Authors: Ron Amit, Ron Meir, Kamil Ciosek

Abstract: Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime. Yet the exact nature of this regularizer has not been investigated. In this work, we fill in this gap. For several Temporal-Difference (TD) learning methods, we show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss. Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate.

Submitted 4 July, 2020; originally announced July 2020.

Comments: Published in ICML 2020

Journal ref: Published in Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

arXiv:2006.07837 [pdf, ps, other]

Representative Committees of Peers

Authors: Reshef Meir, Fedor Sandomirskiy, Moshe Tennenholtz

Abstract: A population of voters must elect representatives among themselves to decide on a sequence of possibly unforeseen binary issues. Voters care only about the final decision, not the elected representatives. The disutility of a voter is proportional to the fraction of issues on which his preferences disagree with the decision. While an issue-by-issue vote by all voters would maximize social welfare, we are interested in how well the preferences of the population can be approximated by a small committee. We show that a k-sortition (a random committee of k voters with the majority vote within the committee) leads to an outcome within the factor 1+O(1/k) of the optimal social cost for any number of voters n, any number of issues m, and any preference profile. For a small number of issues m, the social cost can be made even closer to optimal by delegation procedures that weigh committee members according to their number of followers. However, for large m, we demonstrate that the k-sortition is the worst-case optimal rule within a broad family of committee-based rules that take into account metric information about the preference profile of the whole population.

Submitted 14 June, 2020; originally announced June 2020.
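
The k-sortition rule in the abstract above can be illustrated with a minimal sketch. It assumes preferences are stored as 0/1 lists per voter, that ties inside the committee resolve to 0, and that social cost is counted as the raw number of voter-issue disagreements; the function names and these conventions are hypothetical and not taken from the paper.

```python
import random

def k_sortition(profile, k, rng=None):
    """Pick k voters uniformly at random and decide each binary issue
    by majority vote inside that committee.

    profile: list of voters, each a list of 0/1 preferences over m issues.
    """
    rng = rng or random.Random()
    committee = rng.sample(range(len(profile)), k)
    m = len(profile[0])
    decisions = []
    for j in range(m):
        yes = sum(profile[i][j] for i in committee)
        # Strict majority for 1; ties fall back to 0 (an assumption).
        decisions.append(1 if 2 * yes > k else 0)
    return committee, decisions

def social_cost(profile, decisions):
    """Unnormalized cost: number of (voter, issue) pairs that disagree."""
    return sum(p != d for voter in profile for p, d in zip(voter, decisions))
```

Comparing `social_cost` for the committee's decisions against an issue-by-issue majority of all voters gives an empirical view of the approximation ratio the abstract discusses.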

arXiv:2005.06326 [pdf, ps, other]

Cumulative Games: Who is the current player?

Authors: Urban Larsson, Reshef Meir, Yair Zick

Abstract: Combinatorial Game Theory (CGT) is a branch of game theory that has developed almost independently from Economic Game Theory (EGT), and is concerned with deep mathematical properties of 2-player 0-sum games that are defined over various combinatorial structures. The aim of this work is to lay foundations to bridging the conceptual and technical gaps between CGT and EGT, here interpreted as so-called Extensive Form Games, so they can be treated within a unified framework. More specifically, we introduce a class of $n$-player, general-sum games, called Cumulative Games, that can be analyzed by both CGT and EGT tools. We show how two of the most fundamental definitions of CGT---the outcome function, and the disjunctive sum operator---naturally extend to the class of Cumulative Games. The outcome function allows for an efficient equilibrium computation under certain restrictions, and the disjunctive sum operator lets us define a partial order over games, according to the advantage that a certain player has. Finally, we show that any Extensive Form Game can be written as a Cumulative Game.

Submitted 13 May, 2020; originally announced May 2020.

Comments: 54 pages, 4 figures

MSC Class: 91A46; 91A05

arXiv:2003.05878 [pdf, other]

Option Discovery in the Absence of Rewards with Manifold Analysis

Authors: Amitay Bar, Ronen Talmon, Ron Meir

Abstract: Options have been shown to be an effective tool in reinforcement learning, facilitating improved exploration and learning. In this paper, we present an approach based on spectral graph theory and derive an algorithm that systematically discovers options without access to a specific reward or task assignment. As opposed to the common practice used in previous methods, our algorithm makes full use of the spectrum of the graph Laplacian. Incorporating modes associated with higher graph frequencies unravels domain subtleties, which are shown to be useful for option discovery. Using geometric and manifold-based analysis, we present a theoretical justification for the algorithm. In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods.

Submitted 19 August, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

arXiv:2002.03169 [pdf, ps, other]

Distance-based Equilibria in Normal-Form Games

Authors: Erman Acar, Reshef Meir

Abstract: We propose a simple uncertainty modification for the agent model in normal-form games; at any given strategy profile, the agent can access only a set of "possible profiles" that are within a certain distance from the actual action profile. We investigate the various instantiations in which the agent chooses her strategy using well-known rationales e.g., considering the worst case, or trying to minimize the regret, to cope with such uncertainty. Any such modification in the behavioral model naturally induces a corresponding notion of equilibrium; a distance-based equilibrium. We characterize the relationships between the various equilibria, and also their connections to well-known existing solution concepts such as Trembling-hand perfection. Furthermore, we deliver existence results, and show that for some class of games, such solution concepts can actually lead to better outcomes.

Submitted 8 February, 2020; originally announced February 2020.

Comments: Author's preprint

arXiv:2001.05271 [pdf, other]

doi 10.1007/978-3-031-20614-6_15

Safe Voting: Resilience to Abstention and Sybils

Authors: Reshef Meir, Gal Shahaf, Ehud Shapiro, Nimrod Talmon

Abstract: Voting rules may implement the will of the society when all eligible voters vote, and only them. However, they may fail to do so when sybil (fake or duplicate) votes are present and when only some honest (non sybil) voters actively participate. As, unfortunately, sometimes this is the case, our aim here is to address social choice in the presence of sybils and voter abstention. To do so we build upon the framework of Reality-aware Social Choice: we assume the status-quo as an ever-present distinguished alternative, and study Status-Quo Enforcing voting rules, which add virtual votes in support of the status-quo. We characterize the tradeoff between safety and liveness (the ability of active honest voters to maintain/change the status-quo, respectively) in several domains, and show that the Status-Quo Enforcing voting rules are often optimal. We comment on the applicability of our methods and analyses to the governance of digital communities.

Submitted 7 April, 2024; v1 submitted 15 January, 2020; originally announced January 2020.

arXiv:1912.11323 [pdf, other]

Bidding in Spades

Authors: Gal Cohensius, Reshef Meir, Nadav Oved, Roni Stern

Abstract: We present a Spades bidding algorithm that is superior to recreational human players and to publicly available bots. Like in Bridge, the game of Spades is composed of two independent phases, bidding and playing. This paper focuses on the bidding algorithm, since this phase holds a precise challenge: based on the input, choose the bid that maximizes the agent's winning probability. Our Bidding-in-Spades (BIS) algorithm heuristically determines the bidding strategy by comparing the expected utility of each possible bid. A major challenge is how to estimate these expected utilities. To this end, we propose a set of domain-specific heuristics, and then correct them via machine learning using data from real-world players. The BIS algorithm we present can be attached to any playing algorithm. It beats rule-based bidding bots when all use the same playing component. When combined with a rule-based playing algorithm, it is superior to the average recreational human.

Submitted 10 February, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: 13 pages, 7 figures, to be published in ECAI 2020

ACM Class: I.2.1

arXiv:1909.10492 [pdf, other]

Modeling Peoples Voting Behavior with Poll Information

Authors: Roy Fairstein, Adam Lauz, Kobi Gal, Reshef Meir

Abstract: Despite the prevalence of voting systems in the real world, there is no consensus among researchers on how people vote strategically, even in simple voting settings. This paper addresses this gap by comparing different approaches that have been used to model strategic voting, including expected utility maximization, heuristic decision-making, and bounded rationality models. The models are applied to data collected from hundreds of people in controlled voting experiments, where people vote after observing non-binding poll information. We introduce a new voting model, the Attainability-Utility (AU) heuristic, which weighs the popularity of a candidate according to the poll, with the utility of the candidate to the voter. We argue that the AU model is cognitively plausible, and show that it is able to predict people's voting behavior significantly better than other models from the literature. It was almost on par with (and sometimes better than) a machine learning algorithm that uses substantially more information. Our results provide new insights into the strategic considerations of voters, that undermine the prevalent assumptions of much theoretical work in social choice.

Submitted 23 September, 2019; originally announced September 2019.

arXiv:1906.09713 [pdf, other]

Penalty Bidding Mechanisms for Allocating Resources and Overcoming Present Bias

Authors: Hongyao Ma, Reshef Meir, David C. Parkes, Elena Wu-Yan

Abstract: From skipped exercise classes to last-minute cancellation of dentist appointments, underutilization of reserved resources abounds. Likely reasons include uncertainty about the future, further exacerbated by present bias. In this paper, we unite resource allocation and commitment devices through the design of contingent payment mechanisms, and propose the two-bid penalty-bidding mechanism. This extends an earlier mechanism proposed by Ma et al. (2019), assigning the resources based on willingness to accept a no-show penalty, while also allowing each participant to increase her own penalty in order to counter present bias. We establish a simple dominant strategy equilibrium, regardless of an agent's level of present bias or degree of "sophistication". Via simulations, we show that the proposed mechanism substantially improves utilization and achieves higher welfare and better equity in comparison with mechanisms used in practice and mechanisms that optimize welfare in the absence of present bias.

Submitted 8 May, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

arXiv:1905.09951 [pdf, other]

PAC Guarantees for Cooperative Multi-Agent Reinforcement Learning with Restricted Communication

Authors: Or Raveh, Ron Meir

Abstract: We develop model-free PAC performance guarantees for multiple concurrent MDPs, extending recent works where a single learner interacts with multiple non-interacting agents in a noise-free environment. Our framework allows noisy and resource-limited communication between agents, and develops novel PAC guarantees in this extended setting. By allowing communication between the agents themselves, we suggest improved PAC-exploration algorithms that can overcome the communication noise and lead to improved sample complexity bounds. We provide a theoretically motivated algorithm that optimally combines information from the resource-limited agents, thereby analyzing the interaction between noise and communication constraints that are ubiquitous in real-world systems. We present empirical results for a simple task that support our theoretical formulations and improve upon naive information fusion methods.

Submitted 10 October, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

arXiv:1905.00629 [pdf, other]

Frustratingly Easy Truth Discovery

Authors: Reshef Meir, Ofra Amir, Omer Ben-Porat, Tsviel Ben-Shabat, Gal Cohensius, Lirong Xia

Abstract: Truth discovery is a general name for a broad range of statistical methods aimed to extract the correct answers to questions, based on multiple answers coming from noisy sources, for example workers in a crowdsourcing platform. In this paper, we consider an extremely simple heuristic for estimating workers' competence using average proximity to other workers. We prove that this estimates the actual competence level well and enables separating high and low quality workers in a wide spectrum of domains and statistical models. Under Gaussian noise, this simple estimate is the unique solution to the MLE with a constant regularization factor. Finally, weighing workers according to their average proximity in a crowdsourcing setting results in substantial improvement over unweighted aggregation and other truth discovery algorithms in practice.

Submitted 2 December, 2022; v1 submitted 2 May, 2019; originally announced May 2019.

Comments: Full version of a paper accepted to AAAI'23

arXiv:1902.08070 [pdf, ps, other]

Strategyproof Facility Location for Three Agents on a Circle

Authors: Reshef Meir

Abstract: We consider the facility location problem in a metric space, focusing on the case of three agents. We show that selecting the reported location of each agent with probability proportional to the distance between the other two agents results in a mechanism that is strategyproof in expectation, and dominates the random dictator mechanism in terms of utilitarian social welfare. We further improve the upper bound for three agents on a circle to 7/6 (whereas random dictator obtains 4/3); and provide the first lower bounds for randomized strategyproof facility location in any metric space, using linear programming.

Submitted 7 July, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: This is a full version of a paper accepted to SAGT'19. A preliminary version appeared as an extended abstract in AAMAS'19
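
The selection rule stated in the abstract above (pick each agent's reported location with probability proportional to the distance between the other two agents) can be sketched as follows. The circle is assumed to be parameterized as [0, 1) with unit circumference, and the uniform fallback when all three reports coincide is also an assumption; the function names are hypothetical.

```python
def circle_distance(a, b):
    """Shortest arc length between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def selection_probabilities(reports, dist=circle_distance):
    """Probability of placing the facility at each of three reported locations.

    Agent i is selected with probability proportional to the distance
    between the other two agents' reports.
    """
    x1, x2, x3 = reports
    d12, d13, d23 = dist(x1, x2), dist(x1, x3), dist(x2, x3)
    total = d12 + d13 + d23
    if total == 0:
        # All reports coincide; any choice gives the same location.
        return [1 / 3, 1 / 3, 1 / 3]
    return [d23 / total, d13 / total, d12 / total]
```

For reports at 0, 0.25 and 0.5 the middle agent is selected with probability 0.5 and each outer agent with probability 0.25, since the two outer reports are the farthest apart.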

arXiv:1902.01449 [pdf, other]

Generalization Bounds For Unsupervised and Semi-Supervised Learning With Autoencoders

Authors: Baruch Epstein, Ron Meir

Abstract: Autoencoders are widely used for unsupervised learning and as a regularization scheme in semi-supervised learning. However, theoretical understanding of their generalization properties and of the manner in which they can assist supervised learning has been lacking. We utilize recent advances in the theory of deep learning generalization, together with a novel reconstruction loss, to provide generalization bounds for autoencoders. To the best of our knowledge, this is the first such bound. We further show that, under appropriate assumptions, an autoencoder with good generalization properties can improve any semi-supervised learning scheme. We support our theoretical results with empirical demonstrations.

Submitted 4 February, 2019; originally announced February 2019.

Comments: Submitted to COLT 2019

arXiv:1811.05529 [pdf, ps, other]

Heuristic Voting as Ordinal Dominance Strategies

Authors: Omer Lev, Reshef Meir, Svetlana Obraztsova, Maria Polukarov

Abstract: Decision making under uncertainty is a key component of many AI settings, and in particular of voting scenarios where strategic agents are trying to reach a joint decision. The common approach to handle uncertainty is by maximizing expected utility, which requires a cardinal utility function as well as detailed probabilistic information. However, often such probabilities are not easy to estimate or apply. To this end, we present a framework that allows "shades of gray" of likelihood without probabilities. Specifically, we create a hierarchy of sets of world states based on a prospective poll, with inner sets containing more likely outcomes. This hierarchy of likelihoods allows us to define what we term ordinally-dominated strategies. We use this approach to justify various known voting heuristics as bounded-rational strategies.

Submitted 13 November, 2018; originally announced November 2018.

Comments: This is the full version of paper #6080 accepted to AAAI'19

arXiv:1808.01960 [pdf, other]

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

Authors: Dror Freirich, Ron Meir, Aviv Tamar

Abstract: The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value. We use this insight to propose a GAN-based approach to DiRL, which leverages the strengths of GANs in learning distributions of high-dimensional data. In particular, we show that our GAN approach can be used for DiRL with multivariate rewards, an important setting which cannot be tackled with prior methods. The multivariate setting also allows us to unify learning the distribution of values and state transitions, and we exploit this idea to devise a novel exploration method that is driven by the discrepancy in estimating both values and states.

Submitted 6 August, 2018; originally announced August 2018.

arXiv:1806.06257 [pdf, other]

Efficient Crowdsourcing via Proxy Voting

Authors: Gal Cohensius, Omer Ben Porat, Reshef Meir, Ofra Amir

Abstract: Crowdsourcing platforms offer a way to label data by aggregating answers of multiple unqualified workers. We introduce a simple and budget efficient crowdsourcing method named Proxy Crowdsourcing (PCS). PCS collects answers from two sets of workers: leaders (a.k.a. proxies) and followers. Each leader completely answers the survey while each follower answers only a small subset of it. We then weigh every leader according to the number of followers to which his answers are closest, and aggregate the answers of the leaders using any standard aggregation method (e.g., Plurality for categorical labels or Mean for continuous labels). We compare empirically the performance of PCS to unweighted aggregation, keeping the total number of questions (the budget) fixed. We show that PCS improves the accuracy of aggregated answers across several datasets, both with categorical and continuous labels. Overall, our suggested method improves accuracy while being simple and easy to implement.

Submitted 16 June, 2018; originally announced June 2018.
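
As a rough illustration of the PCS weighting step described in the abstract above, here is a minimal sketch for categorical labels. It assumes that "closest" means the fewest disagreements on the questions a follower answered, that ties go to the leader with the lowest index, and that a leader's weight counts followers only; these choices and the function name are hypothetical, not taken from the paper.

```python
from collections import Counter

def pcs_plurality(leaders, followers):
    """Weighted plurality aggregation in the spirit of Proxy Crowdsourcing.

    leaders:   list of dicts mapping every question to a label.
    followers: list of dicts mapping a subset of questions to labels.
    Returns (weights, aggregated), where weights[i] is the number of
    followers assigned to leader i.
    """
    weights = [0] * len(leaders)
    for answers in followers:
        # Count disagreements with each leader on the answered questions only.
        def disagreements(i):
            return sum(leaders[i][q] != a for q, a in answers.items())
        closest = min(range(len(leaders)), key=disagreements)
        weights[closest] += 1

    aggregated = {}
    for q in leaders[0]:
        votes = Counter()
        for w, leader in zip(weights, leaders):
            votes[leader[q]] += w
        aggregated[q] = votes.most_common(1)[0][0]
    return weights, aggregated
```

For continuous labels the final loop would be replaced by a weighted mean of the leaders' answers, matching the Mean aggregation the abstract mentions.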

arXiv:1805.09368 [pdf, other]

Cumulative subtraction games

Authors: Gal Cohensius, Urban Larsson, Reshef Meir, David Wahlstedt

Abstract: We study zero-sum games, a variant of the classical combinatorial Subtraction games (studied for example in the monumental work "Winning Ways", by Berlekamp, Conway and Guy), called Cumulative Subtraction (CS). Two players alternate in moving, and get points for taking pebbles out of a joint pile. We prove that the outcome in optimal play (game value) of a CS with a finite number of possible actions is eventually periodic, with period $2s$, where $s$ is the size of the largest available action. This settles a conjecture by Stewart in his Ph.D. thesis (2011). Specifically, we find a quadratic bound, in the size of $s$, on when the outcome function must have become periodic. In case of two possible actions, we give an explicit description of optimal play. We generalize the periodicity result to games with a so-called reward function, where at each stage of the game, the change of 'score' does not necessarily equal the number of pebbles you collect.

Submitted 12 February, 2020; v1 submitted 23 May, 2018; originally announced May 2018.

Comments: 24 pages, 6 figures

MSC Class: 11B75; 91A05; 91A46
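
The game value mentioned in the Cumulative Subtraction abstract above can be explored numerically with a short dynamic program. This is a minimal sketch under stated assumptions: the value of a pile is taken to be the score difference (mover minus opponent) under optimal play, the game ends when no action fits in the remaining pile, and the function name `cs_values` is hypothetical.

```python
def cs_values(actions, max_pile):
    """Optimal-play score difference for the player to move, for piles 0..max_pile.

    actions: allowed numbers of pebbles to remove (positive integers).
    Recurrence (assumed zero-sum scoring): v(n) = max over s <= n of s - v(n - s),
    with v(n) = 0 when no action is available.
    """
    values = [0] * (max_pile + 1)
    for n in range(1, max_pile + 1):
        options = [s - values[n - s] for s in actions if s <= n]
        values[n] = max(options) if options else 0
    return values


print(cs_values([1, 2], 8))  # [0, 1, 2, 1, 0, 1, 2, 1, 0]
```

With actions {1, 2}, the printed values repeat with period 4, which agrees with the period $2s$ stated in the abstract for $s = 2$.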

arXiv:1805.07606 [pdf, other]

Predicting Strategic Voting Behavior with Poll Information

Authors: Roy Fairstein, Adam Lauz, Kobi Gal, Reshef Meir

Abstract: The question of how people vote strategically under uncertainty has attracted much attention in several disciplines. Theoretical decision models have been proposed which vary in their assumptions on the sophistication of the voters and on the information made available to them about others' preferences and their voting behavior. This work focuses on modeling strategic voting behavior under poll information. It proposes a new heuristic for voting behavior that weighs the success of each candidate according to the poll score with the utility of the candidate given the voters' preferences. The model weights can be tuned individually for each voter. We compared this model with other relevant voting models from the literature on data obtained from a recently released large scale study. We show that the new heuristic outperforms all other tested models. The prediction errors of the model can be partly explained by inconsistent voters that vote for (weakly) dominated candidates.

Submitted 19 May, 2018; originally announced May 2018.

arXiv:1804.02268 [pdf, ps, other]

Social Choice with Non Quasi-linear Utilities

Authors: Hongyao Ma, Reshef Meir, David C. Parkes

Abstract: Without monetary payments, the Gibbard-Satterthwaite theorem proves that under mild requirements all truthful social choice mechanisms must be dictatorships. When payments are allowed, the Vickrey-Clarke-Groves (VCG) mechanism implements the value-maximizing choice, and has many other good properties: it is strategy-proof, onto, deterministic, individually rational, and does not make positive transfers to the agents. By Roberts' theorem, with three or more alternatives, the weighted VCG mechanisms are essentially unique for domains with quasi-linear utilities. The goal of this paper is to characterize domains of non-quasi-linear utilities where "reasonable" mechanisms (with VCG-like properties) exist. Our main result is a tight characterization of the maximal non quasi-linear utility domain, which we call the largest parallel domain. We extend Roberts' theorem to parallel domains, and use the generalized theorem to prove two impossibility results. First, any reasonable mechanism must be dictatorial when the utility domain is quasi-linear together with any single non-parallel type. Second, for richer utility domains that still differ very slightly from quasi-linearity, every strategy-proof, onto and deterministic mechanism must be a dictatorship.

Submitted 25 April, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

arXiv:1711.01806 [pdf, ps, other]

Directed Graph Minors and Serial-Parallel Width

Authors: Argyrios Deligkas, Reshef Meir

Abstract: Graph minors are a primary tool in understanding the structure of undirected graphs, with many conceptual and algorithmic implications. We propose new variants of \emph{directed graph minors} and \emph{directed graph embeddings}, by modifying familiar definitions. For the class of 2-terminal directed acyclic graphs (TDAGs) our two definitions coincide, and the class is closed under both operations. The usefulness of our directed minor operations is demonstrated by characterizing all TDAGs with serial-parallel width at most $k$, a class of networks known to guarantee bounded negative externality in nonatomic routing games. Our characterization implies that a TDAG has serial-parallel width of $1$ if and only if it is a directed series-parallel graph. We also study the computational complexity of finding a directed minor and computing the serial-parallel width.

Submitted 29 May, 2019; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: published in MFCS 2018

ACM Class: F.2.2; G.2.2

arXiv:1711.01244 [pdf, other]

Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory

Authors: Ron Amit, Ron Meir

Abstract: In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are 'related' to previous tasks, the accumulated knowledge should be learned in a way which captures the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of new tasks. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks. We develop a gradient-based algorithm which minimizes an objective function derived from the bounds and demonstrate its effectiveness numerically with deep neural networks. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.

Submitted 20 May, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

Comments: Accepted to ICML 2018

arXiv:1705.10494 [pdf, other]

Joint auto-encoders: a flexible multi-task learning framework

Authors: Baruch Epstein, Ron Meir, Tomer Michaeli

Abstract: The incorporation of prior knowledge into learning is essential in achieving good performance based on small noisy samples. Such knowledge is often incorporated through the availability of related data arising from domains and tasks similar to the one of current interest. Ideally one would like to allow both the data for the current task and for previous related tasks to self-organize the learning system in such a way that commonalities and differences between the tasks are learned in a data-driven fashion. We develop a framework for learning multiple tasks simultaneously, based on sharing features that are common to all tasks, achieved through the use of a modular deep feedforward neural network consisting of shared branches, dealing with the common features of all tasks, and private branches, learning the specific unique aspects of each task. Once an appropriate weight sharing architecture has been established, learning takes place through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method deals with domain adaptation and multi-task learning in a unified fashion, and can easily deal with data arising from different types of sources. Numerical experiments demonstrate the effectiveness of learning in domain adaptation and transfer learning setups, and provide evidence for the flexible and task-oriented representations arising in the network.

Submitted 30 May, 2017; originally announced May 2017.

arXiv:1705.07300 [pdf, other]

Contract Design for Energy Demand Response

Authors: Reshef Meir, Hongyao Ma, Valentin Robu

Abstract: Power companies such as Southern California Edison (SCE) use Demand Response (DR) contracts to incentivize consumers to reduce their power consumption during periods when demand forecast exceeds supply. Current mechanisms in use offer contracts to consumers independent of one another, do not take into consideration consumers' heterogeneity in consumption profile or reliability, and fail to achieve high participation. We introduce DR-VCG, a new DR mechanism that offers a flexible set of contracts (which may include the standard SCE contracts) and uses VCG pricing. We prove that DR-VCG elicits truthful bids, incentivizes honest preparation efforts, and enables efficient computation of allocation and prices. With simple fixed-penalty contracts, the optimization goal of the mechanism is an upper bound on the probability that the reduction target is missed. Extensive simulations show that compared to the current mechanism deployed by SCE, the DR-VCG mechanism achieves higher participation, increased reliability, and significantly reduced total expenses.

Submitted 20 May, 2017; originally announced May 2017.

Comments: full version of paper accepted to IJCAI'17

arXiv:1701.07398 [pdf, other]

Learning an attention model in an artificial visual system

Authors: Alon Hazan, Yuval Harel, Ron Meir

Abstract: The human visual perception of the world is of a large fixed image that is highly detailed and sharp. However, receptor density in the retina is not uniform: a small central region called the fovea is very dense and exhibits high resolution, whereas a peripheral region around it has much lower spatial resolution. Thus, contrary to our perception, we are only able to observe a very small region around the line of sight with high resolution. The perception of a complete and stable view is aided by an attention mechanism that directs the eyes to the numerous points of interest within the scene. The eyes move between these targets in quick, unconscious movements, known as "saccades". Once a target is centered at the fovea, the eyes fixate for a fraction of a second while the visual system extracts the necessary information. An artificial visual system was built based on a fully recurrent neural network set within a reinforcement learning protocol, and learned to attend to regions of interest while solving a classification task. The model is consistent with several experimentally observed phenomena, and suggests novel predictions.

Submitted 24 January, 2017; originally announced January 2017.

Journal ref: IEEE International Conference on the Science of Electrical Engineering (ICSEE) (2016)

arXiv:1611.08308 [pdf, other]

Proxy Voting for Better Outcomes

Authors: Gal Cohensius, Shie Manor, Reshef Meir, Eli Meirom, Ariel Orda

Abstract: We consider a social choice problem where only a small number of people out of a large population are sufficiently available or motivated to vote. A common solution to increase participation is to allow voters to use a proxy, that is, transfer their voting rights to another voter. Considering social choice problems on metric spaces, we compare voting with and without the use of proxies to see which mechanism better approximates the optimal outcome, and characterize the regimes in which proxy voting is beneficial. When voters' opinions are located on an interval, both the median mechanism and the mean mechanism are substantially improved by proxy voting. When voters vote on many binary issues, proxy voting is better when the sample of active voters is too small to provide a good outcome. Our theoretical results extend to situations where available voters choose strategically whether to participate. We support our theoretical findings with empirical results showing substantial benefits of proxy voting on simulated and real preference data.

Submitted 24 November, 2016; originally announced November 2016.

arXiv:1609.01682 [pdf, ps, other]

Random Tie-breaking with Stochastic Dominance

Authors: Reshef Meir

Abstract: Consider Plurality with random tie-breaking. This paper uses standard axiomatic extensions of preferences over elements to preferences over sets (Kelly, Gardenfors, Responsiveness) to characterize all better-replies of a voter under stochastic dominance.

Submitted 6 September, 2016; originally announced September 2016.

arXiv:1607.06511 [pdf, other]

Contingent Payment Mechanisms for Resource Utilization

Authors: Hongyao Ma, Reshef Meir, David C. Parkes, James Zou

Abstract: We introduce the problem of assigning resources to improve their utilization. The motivation comes from settings where agents have uncertainty about their own values for using a resource, and where it is in the interest of a group that resources be used and not wasted. Done in the right way, improved utilization maximizes social welfare, balancing the utility of a high value but unreliable agent with the group's preference that resources be used. We introduce the family of contingent payment mechanisms (CP), which may charge an agent contingent on use (a penalty). A CP mechanism is parameterized by a maximum penalty, and has a dominant-strategy equilibrium. Under a set of axiomatic properties, we establish welfare-optimality for the special case CP(W), with CP instantiated for a maximum penalty equal to societal value W for utilization. First, CP(W) is not dominated for expected welfare by any other mechanism. Second, amongst mechanisms that always allocate the resource and have a simple indirect structure, CP(W) strictly dominates every other mechanism. The special case with no upper bound on penalty, the contingent second-price mechanism, maximizes utilization. We extend the mechanisms to assign multiple, heterogeneous resources, and present a simulation study of the welfare properties of these mechanisms.

Submitted 1 November, 2018; v1 submitted 21 July, 2016; originally announced July 2016.

Showing 1–50 of 64 results for author: Meir, R