ABCinML: Anticipatory Bias Correction in Machine Learning Applications

Authors: Abdulaziz A. Almuzaini, Chidansh A. Bhatt, David M. Pennock, Vivek K. Singh

Abstract: The idealization of a static machine-learned model, trained once and deployed forever, is not practical. As input distributions change over time, the model will not only lose accuracy; any constraints to reduce bias against a protected class may also fail to work as intended. Thus, researchers have begun to explore ways to maintain algorithmic fairness over time. One line of work focuses on dynamic learning, which retrains after each batch, and the other on robust learning, which tries to make algorithms robust against all possible future changes. Dynamic learning seeks to reduce biases soon after they have occurred, and robust learning often yields (overly) conservative models. We propose an anticipatory dynamic learning approach for correcting the algorithm to mitigate bias before it occurs. Specifically, we make use of anticipations regarding the relative distributions of population subgroups (e.g., relative ratios of male and female applicants) in the next cycle to identify the right parameters for an importance weighting fairness approach. Results from experiments over multiple real-world datasets suggest that this approach has promise for anticipatory bias correction.

Submitted 14 June, 2022; originally announced June 2022.
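
The importance weighting idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: it only assumes that an anticipated share per subgroup is available for the next cycle, and it reweights the current batch so that its weighted subgroup shares match that anticipation. The function name `group_importance_weights` and the example shares are hypothetical.

```python
from collections import Counter

def group_importance_weights(groups, anticipated_share):
    """Return one weight per example so that the weighted subgroup shares
    of the current batch equal the anticipated shares (an assumption of
    this sketch, not a procedure taken from the paper)."""
    counts = Counter(groups)
    n = len(groups)
    weights = []
    for g in groups:
        current_share = counts[g] / n          # observed share in this batch
        weights.append(anticipated_share[g] / current_share)
    return weights

# Hypothetical batch: one "f" example and three "m" examples,
# with a 50/50 split anticipated for the next cycle.
groups = ["f", "m", "m", "m"]
weights = group_importance_weights(groups, {"f": 0.5, "m": 0.5})
```

The resulting weights could be passed as per-sample weights to any learner that accepts them; how the paper selects its weighting parameters is not specified in the abstract.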

arXiv:2109.06443 [pdf, other]

Designing a Combinatorial Financial Options Market

Authors: Xintong Wang, David M. Pennock, Nikhil R. Devanur, David M. Rothschild, Biaoshuai Tao, Michael P. Wellman

Abstract: Financial options are contracts that specify the right to buy or sell an underlying asset at a strike price by an expiration date. Standard exchanges offer options of predetermined strike values and trade options of different strikes independently, even for those written on the same underlying asset. Such independent market design can introduce arbitrage opportunities and lead to the thin market problem. The paper first proposes a mechanism that consolidates and matches orders on standard options related to the same underlying asset, while providing agents the flexibility to specify any custom strike value. The mechanism generalizes the classic double auction, runs in time polynomial to the number of orders, and poses no risk to the exchange, regardless of the value of the underlying asset at expiration. Empirical analysis on real-market options data shows that the mechanism can find new matches for options of different strike prices and reduce bid-ask spreads. Extending standard options written on a single asset, we propose and define a new derivative instrument -- combinatorial financial options that offer contract holders the right to buy or sell any linear combination of multiple underlying assets. We generalize our single-asset mechanism to match options written on different combinations of assets, and prove that optimal clearing of combinatorial financial options is coNP-hard. To facilitate market operations, we propose an algorithm that finds the exact optimal match through iterative constraint generation, and evaluate its performance on synthetically generated combinatorial options markets of different scales. As option prices reveal the market's collective belief of an underlying asset's future value, a combinatorial options market enables the expression of aggregate belief about future correlations among assets.

Submitted 14 September, 2021; originally announced September 2021.

Comments: To appear in EC21

arXiv:2102.07308 [pdf, other]

Log-time Prediction Markets for Interval Securities

Authors: Miroslav Dudík, Xintong Wang, David M. Pennock, David M. Rothschild

Abstract: We design a prediction market to recover a complete and fully general probability distribution over a random variable. Traders buy and sell interval securities that pay \$1 if the outcome falls into an interval and \$0 otherwise. Our market takes the form of a central automated market maker and allows traders to express interval endpoints of arbitrary precision. We present two designs in both of which market operations take time logarithmic in the number of intervals (that traders distinguish), providing the first computationally efficient market for a continuous variable. Our first design replicates the popular logarithmic market scoring rule (LMSR), but operates exponentially faster than a standard LMSR by exploiting its modularity properties to construct a balanced binary tree and decompose computations along the tree nodes. The second design consists of two or more parallel LMSR market makers that mediate submarkets of increasingly fine-grained outcome partitions. This design remains computationally efficient for all operations, including arbitrage removal across submarkets. It adds two additional benefits for the market designer: (1) the ability to express utility for information at various resolutions by assigning different liquidity values, and (2) the ability to guarantee a true constant bounded loss by appropriately decreasing the liquidity in each submarket.

Submitted 15 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: To appear in AAMAS 2021
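
For readers unfamiliar with the logarithmic market scoring rule (LMSR) referred to in the entry above, the following is a minimal sketch of the standard LMSR over a finite outcome set, using the usual cost function C(q) = b log Σ_i exp(q_i / b). It is not the paper's log-time design: every call here touches every outcome. The function names and the liquidity value `b = 10.0` are illustrative assumptions.

```python
import math

def lmsr_cost(q, b):
    """Standard LMSR cost C(q) = b * log(sum_i exp(q_i / b)),
    computed with the max subtracted for numerical stability."""
    m = max(x / b for x in q)
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))

def lmsr_prices(q, b):
    """Instantaneous prices: the gradient of the cost, i.e. a softmax of q / b."""
    m = max(x / b for x in q)
    exps = [math.exp(x / b - m) for x in q]
    total = sum(exps)
    return [e / total for e in exps]

def trade_cost(q, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    new_q = [x + d for x, d in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)

b = 10.0                                   # hypothetical liquidity parameter
initial_prices = lmsr_prices([0.0, 0.0, 0.0, 0.0], b)
payment = trade_cost([0.0, 0.0], [10.0, 0.0], b)
```

With n outcomes and all quantities starting at zero, the market maker's worst-case loss under this rule is b log n, which is why the liquidity parameter is the main design lever.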

arXiv:2101.01816 [pdf, ps, other]

Incentive-Compatible Forecasting Competitions

Authors: Jens Witkowski, Rupert Freeman, Jennifer Wortman Vaughan, David M. Pennock, Andreas Krause

Abstract: We initiate the study of incentive-compatible forecasting competitions in which multiple forecasters make predictions about one or more events and compete for a single prize. We have two objectives: (1) to incentivize forecasters to report truthfully and (2) to award the prize to the most accurate forecaster. Proper scoring rules incentivize truthful reporting if all forecasters are paid according to their scores. However, incentives become distorted if only the best-scoring forecaster wins a prize, since forecasters can often increase their probability of having the highest score by reporting more extreme beliefs. In this paper, we introduce two novel forecasting competition mechanisms. Our first mechanism is incentive compatible and guaranteed to select the most accurate forecaster with probability higher than any other forecaster. Moreover, we show that in the standard single-event, two-forecaster setting and under mild technical conditions, no other incentive-compatible mechanism selects the most accurate forecaster with higher probability. Our second mechanism is incentive compatible when forecasters' beliefs are such that information about one event does not lead to belief updates on other events, and it selects the best forecaster with probability approaching 1 as the number of events grows. Our notion of incentive compatibility is more general than previous definitions of dominant strategy incentive compatibility in that it allows for reports to be correlated with the event outcomes. Moreover, our mechanisms are easy to implement and can be generalized to the related problems of outputting a ranking over forecasters and hiring a forecaster with high accuracy on future events.

Submitted 7 September, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: 38 pages. Relative to the previous version Appendix A and Theorem 5 are new. This version additionally contains some expanded exposition
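
The statement in the abstract above that proper scoring rules incentivize truthful reporting can be checked numerically on a small example. The sketch below uses the quadratic (Brier-style) score for a single binary event, which is one well-known proper scoring rule; it does not implement either competition mechanism from the paper. The belief value 0.7 and the grid of candidate reports are arbitrary choices for illustration.

```python
def quadratic_score(report, outcome):
    """Quadratic score for a binary event: 1 - (outcome - report)^2."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report, belief):
    """Expected score of `report` when the forecaster believes the event
    occurs with probability `belief`."""
    return (belief * quadratic_score(report, 1)
            + (1 - belief) * quadratic_score(report, 0))

belief = 0.7
grid = [i / 100 for i in range(101)]       # candidate reports 0.00, 0.01, ..., 1.00
best_report = max(grid, key=lambda r: expected_score(r, belief))
```

Under this rule the expected score falls by (report - belief)^2 for any misreport, so the best report on the grid coincides with the belief. The distortion the abstract describes arises only once payment depends on having the highest score rather than on the score itself.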

arXiv:2007.12653 [pdf, other]

Beating Greedy For Approximating Reserve Prices in Multi-Unit VCG Auctions

Authors: Mahsa Derakhshan, David M. Pennock, Aleksandrs Slivkins

Abstract: We study the problem of finding personalized reserve prices for unit-demand buyers in multi-unit eager VCG auctions with correlated buyers. The input to this problem is a dataset of submitted bids of $n$ buyers in a set of auctions. The goal is to find a vector of reserve prices, one for each buyer, that maximizes the total revenue across all auctions. Roughgarden and Wang (2016) showed that this problem is APX-hard but admits a greedy $\frac{1}{2}$-approximation algorithm. Later, Derakhshan, Golrezaei, and Paes Leme (2019) gave an LP-based algorithm achieving a $0.68$-approximation for the (important) special case of the problem with a single item, thereby beating greedy. We show in this paper that the algorithm of Derakhshan et al. in fact does not beat greedy for the general multi-item problem. This raises the question of whether or not the general problem admits a better-than-$\frac{1}{2}$ approximation. In this paper, we answer this question in the affirmative and provide a polynomial-time algorithm with a significantly better approximation factor of $0.63$. Our solution is based on a novel linear programming formulation, for which we propose two different rounding schemes. We prove that the best of these two and the no-reserve case (all-zero vector) is a $0.63$-approximation.

Submitted 24 July, 2020; originally announced July 2020.

arXiv:2002.08837 [pdf, other]

No-Regret and Incentive-Compatible Online Learning

Authors: Rupert Freeman, David M. Pennock, Chara Podimata, Jennifer Wortman Vaughan

Abstract: We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions by potentially misreporting their beliefs about a sequence of binary events. Our goal is twofold. First, we want the learning algorithm to be no-regret with respect to the best fixed expert in hindsight. Second, we want incentive compatibility, a guarantee that each expert's best strategy is to report his true beliefs about the realization of each event. To achieve this goal, we build on the literature on wagering mechanisms, a type of multi-agent scoring rule. We provide algorithms that achieve no regret and incentive compatibility for myopic experts for both the full and partial information settings. In experiments on datasets from FiveThirtyEight, our algorithms have regret comparable to classic no-regret algorithms, which are not incentive-compatible. Finally, we identify an incentive-compatible algorithm for forward-looking strategic agents that exhibits diminishing regret in practice.

Submitted 30 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: Appears in ICML2020

arXiv:1905.00457 [pdf, other]

doi 10.1016/j.jet.2021.105234

Truthful Aggregation of Budget Proposals

Authors: Rupert Freeman, David M. Pennock, Dominik Peters, Jennifer Wortman Vaughan

Abstract: We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities) and these proposals must be aggregated into a single aggregate division. Under $\ell_1$ preferences -- for which a voter's disutility is given by the $\ell_1$ distance between the aggregate division and the division he or she most prefers -- the social welfare-maximizing mechanism, which minimizes the average $\ell_1$ distance between the outcome and each voter's proposal, is incentive compatible (Goel et al. 2016). However, it fails to satisfy the natural fairness notion of proportionality, placing too much weight on majority preferences. Leveraging a connection between market prices and the generalized median rules of Moulin (1980), we introduce the independent markets mechanism, which is both incentive compatible and proportional. We unify the social welfare-maximizing mechanism and the independent markets mechanism by defining a broad class of moving phantom mechanisms that includes both. We show that every moving phantom mechanism is incentive compatible. Finally, we characterize the social welfare-maximizing mechanism as the unique Pareto-optimal mechanism in this class, suggesting an inherent tradeoff between Pareto optimality and proportionality.

Submitted 21 January, 2022; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: 28 pages, final journal version

Journal ref: Journal of Economic Theory, Volume 193, April 2021, 105234

arXiv:1612.04885 [pdf, other]

Crowdsourced Outcome Determination in Prediction Markets

Authors: Rupert Freeman, Sebastien Lahaie, David M. Pennock

Abstract: A prediction market is a useful means of aggregating information about a future event. To function, the market needs a trusted entity who will verify the true outcome in the end. Motivated by the recent introduction of decentralized prediction markets, we introduce a mechanism that allows for the outcome to be determined by the votes of a group of arbiters who may themselves hold stakes in the market. Despite the potential conflict of interest, we derive conditions under which we can incentivize arbiters to vote truthfully by using funds raised from market fees to implement a peer prediction mechanism. Finally, we investigate what parameter values could be used in a real-world implementation of our mechanism.

Submitted 14 December, 2016; originally announced December 2016.

arXiv:1602.07362 [pdf, other]

The Possibilities and Limitations of Private Prediction Markets

Authors: Rachel Cummings, David M. Pennock, Jennifer Wortman Vaughan

Abstract: We consider the design of private prediction markets, financial markets designed to elicit predictions about uncertain events without revealing too much information about market participants' actions or beliefs. Our goal is to design market mechanisms in which participants' trades or wagers influence the market's behavior in a way that leads to accurate predictions, yet no single participant has too much influence over what others are able to observe. We study the possibilities and limitations of such mechanisms using tools from differential privacy. We begin by designing a private one-shot wagering mechanism in which bettors specify a belief about the likelihood of a future event and a corresponding monetary wager. Wagers are redistributed among bettors in a way that more highly rewards those with accurate predictions. We provide a class of wagering mechanisms that are guaranteed to satisfy truthfulness, budget balance in expectation, and other desirable properties while additionally guaranteeing epsilon-joint differential privacy in the bettors' reported beliefs, and analyze the trade-off between the achievable level of privacy and the sensitivity of a bettor's payment to her own report. We then ask whether it is possible to obtain privacy in dynamic prediction markets, focusing our attention on the popular cost-function framework in which securities with payments linked to future events are bought and sold by an automated market maker. We show that under general conditions, it is impossible for such a market maker to simultaneously achieve bounded worst-case loss and epsilon-differential privacy without allowing the privacy guarantee to degrade extremely quickly as the number of trades grows, making such markets impractical in settings in which privacy is valued. We conclude by suggesting several avenues for potentially circumventing this lower bound.

Submitted 23 February, 2016; originally announced February 2016.

arXiv:1510.02045 [pdf, other]

Budget Constraints in Prediction Markets

Authors: Nikhil Devanur, Miroslav Dudík, Zhiyi Huang, David M. Pennock

Abstract: We give a detailed characterization of optimal trades under budget constraints in a prediction market with a cost-function-based automated market maker. We study how the budget constraints of individual traders affect their ability to impact the market price. As a concrete application of our characterization, we give sufficient conditions for a property we call budget additivity: two traders with budgets B and B' and the same beliefs would have a combined impact equal to a single trader with budget B+B'. That way, even if a single trader cannot move the market much, a crowd of like-minded traders can have the same desired effect. When the set of payoff vectors associated with outcomes, with coordinates corresponding to securities, is affinely independent, we obtain that a generalization of the heavily-used logarithmic market scoring rule is budget additive, but the quadratic market scoring rule is not. Our results may be used both descriptively, to understand if a particular market maker is affected by budget constraints or not, and prescriptively, as a recipe to construct markets.

Submitted 7 October, 2015; originally announced October 2015.

Journal ref: In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, pages 238-247, 2015

arXiv:1302.3593 [pdf]

Toward a Market Model for Bayesian Inference

Authors: David M. Pennock, Michael P. Wellman

Abstract: We present a methodology for representing probabilistic relationships in a general-equilibrium economic model. Specifically, we define a precise mapping from a Bayesian network with binary nodes to a market price system where consumers and producers trade in uncertain propositions. We demonstrate the correspondence between the equilibrium prices of goods in this economy and the probabilities represented by the Bayesian network. A computational market model such as this may provide a useful framework for investigations of belief aggregation, distributed probabilistic inference, resource allocation under uncertainty, and other problems of decentralized uncertainty.

Submitted 13 February, 2013; originally announced February 2013.

Comments: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI1996)

Report number: UAI-P-1996-PG-405-413

arXiv:1302.1564 [pdf]

Representing Aggregate Belief through the Competitive Equilibrium of a Securities Market

Authors: David M. Pennock, Michael P. Wellman

Abstract: We consider the problem of belief aggregation: given a group of individual agents with probabilistic beliefs over a set of uncertain events, formulate a sensible consensus or aggregate probability distribution over these events. Researchers have proposed many aggregation methods, although on the question of which is best the general consensus is that there is no consensus. We develop a market-based approach to this problem, where agents bet on uncertain events by buying or selling securities contingent on their outcomes. Each agent acts in the market so as to maximize expected utility at given securities prices, limited in its activity only by its own risk aversion. The equilibrium prices of goods in this market represent aggregate beliefs. For agents with constant risk aversion, we demonstrate that the aggregate probability exhibits several desirable properties, and is related to independently motivated techniques. We argue that the market-based approach provides a plausible mechanism for belief aggregation in multiagent systems, as it directly addresses self-motivated agent incentives for participation and for truthfulness, and can provide a decision-theoretic foundation for the "expert weights" often employed in centralized pooling techniques.

Submitted 6 February, 2013; originally announced February 2013.

Comments: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997)

Report number: UAI-P-1997-PG-392-400

arXiv:1301.7406 [pdf]

Logarithmic Time Parallel Bayesian Inference

Authors: David M. Pennock

Abstract: I present a parallel algorithm for exact probabilistic inference in Bayesian networks. For polytree networks with n variables, the worst-case time complexity is O(log n) on a CREW PRAM (concurrent-read, exclusive-write parallel random-access machine) with n processors, for any constant number of evidence variables. For arbitrary networks, the time complexity is O(r^{3w}*log n) for n processors, or O(w*log n) for r^{3w}*n processors, where r is the maximum range of any variable, and w is the induced width (the maximum clique size), after moralizing and triangulating the network.

Submitted 30 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998)

Report number: UAI-P-1998-PG-431-438

arXiv:1301.6732 [pdf]

Graphical Representations of Consensus Belief

Authors: David M. Pennock, Michael P. Wellman

Abstract: Graphical models based on conditional independence support concise encodings of the subjective belief of a single agent. A natural question is whether the consensus belief of a group of agents can be represented with equal parsimony. We prove, under relatively mild assumptions, that even if everyone agrees on a common graph topology, no method of combining beliefs can maintain that structure. Even weaker conditions rule out local aggregation within conditional probability tables. On a more positive note, we show that if probabilities are combined with the logarithmic opinion pool (LogOP), then commonly held Markov independencies are maintained. This suggests a straightforward procedure for constructing a consensus Markov network. We describe an algorithm for computing the LogOP with time complexity comparable to that of exact Bayesian inference.

Submitted 23 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI1999)

Report number: UAI-P-1999-PG-531-540
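
The logarithmic opinion pool (LogOP) named in the entry above combines beliefs as a normalized weighted geometric mean, p(ω) ∝ Π_i p_i(ω)^{w_i}. The sketch below applies that textbook definition to explicit distributions over a small outcome set; it is not the paper's algorithm for computing the LogOP over a graphical model. It assumes strictly positive probabilities, and the function name and example numbers are hypothetical.

```python
import math

def log_opinion_pool(distributions, weights):
    """Normalized weighted geometric mean of discrete distributions.
    Assumes every probability is strictly positive (log is taken)."""
    n = len(distributions[0])
    log_pooled = [
        sum(w * math.log(p[i]) for p, w in zip(distributions, weights))
        for i in range(n)
    ]
    m = max(log_pooled)                      # subtract max for stability
    unnormalized = [math.exp(v - m) for v in log_pooled]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Two hypothetical agents with equal weight over a binary outcome.
pooled = log_opinion_pool([[0.5, 0.5], [0.8, 0.2]], [0.5, 0.5])
```

In this example the unnormalized values are sqrt(0.4) and sqrt(0.1), whose ratio is 2, so the pooled belief is (2/3, 1/3).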

arXiv:1301.3886 [pdf]

Compact Securities Markets for Pareto Optimal Reallocation of Risk

Authors: David M. Pennock, Michael P. Wellman

Abstract: The securities market is the fundamental theoretical framework in economics and finance for resource allocation under uncertainty. Securities serve both to reallocate risk and to disseminate probabilistic information. Complete securities markets - which contain one security for every possible state of nature - support Pareto optimal allocations of risk. Complete markets suffer from the same exponential dependence on the number of underlying events as do joint probability distributions. We examine whether markets can be structured and "compacted" in the same manner as Bayesian network representations of joint distributions. We show that, if all agents' risk-neutral independencies agree with the independencies encoded in the market structure, then the market is operationally complete: risk is still Pareto optimally allocated, yet the number of securities can be exponentially smaller. For collections of agents of a certain type, agreement on Markov independencies is sufficient to admit compact and operationally complete markets.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-481-488

arXiv:1301.3885 [pdf]

Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach

Authors: David M. Pennock, Eric J. Horvitz, Steve Lawrence, C. Lee Giles

Abstract: The growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. Such systems leverage knowledge about the known preferences of multiple users to recommend items of interest to other users. CF methods have been harnessed to make recommendations about such items as web pages, movies, books, and toys. Researchers have proposed and evaluated many approaches for generating recommendations. We describe and evaluate a new method called personality diagnosis (PD). Given a user's preferences for some items, we compute the probability that he or she is of the same "personality type" as other users, and, in turn, the probability that he or she will like new items. PD retains some of the advantages of traditional similarity-weighting techniques in that all data is brought to bear on each prediction and new data can be added easily and incrementally. Additionally, PD has a meaningful probabilistic interpretation, which may be leveraged to justify, explain, and augment results. We report empirical results on the EachMovie database of movie ratings, and on user profile data collected from the CiteSeer digital library of Computer Science research papers. The probabilistic framework naturally supports a variety of descriptive measurements - in particular, we consider the applicability of a value of information (VOI) computation.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-473-480

arXiv:1301.2303 [pdf]

Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments

Authors: Alexandrin Popescul, Lyle H. Ungar, David M. Pennock, Steve Lawrence

Abstract: Recommender systems leverage product and community information to target products to consumers. Researchers have developed collaborative recommenders, content-based recommenders, and (largely ad-hoc) hybrid systems. We propose a unified probabilistic framework for merging collaborative and content-based recommendations. We extend Hofmann's [1999] aspect model to incorporate three-way co-occurrence data among users, items, and item content. The relative influence of collaboration data versus content data is not imposed as an exogenous parameter, but rather emerges naturally from the given data sources. Global probabilistic models coupled with standard Expectation Maximization (EM) learning algorithms tend to drastically overfit in sparse-data situations, as is typical in recommendation applications. We show that secondary content information can often be used to overcome sparsity. Experiments on data from the ResearchIndex library of Computer Science publications show that appropriate mixture models incorporating secondary data produce significantly better quality recommenders than k-nearest neighbors (k-NN). Global probabilistic models also allow more general inferences than local methods like k-NN.

Submitted 10 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

Report number: UAI-P-2001-PG-437-444

arXiv:1301.0594 [pdf]

Modelling Information Incorporation in Markets, with Application to Detecting and Explaining Events

Authors: David M Pennock, Sandip Debnath, Eric Glover, C. Lee Giles

Abstract: We develop a model of how information flows into a market, and derive algorithms for automatically detecting and explaining relevant events. We analyze data from twenty-two "political stock markets" (i.e., betting markets on political outcomes) on the Iowa Electronic Market (IEM). We prove that, under certain efficiency assumptions, prices in such betting markets will on average approach the correct outcomes over time, and show that IEM data conforms closely to the theory. We present a simple model of a betting market where information is revealed over time, and show a qualitative correspondence between the model and real market data. We also present an algorithm for automatically detecting significant events and generating semantic explanations of their origin. The algorithm operates by discovering significant changes in vocabulary on online news sources (using expected entropy loss) that align with major price spikes in related betting markets.

Submitted 12 December, 2012; originally announced January 2013.

Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

Report number: UAI-P-2002-PG-405-413

arXiv:1212.2477 [pdf]

1 Billion Pages = 1 Million Dollars? Mining the Web to Play "Who Wants to be a Millionaire?"

Authors: Shyong K. Lam, David M Pennock, Dan Cosley, Steve Lawrence

Abstract: We exploit the redundancy and volume of information on the web to build a computerized player for the ABC TV game show 'Who Wants To Be A Millionaire?' The player consists of a question-answering module and a decision-making module. The question-answering module utilizes question transformation techniques, natural language parsing, multiple information retrieval algorithms, and multiple search engines; results are combined in the spirit of ensemble learning using an adaptive weighting scheme. Empirically, the system correctly answers about 75% of questions from the Millionaire CD-ROM, 3rd edition - general-interest trivia questions often about popular culture and common knowledge. The decision-making module chooses from allowable actions in the game in order to maximize expected risk-adjusted winnings, where the estimated probability of answering correctly is a function of past performance and confidence in correctly answering the current question. When given a six question head start (i.e., when starting from the $2,000 level), we find that the system performs about as well on average as humans starting at the beginning. Our system demonstrates the potential of simple but well-chosen techniques for mining answers from unstructured information such as the web.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-337-345

arXiv:1206.6814 [pdf]

An Empirical Comparison of Algorithms for Aggregating Expert Predictions

Authors: Varsha Dani, Omid Madani, David M Pennock, Sumit Sanghai, Brian Galebach

Abstract: Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert's prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

Report number: UAI-P-2006-PG-106-113
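
The abstract above contrasts prediction accuracy with quadratic loss for a simple-averaging baseline. The following is a minimal sketch of that baseline only, not the paper's code: the function names and the sample probabilities are hypothetical, and quadratic loss is assumed here to mean the squared difference between the forecast probability and the 0/1 outcome.

```python
def average_forecast(expert_probs):
    """Simple averaging: the mean of the experts' probabilities for one event."""
    return sum(expert_probs) / len(expert_probs)


def quadratic_loss(prob, outcome):
    """Squared error between a forecast probability and a 0/1 outcome
    (assumed convention; other scalings of quadratic loss exist)."""
    return (prob - outcome) ** 2


def is_correct(prob, outcome):
    """Accuracy view: the forecast counts as correct if it favors the realized outcome."""
    return (prob > 0.5) == (outcome == 1)


# Hypothetical data: three experts' probabilities that the home team wins.
experts = [0.6, 0.8, 0.7]
combined = average_forecast(experts)
loss_if_home_wins = quadratic_loss(combined, 1)
```

Two forecasts of 0.55 and 0.95 for an event that occurs are equally "accurate" under `is_correct`, yet their quadratic losses differ widely, which is one way to read the abstract's remark that there is room for improvement in quadratic loss even when accuracy is hard to beat.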

arXiv:1206.5252 [pdf]

A Utility Framework for Bounded-Loss Market Makers

Authors: Yiling Chen, David M Pennock

Abstract: We introduce a class of utility-based market makers that always accept orders at their risk-neutral prices. We derive necessary and sufficient conditions for such market makers to have bounded loss. We prove that hyperbolic absolute risk aversion utility market makers are equivalent to weighted pseudospherical scoring rule market makers. In particular, Hanson's logarithmic scoring rule market maker corresponds to a negative exponential utility market maker in our framework. We describe a third equivalent formulation based on maintaining a cost function that seems most natural for implementation purposes, and we illustrate how to translate among the three equivalent formulations. We examine the tradeoff between the market's liquidity and the market maker's worst-case loss. For a fixed bound on worst-case loss, some market makers exhibit greater liquidity near uniform prices and some exhibit greater liquidity near extreme prices, but no market maker can exhibit uniformly greater liquidity in all regimes. For a fixed minimum liquidity level, we give the lower bound of market maker's worst-case loss under some regularity conditions.

Submitted 20 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

Report number: UAI-P-2007-PG-49-56
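
The abstract above mentions a cost-function formulation and the tradeoff between liquidity and worst-case loss. As an illustration, here is a minimal sketch of the standard cost-function form of Hanson's logarithmic market scoring rule (LMSR), C(q) = b log(sum_i exp(q_i / b)). It is not taken from the paper: the function names are hypothetical, and `b` is the usual liquidity parameter.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)).
    The maximum is factored out before exponentiating for numerical stability."""
    m = max(quantities)
    return m + b * math.log(sum(math.exp((q - m) / b) for q in quantities))


def lmsr_prices(quantities, b):
    """Instantaneous prices: the gradient of the cost function (a softmax of q / b)."""
    m = max(quantities)
    weights = [math.exp((q - m) / b) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    after = [q + d for q, d in zip(quantities, delta)]
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


def worst_case_loss(n_outcomes, b):
    """Worst-case loss bound b * ln(n) for LMSR started at uniform prices."""
    return b * math.log(n_outcomes)
```

A larger `b` makes prices move less per share bought (more liquidity) but raises the bound returned by `worst_case_loss`, which is the tradeoff the abstract refers to for this particular market maker.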

arXiv:1202.3756 [pdf]

Price Updating in Combinatorial Prediction Markets with Bayesian Networks

Authors: David M. Pennock, Lirong Xia

Abstract: To overcome the #P-hardness of computing/updating prices in logarithm market scoring rule-based (LMSR-based) combinatorial prediction markets, Chen et al. [5] recently used a simple Bayesian network to represent the prices of securities in combinatorial prediction markets for tournaments, and showed that two types of popular securities are structure preserving. In this paper, we significantly extend this idea by employing Bayesian networks in general combinatorial prediction markets. We reveal a very natural connection between LMSR-based combinatorial prediction markets and probabilistic belief aggregation, which leads to a complete characterization of all structure preserving securities for decomposable network structures. Notably, the main results by Chen et al. [5] are corollaries of our characterization. We then prove that in order for a very basic set of securities to be structure preserving, the graph of the Bayesian network must be decomposable. We also discuss some approximation techniques for securities that are not structure preserving.

Submitted 14 February, 2012; originally announced February 2012.

Report number: UAI-P-2011-PG-581-588

arXiv:0802.1362 [pdf, ps, other]

Complexity of Combinatorial Market Makers

Authors: Yiling Chen, Lance Fortnow, Nicolas Lambert, David M. Pennock, Jennifer Wortman

Abstract: We analyze the computational complexity of market maker pricing algorithms for combinatorial prediction markets. We focus on Hanson's popular logarithmic market scoring rule market maker (LMSR). Our goal is to implicitly maintain correct LMSR prices across an exponentially large outcome space. We examine both permutation combinatorics, where outcomes are permutations of objects, and Boolean combinatorics, where outcomes are combinations of binary events. We look at three restrictive languages that limit what traders can bet on. Even with severely limited languages, we find that LMSR pricing is #P-hard, even when the same language admits polynomial-time matching without the market maker. We then propose an approximation technique for pricing permutation markets based on a recent algorithm for online permutation learning. The connections we draw between LMSR pricing and the vast literature on online learning with expert advice may be of independent interest.

Submitted 10 February, 2008; originally announced February 2008.

ACM Class: J.4

arXiv:cs/0203024 [pdf, ps, other]

The structure of broad topics on the Web

Authors: Soumen Chakrabarti, Mukul M. Joshi, Kunal Punera, David M. Pennock

Abstract: The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone, and do not consider textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure and not on page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web, and analyze the capability of recent random walk algorithms to draw samples which follow such distributions. In addition, we can measure the probability that a page about one broad topic will link to another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.

Submitted 20 March, 2002; originally announced March 2002.

Comments: PDF, HTML, LaTeX source, all images

ACM Class: H.5.4; H.5.3; H.1.0

Showing 1–24 of 24 results for author: Pennock, D M