Online Learning in Betting Markets: Profit versus Prediction

Authors: Haiqing Zhu, Alexander Soen, Yun Kuen Cheung, Lexing Xie

Abstract: We examine two types of binary betting markets, whose primary goal is either profit (such as sports gambling) or gaining information (such as prediction markets). We articulate the interplay between belief and price-setting to analyse both types of markets, and show that the goals of maximising bookmaker profit and eliciting information are fundamentally incompatible. A key insight is that profit hinges on the deviation between (the distribution of) bettor and true beliefs, and that heavier tails in the bettor belief distribution imply higher profit. Our algorithmic contribution is to introduce online learning methods for price-setting. Traditionally, bookmakers update their prices rather infrequently; we present two algorithms that guide price updates upon seeing each bet, assuming very little about bettor belief distributions. The online pricing algorithm achieves stochastic regret of $\mathcal{O}(\sqrt{T})$ against the worst local maximum, or $\mathcal{O}(\sqrt{T \log T})$ with high probability against the global maximum under fair odds. More broadly, the inherent trade-off between profit and information-seeking in binary betting may inspire new understandings of large-scale multi-agent behaviour.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2312.07146 [pdf, other]

CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Authors: Lifan Luo, Boyang Zhang, Zhijie Peng, Yik Kin Cheung, Guanlan Zhang, Zhigang Li, Michael Yu Wang, Hongyu Yu

Abstract: As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that employs a compound-eye imaging system to combine near-field 3D visual and tactile sensing within a compact form factor. CompdVision utilizes two types of vision units to address diverse sensing needs, eliminating the need for complex modality conversion. Stereo units with far-focus lenses can see through the transparent elastomer for depth estimation beyond the contact surface. Simultaneously, tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing, proving its capability for reliable external object depth estimation and precise measurement of tangential and normal contact forces. The dual modalities and compact design make the sensor a versatile tool for robotic manipulation.

Submitted 15 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2302.06226 [pdf, other]

doi 10.1145/3543507.3583315

Stability and Efficiency of Personalised Cultural Markets

Authors: Haiqing Zhu, Yun Kuen Cheung, Lexing Xie

Abstract: This work is concerned with the dynamics of online cultural markets, namely, attention allocation of many users on a set of digital goods with infinite supply. Such dynamics are important in shaping processes and outcomes in society, from trending items in entertainment and collective knowledge creation to election outcomes. The outcomes of online cultural markets are susceptible to intricate social influence dynamics, particularly so when the community comprises consumers with heterogeneous interests. This has made formal analysis of these markets improbable. In this paper, we remedy this by establishing robust connections between influence dynamics and optimization processes, in trial-offer markets where the consumer preferences are modelled by multinomial logit. Among other results, we show that the proportional-response-esque influence dynamic is equivalent to stochastic mirror descent on a convex objective function, thus leading to a stable and predictable outcome. When all consumers are homogeneous, the objective function has a natural interpretation as a weighted sum of efficiency and diversity of the culture market. In simulations driven by real-world preferences collected from a large-scale recommender system, we observe that ranking strategies aligned with the underlying heterogeneous preferences are more stable, and achieve higher efficiency and diversity.

Submitted 24 April, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

arXiv:2112.02346 [pdf, other]

doi 10.1145/3490422.3502360

Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

Authors: Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

Abstract: FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs than via the direct use of off-the-shelf, hand-designed networks. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. Choosing appropriate K a priori is challenging, and doing so at even high granularity, e.g. per layer, is a time-consuming and error-prone process that leaves FPGAs' spatial flexibility underexploited. Furthermore, prior works see LUT inputs connected randomly, which does not guarantee a good choice of network topology. To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference. By removing LUT inputs determined to be of low importance, our method increases the efficiency of the resultant accelerators. Our GPU-friendly solution to LUT input removal is capable of processing large topologies during their training with negligible slowdown. With logic shrinkage, we better the area and energy efficiency of the best-performing LUTNet implementation of the CNV network classifying CIFAR-10 by 1.54x and 1.31x, respectively, while matching its accuracy. This implementation also reaches 2.71x the area efficiency of an equally accurate, heavily pruned BNN. On ImageNet with the Bi-Real Net architecture, employment of logic shrinkage results in a post-synthesis area reduction of 2.67x vs LUTNet, allowing for implementation that was previously impossible on today's largest FPGAs.

Submitted 2 January, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: Accepted manuscript uploaded 04/12/21. DOA 22/11/21

arXiv:2106.12332 [pdf, other]

From Griefing to Stability in Blockchain Mining Economies

Authors: Yun Kuen Cheung, Stefanos Leonardos, Georgios Piliouras, Shyam Sridhar

Abstract: We study a game-theoretic model of blockchain mining economies and show that griefing, a practice according to which participants harm other participants at some lesser cost to themselves, is a prevalent threat at its Nash equilibria. The proof relies on a generalization of evolutionary stability to non-homogeneous populations via griefing factors (ratios that measure network losses relative to a deviator's own losses), which leads to a formal theoretical argument for the dissipation of resources, consolidation of power and high entry barriers that are currently observed in practice. A critical assumption in this type of analysis is that miners' decisions have significant influence on aggregate network outcomes (such as network hashrate). However, as networks grow larger, the miners' interaction more closely resembles a distributed production economy or Fisher market and its stability properties change. In this case, we derive a proportional response (PR) update protocol which converges to market equilibria at which griefing is irrelevant. Convergence holds for a wide range of miners' risk profiles and various degrees of resource mobility between blockchains with different mining technologies. Our empirical findings in a case study with four mineable cryptocurrencies suggest that risk diversification, restricted mobility of resources (as enforced by different mining technologies) and network growth are all contributing factors to the stability of the inherently volatile blockchain ecosystem.

Submitted 23 June, 2021; originally announced June 2021.

MSC Class: 91B54; 91B55; 91A22; 91A26; 91-10;

arXiv:2106.04748 [pdf, other]

Online Optimization in Games via Control Theory: Connecting Regret, Passivity and Poincaré Recurrence

Authors: Yun Kuen Cheung, Georgios Piliouras

Abstract: We present a novel control-theoretic understanding of online optimization and learning in games, via the notion of passivity. Passivity is a fundamental concept in control theory, which abstracts energy conservation and dissipation in physical systems. It has become a standard tool in analysis of general feedback systems, to which game dynamics belong. Our starting point is to show that all continuous-time Follow-the-Regularized-Leader (FTRL) dynamics, which include the well-known Replicator Dynamic, are lossless, i.e. they are passive with no energy dissipation. Interestingly, we prove that passivity implies bounded regret, connecting two fundamental primitives of control theory and online optimization. The observation of energy conservation in FTRL inspires us to present a family of lossless learning dynamics, each of which has an underlying energy function with a simple gradient structure. This family is closed under convex combination; as an immediate corollary, any convex combination of FTRL dynamics is lossless and thus has bounded regret. This allows us to extend the framework of Fox and Shamma [Games, 2013] to prove not just global asymptotic stability results for game dynamics, but Poincaré recurrence results as well. Intuitively, when a lossless game (e.g. graphical constant-sum game) is coupled with lossless learning dynamics, their feedback interconnection is also lossless, which results in a pendulum-like energy-preserving recurrent behavior, generalizing the results of Piliouras and Shamma [SODA, 2014] and Mertikopoulos, Papadimitriou and Piliouras [SODA, 2018].

Submitted 15 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: In ICML 2021

arXiv:2103.08529 [pdf, other]

Learning in Markets: Greed Leads to Chaos but Following the Price is Right

Authors: Yun Kuen Cheung, Stefanos Leonardos, Georgios Piliouras

Abstract: We study learning dynamics in distributed production economies such as blockchain mining, peer-to-peer file sharing and crowdsourcing. These economies can be modelled as multi-product Cournot competitions or all-pay auctions (Tullock contests) when individual firms have market power, or as Fisher markets with quasi-linear utilities when every firm has negligible influence on market outcomes. In the former case, we provide a formal proof that Gradient Ascent (GA) can be Li-Yorke chaotic for a step size as small as $\Theta(1/n)$, where $n$ is the number of firms. In stark contrast, for the Fisher market case, we derive a Proportional Response (PR) protocol that converges to market equilibrium. The positive results on the convergence of the PR dynamics are obtained in full generality, in the sense that they hold for Fisher markets with any quasi-linear utility functions. Conversely, the chaos results for the GA dynamics are established even in the simplest possible setting of two firms and one good, and they hold for a wide range of price functions with different demand elasticities. Our findings suggest that by considering multi-agent interactions from a market rather than a game-theoretic perspective, we can formally derive natural learning protocols which are stable and converge to effective outcomes rather than being chaotic.

Submitted 17 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

MSC Class: 91A26; 91B55;

arXiv:2102.04270 [pdf, other]

Enabling Binary Neural Network Training on the Edge

Authors: Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

Abstract: The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the concurrent storage of high-precision activations for all layers, generally making learning on memory-constrained devices infeasible. In this article, we demonstrate that the backward propagation operations needed for binary neural network training are strongly robust to quantization, thereby making on-the-edge learning with modern models a practical proposition. We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions while inducing little to no accuracy loss vs Courbariaux & Bengio's standard approach. These decreases are primarily enabled through the retention of activations exclusively in binary format. Against the latter algorithm, our drop-in replacement sees memory requirement reductions of 3--5$\times$, while reaching similar test accuracy in comparable time, across a range of small-scale models trained to classify popular datasets. We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.78$\times$ memory reduction. Our work is open-source, and includes the Raspberry Pi-targeted prototype we used to verify our modeled memory decreases and capture the associated energy drops. Such savings will allow for unnecessary cloud offloading to be avoided, reducing latency, increasing energy efficiency, and safeguarding end-user privacy.

Submitted 24 September, 2023; v1 submitted 8 February, 2021; originally announced February 2021.

arXiv:2008.00540 [pdf, ps, other]

Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions

Authors: Yun Kuen Cheung, Yixin Tao

Abstract: Machine learning processes, e.g. "learning in games", can be viewed as non-linear dynamical systems. In general, such systems exhibit a wide spectrum of behaviors, ranging from stability/recurrence to the undesirable phenomenon of chaos (or "butterfly effect"). Chaos captures sensitivity to round-off errors and can severely affect predictability and reproducibility of ML systems, but the AI/ML community's understanding of it remains rudimentary, and much awaits exploration. Recently, Cheung and Piliouras employed a volume-expansion argument to show that Lyapunov chaos occurs in the cumulative payoff space when some popular learning algorithms, including Multiplicative Weights Update (MWU), Follow-the-Regularized-Leader (FTRL) and Optimistic MWU (OMWU), are used in several subspaces of games, e.g. zero-sum, coordination or graphical constant-sum games. It is natural to ask: can these results generalize to much broader families of games? We take a game decomposition approach and answer the question affirmatively. Among other results, we propose a notion of "matrix domination" and design a linear program, and use them to characterize bimatrix games where MWU is Lyapunov chaotic almost everywhere. Such a family of games has positive Lebesgue measure in the bimatrix game space, indicating that chaos is a substantial issue of learning in games. For multi-player games, we present a local equivalence of volume change between general games and graphical games, which is used to perform volume and chaos analyses of MWU and OMWU in potential games.

Submitted 2 August, 2020; originally announced August 2020.

arXiv:2005.13996 [pdf, ps, other]

Chaos, Extremism and Optimism: Volume Analysis of Learning in Games

Authors: Yun Kuen Cheung, Georgios Piliouras

Abstract: We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games. Such analyses provide new insights into these game dynamical systems, which seem hard to achieve via the classical techniques within Computer Science and Machine Learning. The first step is to examine these dynamics not in their original space (simplex of actions) but in a dual space (aggregate payoff space of actions). The second step is to explore how the volume of a set of initial conditions evolves over time when it is pushed forward according to the algorithm. This is reminiscent of approaches in Evolutionary Game Theory where replicator dynamics, the continuous-time analogue of MWU, is known to always preserve volume in all games. Interestingly, when we examine discrete-time dynamics, both the choice of the game and the choice of the algorithm play a critical role. So whereas MWU expands volume in zero-sum games and is thus Lyapunov chaotic, we show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior. However, we also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts. Using these tools, we prove two novel, rather negative properties of MWU in zero-sum games: (1) Extremism: even in games with a unique fully mixed Nash equilibrium, the system recurrently gets stuck near pure-strategy profiles, despite them being clearly unstable from a game-theoretic perspective. (2) Unavoidability: given any set of good points (with your own interpretation of "good"), the system cannot avoid bad points indefinitely.

Submitted 28 May, 2020; originally announced May 2020.

Comments: 20 pages, 4 figures

arXiv:1910.12625 [pdf, other]

LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

Abstract: Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is capable of implementing far more than an XNOR: it can perform any K-input Boolean operation. Inspired by this observation, we propose LUTNet, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators. We describe the realization of both unrolled and tiled LUTNet architectures, with the latter facilitating smaller, less power-hungry deployment over the former while sacrificing area and energy efficiency along with throughput. For both varieties, we demonstrate that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy. Against the state-of-the-art binarized neural network implementation, we achieve up to twice the area efficiency for several standard network models when inferencing popular datasets. We also demonstrate that even greater energy efficiency improvements are obtainable.

Submitted 2 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.00938. Accepted manuscript uploaded 02/03/20. DOA 01/03/20
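
The abstract above refers to binarized networks replacing multipliers with XNOR gates. As a rough illustration of that baseline (not of LUTNet's K-LUT operators, which the abstract describes as more general), the sketch below computes a dot product of two ±1 vectors using XNOR and a population count. The bit encoding (+1 as bit 1, -1 as bit 0) and the helper names `pack` and `xnor_popcount_dot` are assumptions made for this example only.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer (assumed encoding: +1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n +/-1 vectors given as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs agree
    matches = bin(agree).count("1")     # population count of agreeing positions
    # Each agreement contributes +1 and each disagreement -1 to the dot product.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = xnor_popcount_dot(pack(a), pack(b), len(a))
print(reference, fast)
```

The two printed values agree because a product of two ±1 values is +1 exactly when their encoded bits match, which is what XNOR detects.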

arXiv:1910.10075 [pdf, other]

Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs

Authors: Yiren Zhao, Xitong Gao, Xuan Guo, Junyi Liu, Erwei Wang, Robert Mullins, Peter Y. K. Cheung, George Constantinides, Cheng-Zhong Xu

Abstract: Modern deep Convolutional Neural Networks (CNNs) are computationally demanding, yet real applications often require high throughput and low latency. To help tackle these problems, we propose Tomato, a framework designed to automate the process of generating efficient CNN accelerators. The generated design is pipelined and each convolution layer uses different arithmetics at various precisions. Using Tomato, we showcase state-of-the-art multi-precision multi-arithmetic networks, including MobileNet-V1, running on FPGAs. To our knowledge, this is the first multi-precision multi-arithmetic auto-generation framework for CNNs. In software, Tomato fine-tunes pretrained networks to use a mixture of short powers-of-2 and fixed-point weights with a minimal loss in classification accuracy. The fine-tuned parameters are combined with the templated hardware designs to automatically produce efficient inference circuits in FPGAs. We demonstrate how our approach significantly reduces model sizes and computation complexities, and permits us to pack a complete ImageNet network onto a single FPGA without accessing off-chip memories for the first time. Furthermore, we show how Tomato produces implementations of networks with various sizes running on single or multiple FPGAs. To the best of our knowledge, our automatically generated accelerators outperform closest FPGA-based competitors by at least 2-4x for latency and throughput; the generated accelerator runs ImageNet classification at a rate of more than 3000 frames per second.

Submitted 21 October, 2019; originally announced October 2019.

Comments: To be published in International Conference on Field Programmable Technology 2019

arXiv:1905.08396 [pdf, ps, other]

Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games

Authors: Yun Kuen Cheung, Georgios Piliouras

Abstract: We establish that algorithmic experiments in zero-sum games "fail miserably" to confirm the unique, sharp prediction of maxmin equilibration. Contradicting nearly a century of economic thought that treats zero-sum games nearly axiomatically as the exemplar symbol of economic stability, we prove that no meaningful prediction can be made about the day-to-day behavior of online learning dynamics in zero-sum games. Concretely, Multiplicative Weights Updates (MWU) with constant step-size is Lyapunov chaotic in the dual (payoff) space. Simply put, let's assume that an observer asks the agents playing Matching-Pennies whether they prefer Heads or Tails (and by how much in terms of aggregate payoff so far). The range of possible answers consistent with any arbitrarily small set of initial conditions blows up exponentially with time everywhere in the payoff space. This result is robust both algorithmically as well as game theoretically: 1) Algorithmic robustness: Chaos is robust to agents using any of a general sub-family of Follow-the-Regularized-Leader (FTRL) algorithms, the well-known regret-minimizing dynamics, even when agents mix-and-match dynamics, use different or slowly decreasing step-sizes. 2) Game theoretic robustness: Chaos is robust to all affine variants of zero-sum games (strictly competitive games), network variants with arbitrarily large number of agents and even to competitive settings beyond these. Our result is in stark contrast with the time-average convergence of online learning to (approximate) Nash equilibrium, a result widely reported as "(weak) convergence to equilibrium".

Submitted 29 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: 25 pages, 1 figure, accepted to COLT 2019
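
For intuition about the behavior described in the abstract above, the following is a minimal simulation sketch of simultaneous MWU with a constant step size in Matching Pennies. It does not reproduce the paper's analysis or measure Lyapunov chaos; it only tracks how far the day-to-day strategies sit from the uniform equilibrium, using a sum of KL divergences. The payoff matrix normalization, starting point, step size `eps`, and all function names are assumptions chosen for illustration.

```python
import math

# Matching Pennies payoffs for player 1 (assumed normalization); player 2 gets the negative.
A = [[1.0, -1.0], [-1.0, 1.0]]


def mwu_step(x, y, eps):
    """One simultaneous Multiplicative Weights Update step with constant step size eps."""
    u = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]    # player 1 action payoffs
    v = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]   # player 2 action payoffs
    new_x = [x[i] * math.exp(eps * u[i]) for i in range(2)]
    new_y = [y[j] * math.exp(eps * v[j]) for j in range(2)]
    sx, sy = sum(new_x), sum(new_y)
    return [p / sx for p in new_x], [p / sy for p in new_y]


def kl_from_uniform(p):
    """KL divergence from the uniform equilibrium strategy (1/2, 1/2) to p."""
    return sum(0.5 * math.log(0.5 / pi) for pi in p)


def run(x, y, eps=0.1, steps=200):
    """Return the summed KL divergence from equilibrium before each step and after the last."""
    history = []
    for _ in range(steps):
        history.append(kl_from_uniform(x) + kl_from_uniform(y))
        x, y = mwu_step(x, y, eps)
    history.append(kl_from_uniform(x) + kl_from_uniform(y))
    return history


history = run([0.6, 0.4], [0.5, 0.5])
print("initial divergence:", history[0])
print("final divergence:  ", history[-1])
```

In this sketch the divergence from equilibrium never shrinks from one step to the next, which is consistent with the abstract's contrast between day-to-day behavior and time-average convergence.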

arXiv:1904.00938 [pdf, other]

LUTNet: Rethinking Inference in FPGA Soft Logic

Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

Abstract: Research has shown that deep neural networks contain significant redundancy, and that high classification accuracies can be achieved even when weights and activations are quantised down to binary values. Network binarisation on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is capable of implementing far more than an XNOR: it can perform any K-input Boolean operation. Inspired by this observation, we propose LUTNet, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators. We demonstrate that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy. Against the state-of-the-art binarised neural network implementation, we achieve twice the area efficiency for several standard network models when inferencing popular datasets. We also demonstrate that even greater energy efficiency improvements are obtainable.

Submitted 1 April, 2019; originally announced April 2019.

Comments: Accepted manuscript uploaded 01/04/19. DOA 03/03/19

arXiv:1901.06955 [pdf, other]

doi 10.1145/3309551

Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going

Authors: Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides

Abstract: Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense and computationally expensive networks into small, sparse and hardware-efficient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the field.

Submitted 8 July, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

Comments: Accepted manuscript uploaded 21/01/19. DOA 15/01/19

Journal ref: ACM Comput. Surv. 52, 2, Article 40 (May 2019), 39 pages

arXiv:1811.05087 [pdf, ps, other]

Parallel Stochastic Asynchronous Coordinate Descent: Tight Bounds on the Possible Parallelism

Authors: Yun Kuen Cheung, Richard Cole, Yixin Tao

Abstract: Several works have shown linear speedup is achieved by an asynchronous parallel implementation of stochastic coordinate descent so long as there is not too much parallelism. More specifically, it is known that if all updates are of similar duration, then linear speedup is possible with up to $Θ(\sqrt n L_{\max}/L_{\overline{\mathrm{res}}})$ processors, where $L_{\max}$ and $L_{\overline{\mathrm{res}}}$ are suitable Lipschitz parameters. This paper shows the bound is tight for almost all possible values of these parameters.

Submitted 19 November, 2020; v1 submitted 12 November, 2018; originally announced November 2018.

arXiv:1811.03254 [pdf, ps, other]

Fully Asynchronous Stochastic Coordinate Descent: A Tight Lower Bound on the Parallelism Achieving Linear Speedup

Authors: Yun Kuen Cheung, Richard Cole, Yixin Tao

Abstract: We seek tight bounds on the viable parallelism in asynchronous implementations of coordinate descent that achieves linear speedup. We focus on asynchronous coordinate descent (ACD) algorithms on convex functions which consist of the sum of a smooth convex part and a possibly non-smooth separable convex part. We quantify the shortfall in progress compared to the standard sequential stochastic gradient descent. This leads to a simple yet tight analysis of the standard stochastic ACD in a partially asynchronous environment, generalizing and improving the bounds in prior work. We also give a considerably more involved analysis for general asynchronous environments in which the only constraint is that each update can overlap with at most q others. The new lower bound on the maximum degree of parallelism attaining linear speedup is tight and improves the best prior bound almost quadratically.

Submitted 2 August, 2020; v1 submitted 7 November, 2018; originally announced November 2018.

Comments: Accepted for publication in Mathematical Programming (Series A)

arXiv:1807.10577 [pdf, other]

Accuracy to Throughput Trade-offs for Reduced Precision Neural Networks on Reconfigurable Logic

Authors: Jiang Su, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Gianluca Durelli, David B. Thomas, Philip Leong, Peter Y. K. Cheung

Abstract: Modern CNN are typically based on floating point linear algebra based implementations. Recently, reduced precision NN have been gaining popularity as they require significantly less memory and computational resources compared to floating point. This is particularly important in power constrained compute environments. However, in many cases a reduction in precision comes at a small cost to the accuracy of the resultant network. In this work, we investigate the accuracy-throughput trade-off for various parameter precision applied to different types of NN models. We firstly propose a quantization training strategy that allows reduced precision NN inference with a lower memory footprint and competitive model accuracy. Then, we quantitatively formulate the relationship between data representation and hardware efficiency. Our experiments finally provide insightful observation. For example, one of our tests show 32-bit floating point is more hardware efficient than 1-bit parameters to achieve 99% MNIST accuracy. In general, 2-bit and 4-bit fixed point parameters show better hardware trade-off on small-scale datasets like MNIST and CIFAR-10 while 4-bit provide the best trade-off in large-scale tasks like AlexNet on ImageNet dataset within our tested problem domain.

Submitted 17 July, 2018; originally announced July 2018.

Comments: Accepted by ARC 2018

arXiv:1806.10952 [pdf, other]

Amortized Analysis of Asynchronous Price Dynamics

Authors: Yun Kuen Cheung, Richard Cole

Abstract: We extend a recently developed framework for analyzing asynchronous coordinate descent algorithms to show that an asynchronous version of tatonnement, a fundamental price dynamic widely studied in general equilibrium theory, converges toward a market equilibrium for Fisher markets with CES utilities or Leontief utilities, for which tatonnement is equivalent to coordinate descent.

Submitted 27 June, 2018; originally announced June 2018.

Comments: 17 pages. In ESA 2018. arXiv admin note: text overlap with arXiv:1612.09171

arXiv:1806.04746 [pdf, ps, other]

Dynamics of Distributed Updating in Fisher Markets

Authors: Yun Kuen Cheung, Richard Cole, Yixin Tao

Abstract: A major goal in Algorithmic Game Theory is to justify equilibrium concepts from an algorithmic and complexity perspective. One appealing approach is to identify natural distributed algorithms that converge quickly to an equilibrium. This paper established new convergence results for two generalizations of Proportional Response in Fisher markets with buyers having CES utility functions. The starting points are respectively a new convex and a new convex-concave formulation of such markets. The two generalizations correspond to suitable mirror descent algorithms applied to these formulations. Several of our new results are a consequence of new notions of strong Bregman convexity and of strong Bregman convex-concave functions, and associated linear rates of convergence, which may be of independent interest. Among other results, we analyze a damped generalized Proportional Response and show a linear rate of convergence in a Fisher market with buyers whose utility functions cover the full spectrum of CES utilities aside the extremes of linear and Leontief utilities; when these utilities are included, we obtain an empirical O(1/T) rate of convergence.

Submitted 12 June, 2018; originally announced June 2018.

arXiv:1805.06232 [pdf, other]

On Fair Division of Indivisible Items

Authors: Bhaskar Chaudhury, Yun Kuen Cheung, Jugal Garg, Naveen Garg, Martin Hoefer, Kurt Mehlhorn

Abstract: We consider the task of assigning indivisible goods to a set of agents in a fair manner. Our notion of fairness is Nash social welfare, i.e., the goal is to maximize the geometric mean of the utilities of the agents. Each good comes in multiple items or copies, and the utility of an agent diminishes as it receives more items of the same good. The utility of a bundle of items for an agent is the sum of the utilities of the items in the bundle. Each agent has a utility cap beyond which he does not value additional items. We give a polynomial time approximation algorithm that maximizes Nash social welfare up to a factor of $e^{1/{e}} \approx 1.445$. The computed allocation is Pareto-optimal and approximates envy-freeness up to one item up to a factor of $2 + \varepsilon$.

Submitted 10 May, 2019; v1 submitted 16 May, 2018; originally announced May 2018.

Journal ref: FSTTCS 2018

arXiv:1804.08017 [pdf, other]

Tracing Equilibrium in Dynamic Markets via Distributed Adaptation

Authors: Yun Kuen Cheung, Martin Hoefer, Paresh Nakhe

Abstract: Competitive equilibrium is a central concept in economics with numerous applications beyond markets, such as scheduling, fair allocation of goods, or bandwidth distribution in networks. Computation of competitive equilibria has received a significant amount of interest in algorithmic game theory, mainly for the prominent case of Fisher markets. Natural and decentralized processes like tatonnement and proportional response dynamics (PRD) converge quickly towards equilibrium in large classes of Fisher markets. Almost all of the literature assumes that the market is a static environment and that the parameters of agents and goods do not change over time. In contrast, many large real-world markets are subject to frequent and dynamic changes. In this paper, we provide the first provable performance guarantees of discrete-time tatonnement and PRD in markets that are subject to perturbation over time. We analyze the prominent class of Fisher markets with CES utilities and quantify the impact of changes in supplies of goods, budgets of agents, and utility functions of agents on the convergence of tatonnement to market equilibrium. Since the equilibrium becomes a dynamic object and will rarely be reached, our analysis provides bounds expressing the distance to equilibrium that will be maintained via tatonnement and PRD updates. Our results indicate that in many cases, tatonnement and PRD follow the equilibrium rather closely and quickly recover conditions of approximate market clearing.
Our approach can be generalized to analyzing a general class of Lyapunov dynamical systems with changing system parameters, which might be of independent interest.

Submitted 21 April, 2018; originally announced April 2018.

arXiv:1802.07632 [pdf, other]

Spanning Tree Congestion and Computation of Generalized Győri-Lovász Partition

Authors: L. Sunil Chandran, Yun Kuen Cheung, Davis Issac

Abstract: We study a natural problem in graph sparsification, the Spanning Tree Congestion (STC) problem. Informally, the STC problem seeks a spanning tree with no tree-edge \emph{routing} too many of the original edges. The root of this problem dates back to at least 30 years ago, motivated by applications in network design, parallel computing and circuit design. Variants of the problem have also seen algorithmic applications as a preprocessing step of several important graph algorithms. For any general connected graph with $n$ vertices and $m$ edges, we show that its STC is at most $\mathcal{O}(\sqrt{mn})$, which is asymptotically optimal since we also demonstrate graphs with STC at least $Ω(\sqrt{mn})$. We present a polynomial-time algorithm which computes a spanning tree with congestion $\mathcal{O}(\sqrt{mn}\cdot \log n)$. We also present another algorithm for computing a spanning tree with congestion $\mathcal{O}(\sqrt{mn})$; this algorithm runs in sub-exponential time when $m = ω(n \log^2 n)$. For achieving the above results, an important intermediate theorem is \emph{generalized Győri-Lovász theorem}, for which Chen et al. gave a non-constructive proof. We give the first elementary and constructive proof by providing a local search algorithm with running time $\mathcal{O}^*\left( 4^n \right)$, which is a key ingredient of the above-mentioned sub-exponential time algorithm. We discuss a few consequences of the theorem concerning graph partitioning, which might be of independent interest.
We also show that for any graph which satisfies certain \emph{expanding properties}, its STC is at most $\mathcal{O}(n)$, and a corresponding spanning tree can be computed in polynomial time. We then use this to show that a random graph has STC $Θ(n)$ with high probability.

Submitted 25 April, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

arXiv:1703.08790 [pdf, ps, other]

Steiner Point Removal --- Distant Terminals Don't (Really) Bother

Authors: Yun Kuen Cheung

Abstract: Given a weighted graph $G=(V,E,w)$ with a set of $k$ terminals $T\subset V$, the Steiner Point Removal problem seeks for a minor of the graph with vertex set $T$, such that the distance between every pair of terminals is preserved within a small multiplicative distortion. Kamma, Krauthgamer and Nguyen (SODA 2014, SICOMP 2015) used a ball-growing algorithm to show that the distortion is at most $\mathcal{O}(\log^5 k)$ for general graphs. In this paper, we improve the distortion bound to $\mathcal{O}(\log^2 k)$. The improvement is achieved based on a known algorithm that constructs terminal-distance exact-preservation minor with $\mathcal{O}(k^4)$ (which is independent of $|V|$) vertices, and also two tail bounds on the sum of independent exponential random variables, which allow us to show that it is unlikely for a non-terminal being contracted to a distant terminal.

Submitted 26 March, 2017; originally announced March 2017.

arXiv:1612.09171 [pdf, ps, other]

A Unified Approach to Analyzing Asynchronous Coordinate Descent and Tatonnement

Authors: Yun Kuen Cheung, Richard Cole

Abstract: This paper concerns asynchrony in iterative processes, focusing on gradient descent and tatonnement, a fundamental price dynamic. Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically, gradient descent has been a sequential and synchronous process, although distributed and asynchronous variants have been studied since the 1980s. Coordinate descent is a commonly studied version of gradient descent. In this paper, we focus on asynchronous coordinate descent on convex functions $F:\mathbb{R}^n\rightarrow\mathbb{R}$ of the form $F(x) = f(x) + \sum_{k=1}^n Ψ_k(x_k)$, where $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is a smooth convex function, and each $Ψ_k:\mathbb{R}\rightarrow\mathbb{R}$ is a univariate and possibly non-smooth convex function. Such functions occur in many data analysis and machine learning problems. We give new analyses of cyclic coordinate descent, a parallel asynchronous stochastic coordinate descent, and a rather general worst-case parallel asynchronous coordinate descent. For all of these, we either obtain sharply improved bounds, or provide the first analyses. Our analyses all use a common amortized framework. The application of this framework to the asynchronous stochastic version requires some new ideas, for it is not obvious how to ensure a uniform distribution where it is needed in the face of asynchronous actions that may undo uniformity. We believe that our approach may well be applicable to the analysis of other iterative asynchronous stochastic processes.
We extend the framework to show that an asynchronous version of tatonnement, a fundamental price dynamic widely studied in general equilibrium theory, converges toward a market equilibrium for Fisher markets with CES utilities or Leontief utilities, for which tatonnement is equivalent to coordinate descent.

Submitted 29 December, 2016; originally announced December 2016.

Comments: 41 pages
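
The composite objective in the abstract above, $F(x) = f(x) + \sum_k Ψ_k(x_k)$, can be made concrete with a small sketch. The code below is a plain sequential stochastic coordinate descent, not the asynchronous algorithm analyzed in the paper. It assumes, purely for illustration, that $f(x) = \frac{1}{2}\|Ax - b\|^2$ and $Ψ_k(x_k) = λ|x_k|$; the function names, the step size $1/L_k$ and the soft-thresholding update are choices made for this sketch.

```python
import random


def soft_threshold(v, tau):
    """Proximal map of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def objective(A, b, x, lam):
    """F(x) = 0.5 * ||Ax - b||^2 + lam * sum_k |x_k| (assumed example objective)."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xi) for xi in x)


def stochastic_coordinate_descent(A, b, lam, steps, seed=0):
    """Sequential sketch: pick one coordinate uniformly at random per step."""
    rng = random.Random(seed)
    n = len(A[0])
    x = [0.0] * n
    residual = [-bi for bi in b]  # equals A x - b at x = 0
    # Coordinate-wise Lipschitz constants of the smooth part: squared column norms.
    lipschitz = [sum(row[k] ** 2 for row in A) for k in range(n)]
    for _ in range(steps):
        k = rng.randrange(n)
        if lipschitz[k] == 0.0:
            continue  # column of zeros: the smooth part ignores x_k
        grad_k = sum(row[k] * r for row, r in zip(A, residual))
        new_xk = soft_threshold(x[k] - grad_k / lipschitz[k], lam / lipschitz[k])
        delta = new_xk - x[k]
        if delta != 0.0:
            # Keep the residual in sync so each step costs one column pass.
            for i, row in enumerate(A):
                residual[i] += row[k] * delta
            x[k] = new_xk
    return x
```

Each step touches a single coordinate and a single column of `A`, which is the property that makes parallel and asynchronous variants attractive in the first place.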

arXiv:1604.08342 [pdf, other]

Graph Minors for Preserving Terminal Distances Approximately - Lower and Upper Bounds

Authors: Yun Kuen Cheung, Gramoz Goranci, Monika Henzinger

Abstract: Given a graph where vertices are partitioned into $k$ terminals and non-terminals, the goal is to compress the graph (i.e., reduce the number of non-terminals) using minor operations while preserving terminal distances approximately. The distortion of a compressed graph is the maximum multiplicative blow-up of distances between all pairs of terminals. We study the trade-off between the number of non-terminals and the distortion. This problem generalizes the Steiner Point Removal (SPR) problem, in which all non-terminals must be removed. We introduce a novel black-box reduction to convert any lower bound on distortion for the SPR problem into a super-linear lower bound on the number of non-terminals, with the same distortion, for our problem. This allows us to show that there exist graphs such that every minor with distortion less than $2~/~2.5~/~3$ must have $Ω(k^2)~/~Ω(k^{5/4})~/~Ω(k^{6/5})$ non-terminals, plus more trade-offs in between. The black-box reduction has an interesting consequence: if the tight lower bound on distortion for the SPR problem is super-constant, then allowing any $O(k)$ non-terminals will not help improving the lower bound to a constant. We also build on the existing results on spanners, distance oracles and connected 0-extensions to show a number of upper bounds for general graphs, planar graphs, graphs that exclude a fixed minor and bounded treewidth graphs.
Among others, we show that any graph admits a minor with $O(\log k)$ distortion and $O(k^{2})$ non-terminals, and any planar graph admits a minor with $1+\varepsilon$ distortion and $\widetilde{O}((k/\varepsilon)^{2})$ non-terminals.

Submitted 28 April, 2016; originally announced April 2016.

Comments: An extended abstract will appear in Proceedings of ICALP 2016

ACM Class: G.2.2

arXiv:1604.05243 [pdf, ps, other]

Better Strategyproof Mechanisms without Payments or Prior --- An Analytic Approach

Authors: Yun Kuen Cheung

Abstract: We revisit the problem of designing strategyproof mechanisms for allocating divisible items among two agents who have linear utilities, where payments are disallowed and there is no prior information on the agents' preferences. The objective is to design strategyproof mechanisms which are competitive against the most efficient (but not strategyproof) mechanism. For the case with two items: (1) We provide a set of sufficient conditions for strategyproofness. (2) We use an analytic approach to derive strategyproof mechanisms which are more competitive than all prior strategyproof mechanisms. (3) We improve the linear-program-based proof of Guo and Conitzer to show new upper bounds on competitive ratios. (4) We provide the first "mathematical" upper bound proof. For the cases with any number of items: (1) We build on the Partial Allocation mechanisms introduced by Cole et al. to design a strategyproof mechanism which is 0.67776-competitive, breaking the 2/3 barrier. (2) We propose a new subclass of strategyproof mechanisms called Dynamical-Increasing-Price mechanisms, where each agent purchases the items using virtual money, and the prices of the items depend on other agents' preferences.

Submitted 12 April, 2017; v1 submitted 18 April, 2016; originally announced April 2016.

Comments: 14 pages. This is the full version of an IJCAI 2016 conference paper of the same title

arXiv:1509.09147 [pdf, ps, other]

Combinatorial Auctions with Conflict-Based Externalities

Authors: Yun Kuen Cheung, Monika Henzinger, Martin Hoefer, Martin Starnberger

Abstract: Combinatorial auctions (CA) are a well-studied area in algorithmic mechanism design. However, contrary to the standard model, empirical studies suggest that a bidder's valuation often does not depend solely on the goods assigned to him. For instance, in adwords auctions an advertiser might not want his ads to be displayed next to his competitors' ads. In this paper, we propose and analyze several natural graph-theoretic models that incorporate such negative externalities, in which bidders form a directed conflict graph with maximum out-degree $Δ$. We design algorithms and truthful mechanisms for social welfare maximization that attain approximation ratios depending on $Δ$. For CA, our results are twofold: (1) A lottery that eliminates conflicts by discarding bidders/items independent of the bids. It allows to apply any truthful $α$-approximation mechanism for conflict-free valuations and yields an $\mathcal{O}(αΔ)$-approximation mechanism. (2) For fractionally sub-additive valuations, we design a rounding algorithm via a novel combination of a semi-definite program and a linear program, resulting in a cone program; the approximation ratio is $\mathcal{O}((Δ\log \log Δ)/\log Δ)$. The ratios are almost optimal given existing hardness results. For the prominent application of adwords auctions, we present several algorithms for the most relevant scenario when the number of items is small. In particular, we design a truthful mechanism with approximation ratio $o(Δ)$ when the number of items is only logarithmic in the number of bidders.

Submitted 30 September, 2015; originally announced September 2015.

Comments: This is the full version of our WINE 2015 conference paper

arXiv:1412.0159 [pdf, ps, other]

Amortized Analysis on Asynchronous Gradient Descent

Authors: Yun Kuen Cheung, Richard Cole

Abstract: Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically, gradient descent has been a sequential and synchronous process. Distributed and asynchronous variants of gradient descent have been studied since the 1980s, and they have been experiencing a resurgence due to demand from large-scale machine learning problems running on multi-core processors. We provide a version of asynchronous gradient descent (AGD) in which communication between cores is minimal and for which there is little synchronization overhead. We also propose a new timing model for its analysis. With this model, we give the first amortized analysis of AGD on convex functions. The amortization allows for bad updates (updates that increase the value of the convex function); in contrast, most prior work makes the strong assumption that every update must be significantly improving. Typically, the step sizes used in AGD are smaller than those used in its synchronous counterpart. We provide a method to determine the step sizes in AGD based on the Hessian entries for the convex function. In certain circumstances, the resulting step sizes are a constant fraction of those used in the corresponding synchronous algorithm, enabling the overall performance of AGD to improve linearly with the number of cores. We give two applications of our amortized analysis.

Submitted 29 November, 2014; originally announced December 2014.

Comments: 40 pages

arXiv:1211.2268 [pdf, ps, other]

Tatonnement in Ongoing Markets of Complementary Goods

Authors: Yun Kuen Cheung, Richard Cole, Ashish Rastogi

Abstract: This paper continues the study, initiated by Cole and Fleischer, of the behavior of a tatonnement price update rule in Ongoing Fisher Markets. The prior work showed fast convergence toward an equilibrium when the goods satisfied the weak gross substitutes property and had bounded demand and income elasticities. The current work shows that fast convergence also occurs for the following types of markets: (i) all pairs of goods are complements to each other, and (ii) the demand and income elasticities are suitably bounded. In particular, these conditions hold when all buyers in the market are equipped with CES utilities, where all the parameters $ρ$, one per buyer, satisfy $-1 < ρ\le 0$. In addition, we extend the above result to markets in which a mixture of complements and substitutes occur. This includes characterizing a class of nested CES utilities for which fast convergence holds. An interesting technical contribution, which may be of independent interest, is an amortized analysis for handling asynchronous events in settings in which there are a mix of continuous changes and discrete events.

Submitted 9 November, 2012; originally announced November 2012.

Comments: 44 pages, ACM EC 2012

arXiv:1004.4901 [pdf, ps, other]

Fréchet Distance Problems in Weighted Regions

Authors: Yam Ki Cheung, Ovidiu Daescu

Abstract: We discuss two versions of the Fréchet distance problem in weighted planar subdivisions. In the first one, the distance between two points is the weighted length of the line segment joining the points. In the second one, the distance between two points is the length of the shortest path between the points. In both cases, we give algorithms for finding a (1+epsilon)-factor approximation of the Fréchet distance between two polygonal curves. We also consider the Fréchet distance between two polygonal curves among polyhedral obstacles in R^3 and present a (1+epsilon)-factor approximation algorithm.

Submitted 27 April, 2010; originally announced April 2010.

Comments: 24 pages 6 figures

arXiv:1004.1588 [pdf, ps, other]

Approximate Point-to-Face Shortest Paths in R^3

Authors: Yam Ki Cheung, Ovidiu Daescu

Abstract: We address the point-to-face approximate shortest path problem in R^3: Given a set of polyhedral obstacles with a total of n vertices, a source point s, an obstacle face f, and a real positive parameter epsilon, compute a path from s to f that avoids the interior of the obstacles and has length at most (1+epsilon) times the length of the shortest obstacle avoiding path from s to f. We present three approximation algorithms that take by extending three well-known "point-to-point" shortest path algorithms.

Submitted 9 April, 2010; originally announced April 2010.

Comments: 16 pages, Latex

arXiv:1003.0150 [pdf, ps, other]

Multidimensional Divide-and-Conquer and Weighted Digital Sums

Authors: Y. K. Cheung, Philippe Flajolet, Mordecai Golin, C. Y. James Lee

Abstract: This paper studies three types of functions arising separately in the analysis of algorithms that we analyze exactly using similar Mellin transform techniques. The first is the solution to a Multidimensional Divide-and-Conquer (MDC) recurrence that arises when solving problems on points in $d$-dimensional space. The second involves weighted digital sums. Write $n$ in its binary representation $n=(b_i b_{i-1}... b_1 b_0)_2$ and set $S_M(n) = \sum_{t=0}^i t^{\bar{M}} b_t 2^t$. We analyze the average $TS_M(n) = \frac{1}{n}\sum_{j<n} S_M(j)$. The third is a different variant of weighted digital sums. Write $n$ as $n=2^{i_1} + 2^{i_2} + ... + 2^{i_k}$ with $i_1 > i_2 > ... > i_k\geq 0$ and set $W_M(n) = \sum_{t=1}^k t^M 2^{i_t}$. We analyze the average $TW_M(n) = \frac{1}{n}\sum_{j<n} W_M(j)$. We show that both the MDC functions and $TS_M(n)$ (with $d=M+1$) have solutions of the form $λ_d n \lg^{d-1}n + \sum_{m=0}^{d-2}(n\lg^m n)A_{d,m}(\lg n) + c_d,$ where $λ_d,c_d$ are constants and $A_{d,m}(u)$'s are periodic functions with period one (given by absolutely convergent Fourier series). We also show that $TW_M(n)$ has a solution of the form $n G_M(\lg n) + d_M \lg^M n + \sum_{d=0}^{M-1}(\lg^d n)G_{M,d}(\lg n),$ where $d_M$ is a constant, $G_M(u)$ and $G_{M,d}(u)$'s are again periodic functions with period one (given by absolutely convergent Fourier series).

Submitted 28 February, 2010; originally announced March 2010.

Comments: 44 pages, 8 figures
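
The third function in the abstract above, $W_M(n) = \sum_{t=1}^k t^M 2^{i_t}$ with average $TW_M(n) = \frac{1}{n}\sum_{j<n} W_M(j)$, is easy to evaluate directly for small $n$. The sketch below does this by brute force from the definitions as stated in the abstract; it does not use the paper's Mellin transform analysis, and the function names are hypothetical.

```python
from fractions import Fraction


def weighted_digital_sum(n: int, m: int) -> int:
    """W_M(n): weight the t-th largest power of two in n by t^M."""
    # Exponents i_1 > i_2 > ... > i_k of the set bits of n, largest first.
    exponents = [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]
    return sum((t ** m) * (1 << i) for t, i in enumerate(exponents, start=1))


def average_weighted_digital_sum(n: int, m: int) -> Fraction:
    """TW_M(n) = (1/n) * sum_{j<n} W_M(j), kept exact as a fraction."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Fraction(sum(weighted_digital_sum(j, m) for j in range(n)), n)
```

For example, $5 = 2^2 + 2^0$ gives $W_1(5) = 1\cdot 4 + 2\cdot 1 = 6$, and for $M = 0$ every weight is 1, so $W_0(n) = n$.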

Showing 1–33 of 33 results for author: Cheung, Y K