-
Exploring and Improving Drafts in Blockwise Parallel Decoding
Authors:
Taehyeon Kim,
Ananda Theertha Suresh,
Kishore Papineni,
Michael Riley,
Sanjiv Kumar,
Adrian Benton
Abstract:
Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified and conditionally accepted by the autoregressive model. This paper contributes to the understanding and improvement of block drafts in two ways. First, we analyze the token distributions produced by multiple prediction heads. Second, we leverage this analysis to develop algorithms to improve BPD inference speed by refining the block drafts using n-gram and neural language models. Experiments demonstrate that refined block drafts yield a 5-21% increase in block efficiency (i.e., the number of accepted tokens from the block draft) across diverse datasets.
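The abstract describes drafting, verification, and refinement only at a high level. The sketch below illustrates two of these ideas under stated assumptions. The first is greedy (argmax) verification in the style of Stern et al., where the number of accepted draft tokens is the block efficiency for that step, following the abstract's parenthetical definition. The second is a hypothetical refinement that rescores each head's top-k candidates with a toy bigram table via Viterbi search. This is not the authors' algorithm: the function names, the `next_token` callable, the candidate lists, and the bigram scores are all illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = str


def greedy_targets(
    prefix: Sequence[Token],
    draft: Sequence[Token],
    next_token: Callable[[List[Token]], Token],
) -> List[Token]:
    """Token the autoregressive model would pick at each draft position.

    A real system scores all positions in one batched forward pass; this toy
    version calls a hypothetical `next_token` function once per position.
    """
    return [next_token(list(prefix) + list(draft[:i])) for i in range(len(draft))]


def verify_block_draft(draft: Sequence[Token], targets: Sequence[Token]) -> List[Token]:
    """Accept the longest draft prefix that matches the model's greedy choices."""
    accepted: List[Token] = []
    for drafted, expected in zip(draft, targets):
        if drafted != expected:
            break
        accepted.append(drafted)
    return accepted


def rescore_block_draft(
    candidates: List[List[Tuple[Token, float]]],
    bigram_logprob: Dict[Tuple[Token, Token], float],
    prev_token: Token,
    lm_weight: float = 1.0,
    unk_logprob: float = -10.0,
) -> List[Token]:
    """Hypothetical refinement: Viterbi search over per-head candidates.

    candidates[i] holds (token, head log-prob) pairs from prediction head i;
    each path score adds lm_weight times a bigram log-prob per transition.
    """
    # best[i][tok] = (best score of a path ending in tok at position i, previous token)
    best: List[Dict[Token, Tuple[float, Token]]] = [{}]
    for tok, lp in candidates[0]:
        lm = bigram_logprob.get((prev_token, tok), unk_logprob)
        best[0][tok] = (lp + lm_weight * lm, prev_token)
    for i in range(1, len(candidates)):
        layer: Dict[Token, Tuple[float, Token]] = {}
        for tok, lp in candidates[i]:
            layer[tok] = max(
                (score + lp + lm_weight * bigram_logprob.get((prev, tok), unk_logprob), prev)
                for prev, (score, _) in best[i - 1].items()
            )
        best.append(layer)
    # Backtrack from the highest-scoring token at the last position.
    tok = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tok]
    for i in range(len(candidates) - 1, 0, -1):
        tok = best[i][tok][1]
        path.append(tok)
    return path[::-1]


if __name__ == "__main__":
    def count_up(ctx: List[Token]) -> Token:
        # Toy "model" that always continues counting.
        return str(int(ctx[-1]) + 1)

    draft = ["4", "5", "7", "8"]
    targets = greedy_targets(["3"], draft, count_up)  # ["4", "5", "6", "8"]
    print(len(verify_block_draft(draft, targets)))    # 2 tokens accepted

    heads = [[("the", -0.1), ("a", -2.3)], [("the", -0.2), ("cat", -1.6)]]
    bigram = {("<s>", "the"): -1.0, ("<s>", "a"): -1.5, ("the", "the"): -8.0,
              ("the", "cat"): -0.5, ("a", "the"): -6.0, ("a", "cat"): -1.0}
    print(rescore_block_draft(heads, bigram, "<s>"))  # ['the', 'cat']
```

In the toy example, taking each head's argmax independently would give "the the"; the bigram term steers the draft toward "the cat", which is the kind of correction a draft-refinement step aims for. Setting `lm_weight=0.0` recovers the per-head argmax.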
Submitted 5 June, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Balancing Robustness and Sensitivity using Feature Contrastive Learning
Authors:
Seungyeon Kim,
Daniel Glasner,
Srikumar Ramalingam,
Cho-Jui Hsieh,
Kishore Papineni,
Sanjiv Kumar
Abstract:
It is generally believed that robust training of extremely large networks is critical to their success in real-world applications. However, when taken to the extreme, methods that promote robustness can hurt the model's sensitivity to rare or underrepresented patterns. In this paper, we discuss this trade-off between sensitivity and robustness to natural (non-adversarial) perturbations by introducing two notions: contextual feature utility and contextual feature sensitivity. We propose Feature Contrastive Learning (FCL) that encourages a model to be more sensitive to the features that have higher contextual utility. Empirical results demonstrate that models trained with FCL achieve a better balance of robustness and sensitivity, leading to improved generalization in the presence of noise on both vision and NLP datasets.
Submitted 19 May, 2021;
originally announced May 2021.
-
Text Segmentation by Cross Segment Attention
Authors:
Michal Lukasik,
Boris Dadachev,
Gonçalo Simões,
Kishore Papineni
Abstract:
Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a new state of the art, reducing in particular the error rates by a large margin in all cases. We further analyze model sizes and find that we can build models with many fewer parameters while keeping good performance, thus facilitating real-world applications.
Submitted 7 December, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Bidding for Representative Allocations for Display Advertising
Authors:
Arpita Ghosh,
Preston McAfee,
Kishore Papineni,
Sergei Vassilvitskii
Abstract:
Display advertising has traditionally been sold via guaranteed contracts -- a guaranteed contract is a deal between a publisher and an advertiser to allocate a certain number of impressions over a certain period, for a pre-specified price per impression. However, as spot markets for display ads, such as the RightMedia Exchange, have grown in prominence, the advertisements shown on a given page are increasingly chosen based on price, using an auction. As the number of participants in the exchange grows, the price of an impression becomes a signal of its value. This correlation between price and value means that a seller implementing the contract through bidding should offer the contract buyer a range of prices, and not just the cheapest impressions necessary to fulfill its demand.
Implementing a contract using a range of prices is akin to creating a mutual fund of advertising impressions, and requires randomized bidding. We characterize what allocations can be implemented with randomized bidding, namely those where the desired share obtained at each price is a non-increasing function of price. In addition, we provide a full characterization of when a set of campaigns is compatible and how to implement them with randomized bidding strategies.
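The single-campaign characterization can be made concrete in a simplified model, which is an assumption for illustration rather than the paper's exact setup: prices lie on a discrete grid, and an impression whose clearing price is p is won exactly when the bid b satisfies b >= p. The share won at price p is then P(b >= p), which can never increase with p. Conversely, any non-increasing share schedule in [0, 1] is met by bidding p_i with probability s(p_i) - s(p_{i+1}) and abstaining otherwise. The function names and the price grid below are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Optional, Sequence

# Maps a bid price to its probability; the key None means "do not bid".
BidDistribution = Dict[Optional[int], Fraction]


def bid_distribution(prices: Sequence[int], share: Sequence[Fraction]) -> Optional[BidDistribution]:
    """Randomized bid whose win probability at each price equals the desired share.

    Assumes an impression at price p is won iff the bid is >= p (simplified model).
    Returns None when the schedule is not implementable in this model.
    """
    if len(prices) != len(share):
        raise ValueError("prices and share must have the same length")
    if any(hi <= lo for lo, hi in zip(prices, prices[1:])):
        raise ValueError("prices must be strictly increasing")
    if any(s < 0 or s > 1 for s in share):
        return None  # shares must be probabilities
    if any(nxt > cur for cur, nxt in zip(share, share[1:])):
        return None  # share rises with price: not implementable
    tail = list(share) + [Fraction(0)]
    dist: BidDistribution = {}
    for i, p in enumerate(prices):
        mass = tail[i] - tail[i + 1]  # P(b == p_i) = s(p_i) - s(p_{i+1})
        if mass:
            dist[p] = mass
    if share and share[0] < 1:
        dist[None] = 1 - share[0]  # abstain with the leftover probability
    return dist


def won_share(dist: BidDistribution, price: int) -> Fraction:
    """Fraction of impressions at `price` won under the randomized bid."""
    return sum((m for b, m in dist.items() if b is not None and b >= price), Fraction(0))


if __name__ == "__main__":
    prices = [1, 2, 3]
    share = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)]
    dist = bid_distribution(prices, share)
    print([won_share(dist, p) for p in prices])  # recovers the desired shares
    print(bid_distribution(prices, [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]))  # None
```

Exact `Fraction` arithmetic is used so the recovered shares match the targets without floating-point drift. Compatibility across several campaigns, which the abstract also characterizes, is not modeled here.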
Submitted 5 October, 2009;
originally announced October 2009.