Search | arXiv e-print repository

Scaling Laws for Downstream Task Performance of Large Language Models

Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

Abstract: Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by two metrics: downstream cross-entropy and BLEU score. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream BLEU score with good accuracy using a log-law. However, there are also cases where moderate misalignment causes the BLEU score to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these observations, we provide new practical insights for choosing appropriate pretraining data.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2012.00738 [pdf, other]

Searching, Sorting, and Cake Cutting in Rounds

Authors: Simina Brânzei, Dimitris Paparas, Nicholas Recker

Abstract: We study searching and sorting in rounds motivated by a fair division question: given a cake cutting problem with $n$ players, compute a fair allocation in at most $k$ rounds of interaction with the players. Rounds interpolate between the simultaneous and the fully adaptive settings, also capturing parallel complexity. We find that proportional cake cutting in rounds is equivalent to sorting with rank queries in rounds. We design a protocol for proportional cake cutting in rounds, while lower bounds for sorting in rounds with rank queries were given by Alon and Azar. Inspired by the rank query model, we then consider two basic search problems: ordered and unordered search. In unordered search, we get an array $\vec{x}=(x_1, \ldots, x_n)$ and an element $z$ promised to be in $\vec{x}$. We have access to an oracle that receives queries of the form "Is $z$ at location $i$?" and answers "Yes" or "No". The goal is to find the location of $z$ with success probability at least $p$ in at most $k$ rounds of interaction with the oracle. We show the expected query complexity of randomized algorithms on a worst case input is $np\bigl(\frac{k+1}{2k}\bigr) \pm O(1)$, while that of deterministic algorithms on a worst case input distribution is $np \bigl(1 - \frac{k-1}{2k}p \bigr) \pm O(1)$. These bounds apply even to fully adaptive unordered search, where the ratio between the two complexities converges to $2-p$ as the size of the array grows. In ordered search, we get a sorted array $\vec{x}=(x_1, \ldots, x_n)$ and an element $z$ promised to be in $\vec{x}$. We have access to an oracle that gets comparison queries. Here we find that the expected query complexity of randomized algorithms on a worst case input and deterministic algorithms on a worst case input distribution is essentially the same: $p k \cdot n^{\frac{1}{k}} \pm O(1+pk)$.
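
To get a feel for the bounds quoted in this abstract, the leading terms can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, the $\pm O(1)$ and $\pm O(1+pk)$ error terms are dropped, and treating "fully adaptive" as a very large number of rounds $k$ is an assumption made here for the arithmetic.

```python
def unordered_randomized(n, p, k):
    """Leading term n*p*(k+1)/(2k) of the randomized bound (O(1) term dropped)."""
    return n * p * (k + 1) / (2 * k)


def unordered_deterministic(n, p, k):
    """Leading term n*p*(1 - (k-1)/(2k) * p) of the deterministic bound (O(1) term dropped)."""
    return n * p * (1 - (k - 1) / (2 * k) * p)


def ordered(n, p, k):
    """Leading term p*k*n^(1/k) of the ordered-search bound (O(1+pk) term dropped)."""
    return p * k * n ** (1 / k)


if __name__ == "__main__":
    n, p = 10**6, 0.5
    for k in (1, 2, 10, n):
        rand = unordered_randomized(n, p, k)
        det = unordered_deterministic(n, p, k)
        # As k grows, det / rand approaches 2 - p.
        print(k, rand, det, det / rand)
```

With a single round ($k=1$) both unordered-search expressions reduce to $np$, and as $k$ grows their ratio approaches $2-p$, matching the limit stated in the abstract.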

Submitted 19 November, 2023; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:1702.07032 [pdf, ps, other]

On the Complexity of Simple and Optimal Deterministic Mechanisms for an Additive Buyer

Authors: Xi Chen, George Matikas, Dimitris Paparas, Mihalis Yannakakis

Abstract: We show that the Revenue-Optimal Deterministic Mechanism Design problem for a single additive buyer is #P-hard, even when the distributions have support size 2 for each item and, more importantly, even when the optimal solution is guaranteed to be of a very simple kind: the seller picks a price for each individual item and a price for the grand bundle of all the items; the buyer can purchase either the grand bundle at its given price or any subset of items at their total individual prices. The following problems are also #P-hard, as immediate corollaries of the proof: 1. determining if individual item pricing is optimal for a given instance, 2. determining if grand bundle pricing is optimal, and 3. computing the optimal (deterministic) revenue. On the positive side, we show that when the distributions are i.i.d. with support size 2, the optimal revenue obtainable by any mechanism, even a randomized one, can be achieved by a simple solution of the above kind (individual item pricing with a discounted price for the grand bundle) and furthermore, it can be computed in polynomial time. The problem can be solved in polynomial time too when the number of items is constant.
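
The "simple kind" of mechanism described in this abstract (a price per item plus a price for the grand bundle) can be evaluated by brute force on tiny instances. The following is a minimal sketch, not the authors' method: the function name and input format are hypothetical, and the tie-breaking rule (an indifferent buyer chooses the option with the higher payment) is an assumption made here. It enumerates every value profile, so it runs in time exponential in the number of items.

```python
from itertools import product


def revenue_item_plus_bundle(item_prices, bundle_price, dists):
    """Expected revenue from an additive buyer with independent item values.

    dists[i] is a list of (value, probability) pairs for item i.
    The buyer takes either the grand bundle at bundle_price or the
    utility-maximizing subset of items at their individual prices.
    Assumption: ties are broken toward the higher payment.
    """
    total = 0.0
    for profile in product(*dists):
        prob = 1.0
        items_utility = 0.0
        items_payment = 0.0
        bundle_utility = -float(bundle_price)
        for price, (value, q) in zip(item_prices, profile):
            prob *= q
            bundle_utility += value
            if value >= price:  # buy each item worth at least its price
                items_utility += value - price
                items_payment += price
        # Compare (utility, payment) so that ties favor the higher payment.
        if (bundle_utility, bundle_price) > (items_utility, items_payment):
            total += prob * bundle_price
        else:
            total += prob * items_payment
    return total


if __name__ == "__main__":
    # Two i.i.d. items, each worth 1 or 2 with probability 1/2.
    dist = [(1, 0.5), (2, 0.5)]
    print(revenue_item_plus_bundle([2, 2], 3, [dist, dist]))
```

In this toy instance, item prices of 2 with a discounted bundle price of 3 earn 2.25 in expectation, while the same item prices with no useful bundle discount (bundle price 4) earn 2.0.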

Submitted 14 July, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

arXiv:1311.2138 [pdf, ps, other]

The Complexity of Optimal Multidimensional Pricing

Authors: Xi Chen, Ilias Diakonikolas, Dimitris Paparas, Xiaorui Sun, Mihalis Yannakakis

Abstract: We resolve the complexity of revenue-optimal deterministic auctions in the unit-demand single-buyer Bayesian setting, i.e., the optimal item pricing problem, when the buyer's values for the items are independent. We show that the problem of computing a revenue-optimal pricing can be solved in polynomial time for distributions of support size 2, and its decision version is NP-complete for distributions of support size 3. We also show that the problem remains NP-complete for the case of identical distributions.
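
For intuition about the item pricing objective in this abstract, the expected revenue of a fixed price vector can be computed by enumerating value profiles. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name and input format are hypothetical, the buyer is assumed to buy whenever the best utility is nonnegative, and ties are assumed to be broken toward the higher-priced item. The enumeration is exponential in the number of items and only evaluates a given pricing; it does not search for the optimal one.

```python
from itertools import product


def expected_revenue(prices, dists):
    """Expected revenue of an item pricing for a unit-demand buyer.

    dists[i] is a list of (value, probability) pairs for item i,
    with values independent across items. The buyer purchases at most
    one item, the one maximizing value - price, if that is >= 0.
    Assumption: ties are broken toward the higher price.
    """
    total = 0.0
    for profile in product(*dists):
        prob = 1.0
        best_utility, best_price = None, 0.0
        for price, (value, q) in zip(prices, profile):
            prob *= q
            utility = value - price
            if utility < 0:
                continue  # the buyer never takes a negative-utility item
            if best_utility is None or (utility, price) > (best_utility, best_price):
                best_utility, best_price = utility, price
        total += prob * best_price  # best_price stays 0 if nothing is bought
    return total


if __name__ == "__main__":
    item_a = [(1, 0.5), (3, 0.5)]
    item_b = [(2, 1.0)]
    print(expected_revenue([3, 2], [item_a, item_b]))
```

Here the buyer pays 3 when item A is worth 3 and pays 2 otherwise, for an expected revenue of 2.5.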

Submitted 9 November, 2013; originally announced November 2013.

arXiv:1211.4918 [pdf, other]

The Complexity of Non-Monotone Markets

Authors: Xi Chen, Dimitris Paparas, Mihalis Yannakakis

Abstract: We introduce the notion of non-monotone utilities, which covers a wide variety of utility functions in economic theory. We then prove that it is PPAD-hard to compute an approximate Arrow-Debreu market equilibrium in markets with linear and non-monotone utilities. Building on this result, we settle the long-standing open problem regarding the computation of an approximate Arrow-Debreu market equilibrium in markets with CES utility functions, by proving that it is PPAD-complete when the Constant Elasticity of Substitution parameter ρ is any constant less than -1.

Submitted 20 November, 2012; originally announced November 2012.

Showing 1–5 of 5 results for author: Paparas, D