-
The German Tank Problem with Multiple Factories
Authors:
Steven J. Miller,
Kishan Sharma,
Andrew K. Yang
Abstract:
During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted two methods to obtain these estimates: espionage and statistical analysis. The latter approach was far more successful and is as follows: assuming that the tanks are sequentially numbered starting from 1, if we observe $k$ serial numbers from an unknown total of $N$ tanks, with the highest observed number being $M$, then the best linear unbiased estimator for $N$ is $M(1+1/k)-1$. This is now known as the German Tank Problem. Suppose one wishes to estimate the productivity of a rival by inspecting captured or destroyed tanks, each with a unique serial number. In many situations, the original German Tank Problem is insufficient, since typically there are $l>1$ factories, and tanks produced by different factories may have serial numbers in disjoint ranges that are often far apart and need not start from 1. We wish to estimate the total tank production across all of the factories. We construct an efficient procedure to estimate the total productivity and prove that our procedure effectively estimates $N$ when $\log l/\log k$ is sufficiently small, and is robust against both large and small gaps between factories. In the final section, we show that given information about the gaps, we can construct a far better estimator that is also effective when we have a small number of samples. When the number of samples is small compared to the number of gaps, the Mean Squared Error of this new estimator is several orders of magnitude smaller than that of the estimator that assumes no information. This quantifies the importance of hiding such information if one wishes to conceal one's productivity from a rival.
Submitted 11 April, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
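
The single-factory estimator $M(1+1/k)-1$ quoted in the abstract above can be sketched in a few lines of Python. This is only an illustration of the classical formula, not the multi-factory procedure the paper constructs; the function names `german_tank_estimate` and `mean_estimate`, and the simulation parameters, are assumptions made for this example.

```python
import random


def german_tank_estimate(serials):
    """Classical single-factory estimate of N: M * (1 + 1/k) - 1.

    Assumes serial numbers run 1..N and were sampled without replacement.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m * (1 + 1 / k) - 1


def mean_estimate(n_total, k, trials, seed=0):
    """Average the estimate over many simulated samples (illustrative check)."""
    rng = random.Random(seed)
    population = range(1, n_total + 1)
    total = 0.0
    for _ in range(trials):
        total += german_tank_estimate(rng.sample(population, k))
    return total / trials


if __name__ == "__main__":
    # Four observed serials with maximum 60: 60 * (1 + 1/4) - 1 = 74.0
    print(german_tank_estimate([19, 40, 42, 60]))
    # The average over many trials should land close to the true N = 1000.
    print(mean_estimate(n_total=1000, k=10, trials=2000))
```

Averaging over repeated samples is a quick way to see the unbiasedness claim empirically: individual estimates scatter widely, but their mean settles near the true $N$.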
-
Applications of Improvements to the Pythagorean Won-Loss Expectation in Optimizing Rosters
Authors:
Alexander F. Almeida,
Kevin Dayaratna,
Steven J. Miller,
Andrew K. Yang
Abstract:
Bill James' Pythagorean formula has for decades done an excellent job estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted respectively by ${\rm RS}$ and ${\rm RA}$, there is some $\gamma$ such that the winning percentage is approximately ${\rm RS}^\gamma/ ({\rm RS}^\gamma+ {\rm RA}^\gamma)$. One important consequence is that it lets us determine the value of different players to the team, as it allows us to estimate how many more wins we would have given a fixed increase in run production. We summarize earlier work on the subject, and extend the earlier theoretical model of Miller (who modeled the run distributions as arising from independent Weibull distributions with the same shape parameter; this has been found to describe the observed run data well). We now model runs scored and allowed as being drawn from independent Weibull distributions where the shape parameter is not necessarily the same, and then use the Method of Moments to solve a system of four equations in four unknowns. Doing so yields a predicted winning percentage that is consistently better than earlier models over the last 30 MLB seasons (1994 to 2023). This comes at a small cost, as we no longer have a closed form expression but must evaluate a two-dimensional integral of two Weibull distributions and numerically estimate the solutions to the system of equations; as these are straightforward to do with simple programs, it is well worth adopting this framework and avoiding the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
Submitted 20 February, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
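
As a rough illustration of the closed-form Pythagorean formula in the abstract above, and of how it turns extra runs into extra wins, consider the sketch below. It does not implement the paper's two-shape-parameter Weibull model. The function names, the exponent value of 2.0 (James' original choice), and the 162-game season length are assumptions chosen for the example.

```python
def pythagorean_win_pct(rs, ra, gamma=2.0):
    """Estimated winning percentage RS^gamma / (RS^gamma + RA^gamma).

    rs, ra: average runs scored and allowed per game (both positive).
    gamma:  exponent; 2.0 is an illustrative default, not a fitted value.
    """
    if rs <= 0 or ra <= 0:
        raise ValueError("rs and ra must be positive")
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def extra_wins(rs, ra, delta_rs, gamma=2.0, games=162):
    """Estimated additional wins per season from scoring delta_rs more runs per game."""
    before = pythagorean_win_pct(rs, ra, gamma)
    after = pythagorean_win_pct(rs + delta_rs, ra, gamma)
    return games * (after - before)


if __name__ == "__main__":
    # A team scoring 5 and allowing 4 runs per game: 25 / 41
    print(pythagorean_win_pct(5.0, 4.0))
    # Estimated wins gained from 0.1 extra runs per game
    print(extra_wins(5.0, 4.0, 0.1))
```

Evaluating the formula before and after a change in run production is the simplest way to put a win value on a roster move under this model.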
-
The Reversed Zeckendorf Game
Authors:
Zoë X. Batterman,
Aditya Jambhale,
Steven J. Miller,
Akash L. Narayanan,
Kishan Sharma,
Andrew K. Yang,
Chris Yao
Abstract:
Zeckendorf proved that every natural number $n$ can be expressed uniquely as a sum of non-consecutive Fibonacci numbers, called its Zeckendorf decomposition. Baird-Smith, Epstein, Flint, and Miller created the Zeckendorf game, a two-player game played on partitions of $n$ into Fibonacci numbers which always terminates at a Zeckendorf decomposition, and proved that Player 2 has a winning strategy for $n\geq 3$. Since their proof was non-constructive, other authors have studied the game to find a constructive winning strategy, and lacking success there turned to related problems. For example, Cheigh, Moura, Jeong, Duke, Milgrim, Miller, and Ngamlamai studied minimum and maximum game lengths and randomly played games. We explore a new direction and introduce the reversed Zeckendorf game, which starts at the ending state of the Zeckendorf game and flips all the moves, so the reversed game ends with all pieces in the first bin. We show that Player 1 has a winning strategy for $n = F_{i+1} + F_{i-2}$ and solve various modified games.
Submitted 4 October, 2023; v1 submitted 22 September, 2023;
originally announced September 2023.
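
The Zeckendorf decomposition mentioned in the abstract above can be computed with the standard greedy algorithm: repeatedly subtract the largest Fibonacci number that fits. The sketch below illustrates only the decomposition, not the Zeckendorf game or its reversed variant. It assumes the Fibonacci sequence is taken as 1, 2, 3, 5, 8, ... so that each value appears once; the function name `zeckendorf` is chosen for this example.

```python
def zeckendorf(n):
    """Return the Zeckendorf decomposition of n, largest summand first.

    Uses the Fibonacci sequence 1, 2, 3, 5, 8, ... (an indexing assumption)
    and the greedy rule: always take the largest Fibonacci number <= remainder.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    # Build all Fibonacci numbers up to n.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])

    # Greedily subtract the largest Fibonacci number that still fits.
    parts = []
    remainder = n
    for f in reversed(fibs):
        if f <= remainder:
            parts.append(f)
            remainder -= f
    return parts


if __name__ == "__main__":
    print(zeckendorf(100))  # [89, 8, 3]
```

The greedy choice never selects two consecutive Fibonacci numbers, because after subtracting the largest one that fits, the remainder is smaller than the next one down.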