-
Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race
Authors:
Joonas Pääkkönen
Abstract:
In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, dictate these rankings, also known as places. Places can be further associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple, yet accurate order statistical ordinal regression function that predicts relay race places with changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. This model is built on the following educated assumption: individual leg-times follow log-normal distributions. Moreover, our key idea is to utilize Fenton-Wilkinson approximations of changeover-times alongside an estimator for the total number of teams as in the notorious German tank problem. This original place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest of the teams. Our model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the size of the training set is only 5% of the whole data set. Numerical results also show that our model exhibits smaller place prediction root-mean-square-errors than linear regression, mord regression and Gaussian process regression.
Submitted 14 July, 2020;
originally announced July 2020.
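The Fenton-Wilkinson approximation mentioned in this abstract replaces a sum of independent log-normal variables (here, leg-times adding up to a changeover-time) with a single log-normal whose first two moments match the sum. The sketch below shows the standard moment-matching step only; the leg-time parameters are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
import math


def fenton_wilkinson(mus, sigmas):
    """Approximate a sum of independent log-normals by one log-normal.

    mus, sigmas: parameters of the underlying normals, ln X_i ~ N(mu_i, sigma_i^2).
    Returns (mu_s, sigma_s) such that the mean and variance match those of the sum.
    """
    # Exact mean and variance of the sum (independence is assumed).
    mean = sum(math.exp(m + s * s / 2.0) for m, s in zip(mus, sigmas))
    var = sum((math.exp(s * s) - 1.0) * math.exp(2.0 * m + s * s)
              for m, s in zip(mus, sigmas))
    # Log-normal parameters that reproduce this mean and variance.
    sigma_s_sq = math.log(1.0 + var / mean ** 2)
    mu_s = math.log(mean) - sigma_s_sq / 2.0
    return mu_s, math.sqrt(sigma_s_sq)


def lognormal_cdf(t, mu, sigma):
    """Distribution function of a log-normal with parameters mu and sigma."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical parameters for three legs (log of leg-time in minutes).
leg_mus = [4.2, 4.3, 4.1]
leg_sigmas = [0.25, 0.30, 0.20]
mu_s, sigma_s = fenton_wilkinson(leg_mus, leg_sigmas)

# Approximate fraction of teams whose third changeover-time is below 200 minutes.
print(lognormal_cdf(200.0, mu_s, sigma_s))
```

Scaling this distribution function by an estimate of the total number of teams gives a sigmoidal curve of place against changeover-time, which is one rough reading of the abstract and not necessarily the paper's exact regression function.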
-
Fenton-Wilkinson Order Statistics and German Tanks: A Case Study of an Orienteering Relay Race
Authors:
Joonas Pääkkönen
Abstract:
Ordinal regression falls between discrete-valued classification and continuous-valued regression. Ordinal target variables can be associated with ranked random variables. These random variables are known as order statistics and they are closely related to ordinal regression. However, the challenge of using order statistics for ordinal regression prediction is finding a suitable parent distribution. In this work, we provide a case study of a real-world orienteering relay race by viewing it as a random process. For this process, we show that accurate order statistical ordinal regression predictions of final team rankings, or places, can be obtained by assuming a lognormal distribution of individual leg times. Moreover, we apply Fenton-Wilkinson approximations to intermediate changeover times alongside an estimator for the total number of teams as in the notorious German tank problem. The purpose of this work is, in part, to spark interest in studying the applicability of order statistics in ordinal regression problems.
Submitted 10 December, 2019;
originally announced December 2019.
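The German tank problem referred to in this abstract asks for the size N of a population numbered 1..N given a sample drawn without replacement. A minimal sketch of the classical unbiased estimator, m + m/k - 1 for sample maximum m and sample size k, is given below. Applying it to observed places in order to estimate the number of teams is an assumption about how the paper uses it, and the function name is hypothetical.

```python
def german_tank_estimate(observed):
    """Estimate N from a sample drawn without replacement from 1..N.

    Uses the classical estimator m + m/k - 1, where m is the sample
    maximum and k is the sample size.
    """
    k = len(observed)
    if k == 0:
        raise ValueError("need at least one observation")
    m = max(observed)
    return m + m / k - 1


# Textbook example: four observed serial numbers.
print(german_tank_estimate([19, 40, 42, 60]))  # 74.0
```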
-
Traffic Minimizing Caching and Latent Variable Distributions of Order Statistics
Authors:
Joonas Pääkkönen,
Prathapasinghe Dharmawansa,
Ragnar Freij-Hollanti,
Camilla Hollanti,
Olav Tirkkonen
Abstract:
Given a statistical model for the request frequencies and sizes of data objects in a caching system, we derive the probability density of the size of the file that accounts for the largest amount of data traffic. This is equivalent to finding the required size of the cache for a caching placement that maximizes the expected byte hit ratio for given file size and popularity distributions. The file that maximizes the expected byte hit ratio is the file for which the product of its size and popularity is the highest -- thus, it is the file that incurs the greatest load on the network. We generalize this theoretical problem to cover factors and addends of arbitrary order statistics for given parent distributions. Further, we study the asymptotic behavior of these distributions. We give several factor and addend densities of widely-used distributions, and verify our results by extensive computer simulations.
Submitted 13 April, 2017;
originally announced April 2017.
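The quantity studied in this abstract is the size of the file whose product of size and popularity is largest. The sketch below picks that file from given lists and then samples its size by simulation. The log-normal sizes and exponential popularities are hypothetical choices made only for illustration, not the distributions analyzed in the paper.

```python
import random


def heaviest_file(sizes, popularities):
    """Return the index of the file with the largest size * popularity."""
    products = [s * p for s, p in zip(sizes, popularities)]
    return max(range(len(products)), key=products.__getitem__)


# Small deterministic example: the products are 6, 20 and 15.
sizes = [10, 200, 50]
popularities = [0.6, 0.1, 0.3]
i = heaviest_file(sizes, popularities)
print(i, sizes[i])  # 1 200

# Monte Carlo sample of the size of the heaviest file among 100 files,
# under hypothetical size and popularity distributions.
rng = random.Random(0)
samples = []
for _ in range(1000):
    s = [rng.lognormvariate(0.0, 1.0) for _ in range(100)]
    p = [rng.expovariate(1.0) for _ in range(100)]
    samples.append(s[heaviest_file(s, p)])
print(sum(samples) / len(samples))  # empirical mean size of the heaviest file
```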
-
Coded Caching Clusters with Device-to-Device Communications
Authors:
Joonas Pääkkönen,
Amaro Barreal,
Camilla Hollanti,
Olav Tirkkonen
Abstract:
We consider a geographically constrained caching community where popular data files are cached on mobile terminals and distributed through Device-to-Device (D2D) communications. Further, to ensure availability, data files are protected against user mobility, or churn, with erasure coding. Communication and storage costs (in units of energy) are considered. We focus on finding the coding method that minimizes the overall cost in the network. Closed-form expressions for the expected energy consumption incurred by data delivery and redundancy maintenance are derived, and it is shown that coding significantly decreases the overall energy consumption -- by more than 90% in a realistic scenario. It is further shown that D2D caching can also yield notable economical savings for telecommunication operators. Our results are illustrated by numerical examples and verified by extensive computer simulations.
Submitted 29 May, 2016;
originally announced May 2016.
-
Planet Hunters. VIII. Characterization of 41 Long-Period Exoplanet Candidates from Kepler Archival Data
Authors:
Ji Wang,
Debra A. Fischer,
Thomas Barclay,
Alyssa Picard,
Bo Ma,
Brendan P. Bowler,
Joseph R. Schmitt,
Tabetha S. Boyajian,
Kian J. Jek,
Daryll LaCourse,
Christoph Baranec,
Reed Riddle,
Nicholas M. Law,
Chris Lintott,
Kevin Schawinski,
Dean Joseph Simister,
Boscher Gregoire,
Sean P. Babin,
Trevor Poile,
Thomas Lee Jacobs,
Tony Jebson,
Mark R. Omohundro,
Hans Martin Schwengeler,
Johann Sejpka,
Ivan A. Terentev, et al. (8 additional authors not shown)
Abstract:
The census of exoplanets is incomplete for orbital distances larger than 1 AU. Here, we present 41 long-period planet candidates in 38 systems identified by Planet Hunters based on Kepler archival data (Q0-Q17). Among them, 17 exhibit only one transit, 14 have two visible transits and 10 have more than three visible transits. For planet candidates with only one visible transit, we estimate their orbital periods based on transit duration and host star properties. The majority of the planet candidates in this work (75%) have orbital periods that correspond to distances of 1-3 AU from their host stars. We conduct follow-up imaging and spectroscopic observations to validate and characterize planet host stars. In total, we obtain adaptive optics images for 33 stars to search for possible blending sources. Six stars have stellar companions within 4". We obtain high-resolution spectra for 6 stars to determine their physical properties. Stellar properties for other stars are obtained from the NASA Exoplanet Archive and the Kepler Stellar Catalog by Huber et al. (2014). We validate 7 planet candidates that have planet confidence over 0.997 (3-σ level). These validated planets include 3 single-transit planets (KIC-3558849b, KIC-5951458b, and KIC-8540376c), 3 planets with double transits (KIC-8540376b, KIC-9663113b, and KIC-10525077b), and 1 planet with 4 transits (KIC-5437945b). This work provides assessment regarding the existence of planets at wide separations and the associated false positive rate for transiting observation (17%-33%). More than half of the long-period planets with at least three transits in this paper exhibit transit timing variations up to 41 hours, which suggest additional components that dynamically interact with the transiting planet candidates. The nature of these components can be determined by follow-up radial velocity and transit observations.
Submitted 17 December, 2015; v1 submitted 8 December, 2015;
originally announced December 2015.
-
A Low-Complexity Message Recovery Method for Compute-and-Forward Relaying
Authors:
Amaro Barreal,
Joonas Pääkkönen,
David Karpuk,
Camilla Hollanti,
Olav Tirkkonen
Abstract:
The Compute-and-Forward relaying strategy achieves high computation rates by decoding linear combinations of transmitted messages at intermediate relays. However, if the involved relays independently choose which combinations of the messages to decode, there is no guarantee that the overall system of linear equations is solvable at the destination. In this article it is shown that, for a Gaussian fading channel model with two transmitters and two relays, always choosing the combination that maximizes the computation rate often leads to a case where the original messages cannot be recovered. It is further shown that by limiting the relays to select from carefully designed sets of equations, a solvable system can be guaranteed while maintaining high computation rates. The proposed method has a constant computational complexity and requires no information exchange between the relays.
Submitted 20 April, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.
-
Device-to-Device Data Storage with Regenerating Codes
Authors:
Joonas Pääkkönen,
Camilla Hollanti,
Olav Tirkkonen
Abstract:
Caching data files directly on mobile user devices combined with device-to-device (D2D) communications has recently been suggested to improve the capacity of wireless networks. We investigate the performance of regenerating codes in terms of the total energy consumption of a cellular network. We show that regenerating codes can offer large performance gains. It turns out that using redundancy against storage node failures is only beneficial if the popularity of the data is between certain thresholds. As our major contribution, we investigate under which circumstances regenerating codes with multiple redundant data fragments outdo uncoded caching.
Submitted 6 November, 2014;
originally announced November 2014.
-
Device-to-Device Data Storage for Mobile Cellular Systems
Authors:
J. Pääkkönen,
C. Hollanti,
O. Tirkkonen
Abstract:
As an alternative to downloading content from a cellular access network, mobile devices could be used to store data files and distribute them through device-to-device (D2D) communication. We consider a D2D-based storage community that is comprised of mobile users. Assuming that transmitting data from a base station to a mobile user consumes more energy than transmitting data between two mobile users, we show that it can be beneficial to use redundant storage to ensure that data files stay available to the community even if some of the storing users leave the network. We derive a tractable closed-form equation stating when redundancy should be used in order to minimize the expected energy consumption of data retrieval. We find that replication is the preferred method of adding redundancy as opposed to regenerating codes. Our findings are verified by computer simulations.
Submitted 24 September, 2013;
originally announced September 2013.
-
Planet Hunters: New Kepler planet candidates from analysis of quarter 2
Authors:
Chris J. Lintott,
Megan E. Schwamb,
Thomas Barclay,
Charlie Sharzer,
Debra A. Fischer,
John Brewer,
Matthew Giguere,
Stuart Lynn,
Michael Parrish,
Natalie Batalha,
Steve Bryson,
Jon Jenkins,
Darin Ragozzine,
Jason F. Rowe,
Kevin Schawinski,
Robert Gagliano,
Joe Gilardi,
Kian J. Jek,
Jari-Pekka Pääkkönen,
Tjapko Smits
Abstract:
We present new planet candidates identified in NASA Kepler quarter two public release data by volunteers engaged in the Planet Hunters citizen science project. The two candidates presented here survive checks for false positives, including examination of the pixel offset to constrain the possibility of a background eclipsing binary. The orbital periods of the planet candidates are 97.46 days (KIC 4552729) and 284.03 days (KIC 10005758), and the modeled planet radii are 5.3 and 3.8 R_Earth. The latter star has an additional known planet candidate with a radius of 5.05 R_Earth and a period of 134.49 days, which was detected by the Kepler pipeline. The discovery of these candidates illustrates the value of massively distributed volunteer review of the Kepler database to recover candidates which were otherwise uncatalogued.
Submitted 25 October, 2012; v1 submitted 27 February, 2012;
originally announced February 2012.