-
Application of data compression techniques to time series forecasting
Authors:
K. S. Chirikhin,
B. Ya. Ryabko
Abstract:
In this study we show that standard well-known file compression programs (zlib, bzip2, etc.) are able to forecast real-world time series data well. The strength of our approach is its ability to use a set of data compression algorithms and "automatically" choose the best one of them during the process of forecasting. Besides, modern data-compressors are able to find many kinds of latent regulariti…
▽ More
In this study we show that standard well-known file compression programs (zlib, bzip2, etc.) are able to forecast real-world time series data well. The strength of our approach is its ability to use a set of data compression algorithms and "automatically" choose the best one of them during the process of forecasting. Besides, modern data-compressors are able to find many kinds of latent regularities using some methods of artificial intelligence (for example, some data-compressors are based on finding the smallest formal grammar that describes the time series). Thus, our approach makes it possible to apply some particular methods of artificial intelligence for time-series forecasting.
As examples of the application of the proposed method, we made forecasts for the monthly T-index and the Kp-index time series using standard compressors. In both cases, we used the Mean Absolute Error (MAE) as an accuracy measure.
For the monthly T-index time series, we made 18 forecasts beyond the available data for each month since January 2011 to July 2017. We show that, in comparison with the forecasts made by the Australian Bureau of Meteorology, our method more accurately predicts one value ahead.
The Kp-index time series consists of 3-hour values ranging from 0 to 9. For each day from February 4, 2018 to March 28, 2018, we made forecasts for 24 values ahead. We compared our forecasts with the forecasts made by the Space Weather Prediction Center (SWPC). The results showed that the accuracy of our method is similar to the accuracy of the SWPC's method. As in the previous case, we also obtained more accurate one-step forecasts.
△ Less
Submitted 7 April, 2019;
originally announced April 2019.
-
Fast Recursive Coding Based on Grou** of Symbols
Authors:
Nikolay Ponomarenko,
Vladimir Lukin,
Karen Egiazarian,
Jaakko Astola,
Boris Y Ryabko
Abstract:
A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case it is only by 1-3% larger.
A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case it is only by 1-3% larger.
△ Less
Submitted 21 August, 2007;
originally announced August 2007.
-
Using Information Theory Approach to Randomness Testing
Authors:
B. Ya. Ryabko,
V. A. Monarev
Abstract:
We address the problem of detecting deviations of binary sequence from randomness,which is very important for random number (RNG) and pseudorandom number generators (PRNG). Namely, we consider a null hypothesis $H_0$ that a given bit sequence is generated by Bernoulli source with equal probabilities of 0 and 1 and the alternative hypothesis $H_1$ that the sequence is generated by a stationary an…
▽ More
We address the problem of detecting deviations of binary sequence from randomness,which is very important for random number (RNG) and pseudorandom number generators (PRNG). Namely, we consider a null hypothesis $H_0$ that a given bit sequence is generated by Bernoulli source with equal probabilities of 0 and 1 and the alternative hypothesis $H_1$ that the sequence is generated by a stationary and ergodic source which differs from the source under $H_0$. We show that data compression methods can be used as a basis for such testing and describe two new tests for randomness, which are based on ideas of universal coding. Known statistical tests and suggested ones are applied for testing PRNGs. Those experiments show that the power of the new tests is greater than of many known algorithms.
△ Less
Submitted 3 April, 2005;
originally announced April 2005.